
Showing 1–12 of 12 results for author: Tsai, B

  1. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  2. arXiv:2401.12208  [pdf, other]

    cs.CV cs.CL

    CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

    Authors: Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Magdalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Cohen, Eduardo Pontes Reis, Emily B. Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Gatidis, Akshay S. Chaudhari, Curtis Langlotz

    Abstract: Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challengin…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 24 pages, 8 figures

  3. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2210.03941  [pdf, other]

    cs.CV cs.CL

    Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling

    Authors: Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

    Abstract: While recent large-scale video-language pre-training made great progress in video question answering, the design of spatial modeling of video-language models is less fine-grained than that of image-language models; existing practices of temporal modeling also suffer from weak and noisy alignment between modalities. To learn fine-grained visual understanding, we decouple spatial-temporal modeling a…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: BMVC 2022. Code is available at https://github.com/shinying/dest

  5. arXiv:2108.04542  [pdf, other]

    cs.AI cs.CV

    TrUMAn: Trope Understanding in Movies and Animations

    Authors: Hung-Ting Su, Po-Wei Shen, Bing-Chen Tsai, Wen-Feng Cheng, Ke-Jyun Wang, Winston H. Hsu

    Abstract: Understanding and comprehending video content is crucial for many real-world applications such as search and recommendation systems. While recent progress of deep learning has boosted performance on various tasks using visual cues, deep cognition to reason intentions, motivation, or causality remains challenging. Existing datasets that aim to examine video reasoning capability focus on visual sign…

    Submitted 21 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: CIKM 2021. The first two authors contributed equally to this work

  6. arXiv:2010.10042  [pdf, other]

    cs.CL

    Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation

    Authors: Yasuhide Miura, Yuhao Zhang, Emily Bao Tsai, Curtis P. Langlotz, Dan Jurafsky

    Abstract: Neural image-to-text radiology report generation systems offer the potential to improve radiology reporting by reducing the repetitive process of report drafting and identifying possible medical errors. However, existing report generation systems, despite achieving high performances on natural language generation metrics such as CIDEr or BLEU, still suffer from incomplete and inconsistent generati…

    Submitted 12 April, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted to NAACL-HLT 2021

  7. arXiv:2006.04451  [pdf, other]

    cs.CV

    Novel Adaptive Binary Search Strategy-First Hybrid Pyramid- and Clustering-Based CNN Filter Pruning Method without Parameters Setting

    Authors: Kuo-Liang Chung, Yu-Lun Chang, Bo-Wei Tsai

    Abstract: Pruning redundant filters in CNN models has received growing attention. In this paper, we propose an adaptive binary search-first hybrid pyramid- and clustering-based (ABSHPC-based) method for pruning filters automatically. In our method, for each convolutional layer, initially a hybrid pyramid data structure is constructed to store the hierarchical information of each filter. Given a tolerant acc…

    Submitted 30 April, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

  8. arXiv:2005.09218  [pdf, other]

    cs.LG stat.ML

    Large Margin Mechanism and Pseudo Query Set on Cross-Domain Few-Shot Learning

    Authors: Jia-Fong Yeh, Hsin-Ying Lee, Bing-Chen Tsai, Yi-Rong Chen, Ping-Chia Huang, Winston H. Hsu

    Abstract: In recent years, few-shot learning problems have received a lot of attention. While methods in most previous works were trained and tested on datasets in one single domain, cross-domain few-shot learning is a brand-new branch of few-shot learning problems, where models handle datasets in different domains between training and testing phases. In this paper, to solve the problem that the model is pr…

    Submitted 6 February, 2024; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: Full version of the CDFSL competition report (in CVPRW'20), archived

  9. arXiv:1911.02541  [pdf, other]

    cs.CL

    Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports

    Authors: Yuhao Zhang, Derek Merck, Emily Bao Tsai, Christopher D. Manning, Curtis P. Langlotz

    Abstract: Neural abstractive summarization models are able to generate summaries which have high overlap with human references. However, existing models are not optimized for factual correctness, a critical metric in real-world applications. In this work, we develop a general framework where we evaluate the factual correctness of a generated summary by fact-checking it automatically against its reference us…

    Submitted 27 April, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

    Comments: ACL2020. 13 pages with appendices

  10. arXiv:1910.12453  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Asynchronous Methods for Model-Based Reinforcement Learning

    Authors: Yunzhi Zhang, Ignasi Clavera, Boren Tsai, Pieter Abbeel

    Abstract: Significant progress has been made in the area of model-based reinforcement learning. State-of-the-art algorithms are now able to match the asymptotic performance of model-free methods while being significantly more data efficient. However, this success has come at a price: state-of-the-art model-based methods require significant computation interleaved with data collection, resulting in run times…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: 10 pages, CoRL 2019

  11. arXiv:1901.02999  [pdf]

    cs.AI cs.HC

    PFML-based Semantic BCI Agent for Game of Go Learning and Prediction

    Authors: Chang-Shing Lee, Mei-Hui Wang, Li-Wei Ko, Bo-Yu Tsai, Yi-Lin Tsai, Sheng-Chi Yang, Lu-An Lin, Yi-Hsiu Lee, Hirofumi Ohashi, Naoyuki Kubota, Nan Shuo

    Abstract: This paper presents a semantic brain computer interface (BCI) agent with particle swarm optimization (PSO) based on a Fuzzy Markup Language (FML) for Go learning and prediction applications. Additionally, we also establish an Open Go Darkforest (OGD) cloud platform with Facebook AI research (FAIR) open source Darkforest and ELF OpenGo AI bots. The Japanese robot Palro will simultaneously predict t…

    Submitted 9 January, 2019; originally announced January 2019.

  12. arXiv:1704.08509  [pdf, other]

    cs.CV cs.AI

    No More Discrimination: Cross City Adaptation of Road Scene Segmenters

    Authors: Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, Min Sun

    Abstract: Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not presented in the training set would not achieve satisfactory performance due to dataset biases. Instead of collecting a large number of annotated images of each city of interest to train or refine the segmenter, we propose an unsupervised learning app…

    Submitted 27 April, 2017; originally announced April 2017.

    Comments: 13 pages, 10 figures