Showing 1–10 of 10 results for author: Dozat, T

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2305.02549  [pdf, other]

    cs.CL cs.CV cs.LG

    FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction

    Authors: Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolai Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister

    Abstract: The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend the mask language modeling to other modalities require careful multi-task tuning, complex reconstruction target designs, or additional pre-training data. In FormNetV2, we introduce a centralized multimodal graph c…

    Submitted 13 June, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  4. arXiv:2211.00922  [pdf, other]

    cs.CL

    Dialect-robust Evaluation of Generated Text

    Authors: Jiao Sun, Thibault Sellam, Elizabeth Clark, Tu Vu, Timothy Dozat, Dan Garrette, Aditya Siddhant, Jacob Eisenstein, Sebastian Gehrmann

    Abstract: Evaluation metrics that are not robust to dialect variation make it impossible to tell how well systems perform for many groups of users, and can even penalize systems for producing text in lower-resource dialects. However, currently, there exists no way to quantify how metrics respond to change in the dialect of a generated utterance. We thus formalize dialect robustness and dialect awareness as…

    Submitted 2 November, 2022; originally announced November 2022.

  5. arXiv:2210.00193  [pdf, other]

    cs.CL

    FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

    Authors: Parker Riley, Timothy Dozat, Jan A. Botha, Xavier Garcia, Dan Garrette, Jason Riesa, Orhan Firat, Noah Constant

    Abstract: We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distr…

    Submitted 3 October, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

    Comments: Published in TACL Vol. 11 (2023)

  6. arXiv:2203.08411  [pdf, other]

    cs.CL cs.CV cs.LG

    FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

    Authors: Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister

    Abstract: Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks. However, it is challenging to correctly serialize tokens in form-like documents in practice due to their variety of layout patterns. We propose FormNet, a structure-aware sequence model to mitigate the suboptimal serialization of forms. First, we design Rich Attention that leverage…

    Submitted 23 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022

  7. arXiv:2102.01335  [pdf, other]

    cs.CL cs.AI

    Neural Data Augmentation via Example Extrapolation

    Authors: Kenton Lee, Kelvin Guu, Luheng He, Tim Dozat, Hyung Won Chung

    Abstract: In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such "few-shot" cases at test time. A common remedy is to perform data augmentation, such as by duplicating underrepresented examples, or heuristically synthesizing new examples. But these remedies often fail to cover the full diversity and compl…

    Submitted 2 February, 2021; originally announced February 2021.

  8. arXiv:1901.10457  [pdf, other]

    cs.CL

    Universal Dependency Parsing from Scratch

    Authors: Peng Qi, Timothy Dozat, Yuhao Zhang, Christopher D. Manning

    Abstract: This paper describes Stanford's system at the CoNLL 2018 UD Shared Task. We introduce a complete neural pipeline system that takes raw text as input, and performs all tasks required by the shared task, ranging from tokenization and sentence segmentation, to POS tagging and dependency parsing. Our single system submission achieved very competitive performance on big treebanks. Moreover, after fixin…

    Submitted 29 January, 2019; originally announced January 2019.

    Comments: In Proceedings of the CoNLL 2018 UD Shared Task. First three authors contributed roughly equally. Github repo: https://github.com/stanfordnlp/stanfordnlp Website: https://stanfordnlp.github.io/stanfordnlp/

  9. arXiv:1807.01396  [pdf, other]

    cs.CL

    Simpler but More Accurate Semantic Dependency Parsing

    Authors: Timothy Dozat, Christopher D. Manning

    Abstract: While syntactic dependency annotations concentrate on the surface or functional structure of a sentence, semantic dependency annotations aim to capture between-word relationships that are more closely related to the meaning of a sentence, using graph-structured representations. We extend the LSTM-based syntactic parser of Dozat and Manning (2017) to train on and generate these graph structures. Th…

    Submitted 3 July, 2018; originally announced July 2018.

    Comments: ACL 2018 short paper

  10. arXiv:1611.01734  [pdf, ps, other]

    cs.CL cs.NE

    Deep Biaffine Attention for Neural Dependency Parsing

    Authors: Timothy Dozat, Christopher D. Manning

    Abstract: This paper builds off recent work from Kiperwasser & Goldberg (2016) using neural attention in a simple graph-based dependency parser. We use a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels. Our parser gets state of the art or near state of the art performance on standard treebanks for six different lan…

    Submitted 9 March, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

    Comments: Accepted to ICLR 2017; updated with new results and comparison to more recent models, including current state-of-the-art