
Showing 1–19 of 19 results for author: Kocisky, T

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2205.11388  [pdf, other]

    cs.CL cs.LG

    StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models

    Authors: Adam Liška, Tomáš Kočiský, Elena Gribovskaya, Tayfun Terzi, Eren Sezener, Devang Agrawal, Cyprien de Masson d'Autume, Tim Scholtes, Manzil Zaheer, Susannah Young, Ellen Gilsenan-McMahon, Sophia Austin, Phil Blunsom, Angeliki Lazaridou

    Abstract: Knowledge and language understanding of models evaluated through question answering (QA) has been usually studied on static snapshots of knowledge, like Wikipedia. However, our world is dynamic, evolves over time, and our models' knowledge becomes outdated. To study how semi-parametric QA models and their underlying parametric language models (LMs) adapt to evolving knowledge, we construct a new l…

    Submitted 23 May, 2022; originally announced May 2022.

  4. arXiv:2202.01709  [pdf, other]

    cs.CL cs.LG

    Towards Coherent and Consistent Use of Entities in Narrative Generation

    Authors: Pinelopi Papalampidi, Kris Cao, Tomas Kocisky

    Abstract: Large pre-trained language models (LMs) have demonstrated impressive capabilities in generating long, fluent text; however, there is little to no analysis on their ability to maintain entity coherence and consistency. In this work, we focus on the end task of narrative generation and systematically analyse the long-range entity coherence and consistency in generated stories. First, we propose a se…

    Submitted 3 February, 2022; originally announced February 2022.

  5. arXiv:2102.01951  [pdf, other]

    cs.CL cs.AI

    Mind the Gap: Assessing Temporal Generalization in Neural Language Models

    Authors: Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

    Abstract: Our world is open-ended, non-stationary, and constantly evolving; thus what we talk about and how we talk about it change over time. This inherent dynamic nature of language contrasts with the current static language modelling paradigm, which trains and evaluates models on utterances from overlapping time periods. Despite impressive recent progress, we demonstrate that Transformer-XL language mode…

    Submitted 26 October, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: To appear as a Spotlight at NeurIPS 2021

  6. arXiv:1909.01792  [pdf, other]

    cs.CL

    Mogrifier LSTM

    Authors: Gábor Melis, Tomáš Kočiský, Phil Blunsom

    Abstract: Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mut…

    Submitted 29 January, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

  7. arXiv:1901.11373  [pdf, other]

    cs.LG cs.CL stat.ML

    Learning and Evaluating General Linguistic Intelligence

    Authors: Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom

    Abstract: We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of ex…

    Submitted 31 January, 2019; originally announced January 2019.

  8. arXiv:1807.01670  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Encoding Spatial Relations from Natural Language

    Authors: Tiago Ramalho, Tomáš Kočiský, Frederic Besse, S. M. Ali Eslami, Gábor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann

    Abstract: Natural language processing has made significant inroads into learning the semantics of words through distributional approaches, however representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes.…

    Submitted 5 July, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

  9. arXiv:1805.09208  [pdf, other]

    stat.ML cs.CL cs.LG

    Pushing the bounds of dropout

    Authors: Gábor Melis, Charles Blundell, Tomáš Kočiský, Karl Moritz Hermann, Chris Dyer, Phil Blunsom

    Abstract: We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that…

    Submitted 27 September, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

  10. arXiv:1712.07040  [pdf, other]

    cs.CL cs.AI cs.NE

    The NarrativeQA Reading Comprehension Challenge

    Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

    Abstract: Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecti…

    Submitted 19 December, 2017; originally announced December 2017.

  11. arXiv:1706.02596  [pdf, other]

    cs.CL cs.AI cs.NE

    Dynamic Integration of Background Knowledge in Neural NLU Systems

    Authors: Dirk Weissenborn, Tomáš Kočiský, Chris Dyer

    Abstract: Common-sense and background knowledge is required to understand natural language, but in most neural natural language understanding (NLU) systems, this knowledge must be acquired from training corpora during learning, and then it is static at test time. We introduce a new architecture for the dynamic integration of explicit background knowledge in NLU models. A general-purpose reading module reads…

    Submitted 21 August, 2018; v1 submitted 8 June, 2017; originally announced June 2017.

  12. arXiv:1611.02554  [pdf, ps, other]

    cs.CL cs.AI cs.NE

    The Neural Noisy Channel

    Authors: Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky

    Abstract: We formulate sequence to sequence transduction as a noisy channel decoding problem and use recurrent neural networks to parameterise the source and channel models. Unlike direct models which can suffer from explaining-away effects during training, noisy channel models must produce outputs that explain their inputs, and their component models can be trained with not only paired training samples but…

    Submitted 6 March, 2017; v1 submitted 8 November, 2016; originally announced November 2016.

    Comments: ICLR 2017

  13. arXiv:1609.09315  [pdf, other]

    cs.CL cs.AI cs.NE

    Semantic Parsing with Semi-Supervised Sequential Autoencoders

    Authors: Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann

    Abstract: We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically gener…

    Submitted 29 September, 2016; originally announced September 2016.

  14. arXiv:1604.01946  [pdf, ps, other]

    cs.LG cs.NE

    Optimizing Performance of Recurrent Neural Networks on GPUs

    Authors: Jeremy Appleyard, Tomas Kocisky, Phil Blunsom

    Abstract: As recurrent neural networks become larger and deeper, training times for single networks are rising into weeks or even months. As such there is a significant incentive to improve the performance and scalability of these networks. While GPUs have become the hardware of choice for training and deploying recurrent models, the implementations employed often make use of only basic optimizations for th…

    Submitted 7 April, 2016; originally announced April 2016.

  15. arXiv:1603.06744  [pdf, other]

    cs.CL cs.NE

    Latent Predictor Networks for Code Generation

    Authors: Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Andrew Senior, Fumin Wang, Phil Blunsom

    Abstract: Many language generation tasks require the production of text conditioned on both structured and unstructured inputs. We present a novel neural network architecture which generates an output sequence conditioned on an arbitrary number of input functions. Crucially, our approach allows both the choice of conditioning context and the granularity of generation, for example characters or tokens, to be…

    Submitted 8 June, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

  16. arXiv:1509.06664  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Reasoning about Entailment with Neural Attention

    Authors: Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom

    Abstract: While most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entai…

    Submitted 1 March, 2016; v1 submitted 22 September, 2015; originally announced September 2015.

    Comments: ICLR 2016 camera-ready, 9 pages, 10 figures (incl. subfigures)

    MSC Class: 68T50

    ACM Class: I.2.6; I.2.7

  17. arXiv:1506.03340  [pdf, other]

    cs.CL cs.AI cs.NE

    Teaching Machines to Read and Comprehend

    Authors: Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom

    Abstract: Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides la…

    Submitted 19 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

    Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 14 pages, 13 figures

  18. arXiv:1405.0947  [pdf, other]

    cs.CL

    Learning Bilingual Word Representations by Marginalizing Alignments

    Authors: Tomáš Kočiský, Karl Moritz Hermann, Phil Blunsom

    Abstract: We present a probabilistic model that simultaneously learns alignments and distributed representations for bilingual data. By marginalizing over word alignments the model captures a larger semantic context than prior work relying on hard alignments. The advantage of this approach is demonstrated in a cross-lingual classification task, where we outperform the prior published state of the art.

    Submitted 5 May, 2014; originally announced May 2014.

    Comments: Proceedings of ACL 2014 (Short Papers)

  19. arXiv:1307.0441  [pdf, ps, other]

    cs.DB cs.DS

    Aggregation and Ordering in Factorised Databases

    Authors: Nurzhan Bakibayev, Tomáš Kočiský, Dan Olteanu, Jakub Závodný

    Abstract: A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, we put forward a succinct representation system for relational data called factorised databases and reported on the main-memory query engine FDB for select-project-join queries on such databases. In this paper, we extend FDB to support a larger class of practical queries…

    Submitted 1 July, 2013; originally announced July 2013.

    Comments: 12 pages, 8 figures