-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love,
et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Formalising Concepts as Grounded Abstractions
Authors:
Stephen Clark,
Alexander Lerchner,
Tamara von Glehn,
Olivier Tieleman,
Richard Tanburn,
Misha Dashevskiy,
Matko Bosnjak
Abstract:
The notion of concept has been studied for centuries, by philosophers, linguists, cognitive scientists, and researchers in artificial intelligence (Margolis & Laurence, 1999). There is a large literature on formal, mathematical models of concepts, including a whole sub-field of AI -- Formal Concept Analysis -- devoted to this topic (Ganter & Obiedkov, 2016). Recently, researchers in machine learning have begun to investigate how methods from representation learning can be used to induce concepts from raw perceptual data (Higgins, Sonnerat, et al., 2018). The goal of this report is to provide a formal account of concepts which is compatible with this latest work in deep learning.
The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces. The mathematics of partial orders and lattices is a standard tool for modelling conceptual spaces (Ch.2, Mitchell (1997), Ganter and Obiedkov (2016)); however, there is no formal work that we are aware of which defines a conceptual lattice on top of a representation that is induced using unsupervised deep learning (Goodfellow et al., 2016). The advantages of partially-ordered lattice structures are that these provide natural mechanisms for use in concept discovery algorithms, through the meets and joins of the lattice.
Submitted 13 January, 2021;
originally announced January 2021.
-
AlignNet: Unsupervised Entity Alignment
Authors:
Antonia Creswell,
Kyriacos Nikiforou,
Oriol Vinyals,
Andre Saraiva,
Rishabh Kabra,
Loic Matthey,
Chris Burgess,
Malcolm Reynolds,
Richard Tanburn,
Marta Garnelo,
Murray Shanahan
Abstract:
Recently developed deep learning models are able to learn to segment scenes into component objects without supervision. This opens many new and exciting avenues of research, allowing agents to take objects (or entities) as inputs, rather than pixels. Unfortunately, while these models provide excellent segmentation of a single frame, they do not keep track of how objects segmented at one time-step correspond (or align) to those at a later time-step. The alignment (or correspondence) problem has impeded progress towards using object representations in downstream tasks. In this paper we take steps towards solving the alignment problem, presenting the AlignNet, an unsupervised alignment module.
Submitted 21 July, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
Authors:
Tom Le Paine,
Caglar Gulcehre,
Bobak Shahriari,
Misha Denil,
Matt Hoffman,
Hubert Soyer,
Richard Tanburn,
Steven Kapturowski,
Neil Rabinowitz,
Duncan Williams,
Gabriel Barth-Maron,
Ziyu Wang,
Nando de Freitas,
Worlds Team
Abstract:
This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state of the art methods (both with and without demonstrations) fail to see even a single successful trajectory after tens of billions of steps of exploration.
Submitted 3 September, 2019;
originally announced September 2019.
-
High-fidelity adiabatic quantum computation using the intrinsic Hamiltonian of a spin system: Application to the experimental factorization of 291311
Authors:
Zhaokai Li,
Nikesh S. Dattani,
Xi Chen,
Xiaomei Liu,
Hengyan Wang,
Richard Tanburn,
Hongwei Chen,
Xinhua Peng,
Jiangfeng Du
Abstract:
In previous implementations of adiabatic quantum algorithms using spin systems, the average Hamiltonian method with Trotter's formula was conventionally adopted to generate an effective instantaneous Hamiltonian that simulates an adiabatic passage. However, this approach had issues with the precision of the effective Hamiltonian and with the adiabaticity of the evolution. In order to address these, we here propose and experimentally demonstrate a novel scheme for adiabatic quantum computation by using the intrinsic Hamiltonian of a realistic spin system to represent the problem Hamiltonian while adiabatically driving the system by an extrinsic Hamiltonian directly induced by electromagnetic pulses. In comparison to the conventional method, we observed two advantages of our approach: improved ease of implementation and higher fidelity. As a showcase example of our approach, we experimentally factor 291311, which is larger than any other quantum factorization known.
Submitted 25 June, 2017;
originally announced June 2017.
-
Crushing runtimes in adiabatic quantum computation with Energy Landscape Manipulation (ELM): Application to Quantum Factoring
Authors:
Richard Tanburn,
Oliver Lunt,
Nikesh S. Dattani
Abstract:
We introduce two methods for speeding up adiabatic quantum computations by increasing the energy between the ground and first excited states. Our methods are even more general. They can be used to shift a Hamiltonian's density of states away from the ground state, so that fewer states occupy the low-lying energies near the minimum, hence allowing for faster adiabatic passages to find the ground state with less risk of getting caught in an undesired low-lying excited state during the passage. Even more generally, our methods can be used to transform a discrete optimization problem into a new one whose unique minimum still encodes the desired answer, but with the objective function's values forming a different landscape. Aspects of the landscape such as the objective function's range, or the values of certain coefficients, or how many different inputs lead to a given output value, can be decreased *or* increased. One of the many examples for which these methods are useful is in finding the ground state of a Hamiltonian using NMR: If it is difficult to find a molecule such that the distances between the spins match the interactions in the Hamiltonian, the interactions in the Hamiltonian can be changed without at all changing the ground state. We apply our methods to an AQC algorithm for integer factorization, and the first method reduces the maximum runtime in our example by up to 754%, and the second method reduces the maximum runtime of another example by up to 250%. These two methods may also be combined.
Submitted 26 October, 2015;
originally announced October 2015.
-
Reducing multi-qubit interactions in adiabatic quantum computation without adding auxiliary qubits. Part 2: The "split-reduc" method and its application to quantum determination of Ramsey numbers
Authors:
Emile Okada,
Richard Tanburn,
Nikesh S. Dattani
Abstract:
Quantum annealing has recently been used to determine the Ramsey numbers R(m,2) for 3 < m < 9 and R(3,3) [Bian et al. (2013) PRL 111, 130505]. This was greatly celebrated as the largest experimental implementation of an adiabatic evolution algorithm to that date. However, in that computation, more than 66% of the qubits used were auxiliary qubits, so the sizes of the Ramsey number Hamiltonians used were tremendously smaller than the full 128-qubit capacity of the device used. The reason these auxiliary qubits were needed was because the best quantum annealing devices at the time (and still now) cannot implement multi-qubit interactions beyond 2-qubit interactions, and they are also limited in their capacity for 2-qubit interactions. We present a method which allows the full qubit capacity of a quantum annealing device to be used, by reducing multi-qubit and 2-qubit interactions. With our method, the device used in the 2013 Ramsey number quantum computation could have determined R(16,2) and R(4,3) with under 10 minutes of runtime.
Submitted 28 August, 2015;
originally announced August 2015.
-
Reducing multi-qubit interactions in adiabatic quantum computation without adding auxiliary qubits. Part 1: The "deduc-reduc" method and its application to quantum factorization of numbers
Authors:
Richard Tanburn,
Emile Okada,
Nike Dattani
Abstract:
Adiabatic quantum computing has recently been used to factor 56153 [Dattani & Bryans, arXiv:1411.6758] at room temperature, which is orders of magnitude larger than any number attempted yet using Shor's algorithm (circuit-based quantum computation). However, this number is still vastly smaller than RSA-768, which is the largest number factored thus far on a classical computer. We address a major issue arising in the scaling of adiabatic quantum factorization to much larger numbers: namely, the existence of many 4-qubit, 3-qubit and 2-qubit interactions in the Hamiltonians. We showcase our method on various examples, one of which shows that we can remove 94% of the 4-qubit interactions and 83% of the 3-qubit interactions in the factorization of a 25-digit number with almost no effort, without adding any auxiliary qubits. Our method is not limited to quantum factoring. Its importance extends to the wider field of discrete optimization. Any CSP (constraint-satisfiability problem), pseudo-Boolean optimization problem, or QUBO (quadratic unconstrained Boolean optimization) problem can in principle benefit from the "deduction-reduction" method which we introduce in this paper. We provide an open source code which takes in a Hamiltonian (or a discrete function which needs to be optimized), and returns a Hamiltonian that has the same unique ground state(s), no new auxiliary variables, and as few multi-qubit (multi-variable) terms as possible with deduc-reduc.
Submitted 1 October, 2015; v1 submitted 19 August, 2015;
originally announced August 2015.