Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2309.00369 [pdf, other]

Bayesian estimation and reconstruction of marine surface contaminant dispersion

Authors: Yang Liu, Christopher M. Harvey, Frederick E. Hamlyn, Cunjia Liu

Abstract: Discharge of hazardous substances into the marine environment poses a substantial risk to both public health and the ecosystem. In such incidents, it is imperative to accurately estimate the release strength of the source and reconstruct the spatio-temporal dispersion of the substances based on the collected measurements. In this study, we propose an integrated estimation framework to tackle this challenge, which can be used in conjunction with a sensor network or a mobile sensor for environment monitoring. We employ the fundamental convection-diffusion partial differential equation (PDE) to represent the general dispersion of a physical quantity in a non-uniform flow field. The PDE model is spatially discretised into a linear state-space model using the dynamic transient finite-element method (FEM) so that the characterisation of time-varying dispersion can be cast into the problem of inferring the model states from sensor measurements. We also consider imperfect sensing phenomena, including miss-detection and signal quantisation, which are frequently encountered when using a sensor network. This complicated sensor process introduces nonlinearity into the Bayesian estimation process. A Rao-Blackwellised particle filter (RBPF) is designed to provide an effective solution by exploiting the linear structure of the state-space model, whereas the nonlinearity of the measurement model can be handled by Monte Carlo approximation with particles. The proposed framework is validated using a simulated oil spill incident in the Baltic sea with real ocean flow data. The results show the efficacy of the developed spatio-temporal dispersion model and estimation schemes in the presence of imperfect measurements. Moreover, the parameter selection process is discussed, along with some comparison studies to illustrate the advantages of the proposed algorithm over existing methods.

Submitted 4 September, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

arXiv:2305.11559 [pdf, other]

The Barriers to Online Clothing Websites for Visually Impaired People: An Interview and Observation Approach to Understanding Needs

Authors: Amnah Alluqmani, Morgan Harvey, Ziqi Zhang

Abstract: Visually impaired (VI) people often face challenges when performing everyday tasks and identify shopping for clothes as one of the most challenging. Many engage in online shopping, which eliminates some challenges of physical shopping. However, clothes shopping online suffers from many other limitations and barriers. More research is needed to address these challenges, and extant works often base their findings on interviews alone, providing only subjective, recall-biased information. We conducted two complementary studies using both observational and interview approaches to fill a gap in understanding about VI people's behaviour when selecting and purchasing clothes online. Our findings show that shopping websites suffer from inaccurate, misleading, and contradictory clothing descriptions; that VI people mainly rely on (unreliable) search tools and check product descriptions by reviewing customer comments. Our findings also indicate that VI people are hesitant to accept assistance from automated systems, but that trust in such systems could be improved if researchers can develop systems that better accommodate users' needs and preferences.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2301.13507 [pdf, ps, other]

An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features

Authors: Mengyisong Zhao, Morgan Harvey, David Cameron, Frank Hopfgartner, Valerie J. Gillet

Abstract: Hit song prediction, one of the emerging fields in music information retrieval (MIR), remains a considerable challenge. Being able to understand what makes a given song a hit is clearly beneficial to the whole music industry. Previous approaches to hit song prediction have focused on using audio features of a record. This study aims to improve the prediction result of the top 10 hits among Billboard Hot 100 songs using more alternative metadata, including song audio features provided by Spotify, song lyrics, and novel metadata-based features (title topic, popularity continuity and genre class). Five machine learning approaches are applied, including: k-nearest neighbours, Naive Bayes, Random Forest, Logistic Regression and Multilayer Perceptron. Our results show that Random Forest (RF) and Logistic Regression (LR) with all features (including novel features, song audio features and lyrics features) outperform other models, achieving 89.1% and 87.2% accuracy, and 0.91 and 0.93 AUC, respectively. Our findings also demonstrate the utility of our novel music metadata features, which contributed most to the models' discriminative performance.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2111.01359 [pdf, other]

Unfoldings and Nets of Regular Polytopes

Authors: Satyan L. Devadoss, Matthew Harvey

Abstract: Over a decade ago, it was shown that every edge unfolding of the Platonic solids was without self-overlap, yielding a valid net. We consider this property for regular polytopes in arbitrary dimensions, notably the simplex, cube, and orthoplex. It was recently proven that all unfoldings of the $n$-cube yield nets. We show this is also true for the $n$-simplex and the $4$-orthoplex but demonstrate its surprising failure for any orthoplex of higher dimension.

Submitted 1 November, 2021; originally announced November 2021.

Comments: 12 pages, 6 figures

arXiv:1812.07081 [pdf, other]

doi 10.1145/3295750.3298923

Understanding Mobile Search Task Relevance and User Behaviour in Context

Authors: Mohammad Aliannejadi, Morgan Harvey, Luca Costa, Matthew Pointon, Fabio Crestani

Abstract: Improvements in mobile technologies have led to a dramatic change in how and when people access and use information, and are having a profound impact on how users address their daily information needs. Smart phones are rapidly becoming our main method of accessing information and are frequently used to perform 'on-the-go' search tasks. As research into information retrieval continues to evolve, evaluating search behaviour in context is relatively new. Previous research has studied the effects of context through either self-reported diary studies or quantitative log analysis; however, neither approach is able to accurately capture context of use at the time of searching. In this study, we aim to gain a better understanding of task relevance and search behaviour via a task-based user study (n=31) employing a bespoke Android app. The app allowed us to accurately capture the user's context when completing tasks at different times of the day over the period of a week. Through analysis of the collected data, we gain a better understanding of how using smart phones on the go impacts search behaviour, search performance and task relevance and whether or not the actual context is an important factor.

Submitted 13 January, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

Comments: To appear in CHIIR 2019 in Glasgow, UK

arXiv:1710.10629 [pdf, other]

Dimensionality reduction methods for molecular simulations

Authors: Stefan Doerr, Igor Ariz-Extreme, Matthew J. Harvey, Gianni De Fabritiis

Abstract: Molecular simulations produce very high-dimensional data-sets with millions of data points. As analysis methods are often unable to cope with so many dimensions, it is common to use dimensionality reduction and clustering methods to reach a reduced representation of the data. Yet these methods often fail to capture the most important features necessary for the construction of a Markov model. Here we demonstrate the results of various dimensionality reduction methods on two simulation data-sets, one of protein folding and another of protein-ligand binding. The methods tested include a k-means clustering variant, a non-linear auto encoder, principal component analysis and tICA. The dimension-reduced data is then used to estimate the implied timescales of the slowest process by a Markov state model analysis to assess the quality of the projection. The projected dimensions learned from the data are visualized to demonstrate which conformations the various methods choose to represent the molecular process.

Submitted 2 November, 2017; v1 submitted 29 October, 2017; originally announced October 2017.

Comments: 11 pages, 10 figures

arXiv:1301.3861 [pdf]

Inference for Belief Networks Using Coupling From the Past

Authors: Michael Harvey, Radford M. Neal

Abstract: Inference for belief networks using Gibbs sampling produces a distribution for unobserved variables that differs from the correct distribution by a (usually) unknown error, since convergence to the right distribution occurs only asymptotically. The method of "coupling from the past" samples from exactly the correct distribution by (conceptually) running dependent Gibbs sampling simulations from every possible starting state from a time far enough in the past that all runs reach the same state at time t=0. Explicitly considering every possible state is intractable for large networks, however. We propose a method for layered noisy-or networks that uses a compact, but often imprecise, summary of a set of states. This method samples from exactly the correct distribution, and requires only about twice the time per step as ordinary Gibbs sampling, but it may require more simulation steps than would be needed if chains were tracked exactly.

Submitted 16 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

Report number: UAI-P-2000-PG-256-263

arXiv:0911.4178 [pdf]

Folksonomic Tag Clouds as an Aid to Content Indexing

Authors: Morgan Harvey, Mark Baillie, Ian Ruthven, David Elsweiler

Abstract: Social tagging systems have recently developed as a popular method of data organisation on the Internet. These systems allow users to organise their content in a way that makes sense to them, rather than forcing them to use a pre-determined and rigid set of categorisations. These folksonomies provide well populated sources of unstructured tags describing web resources which could potentially be used as semantic index terms for these resources. However, getting people to agree on what tags best describe a resource is a difficult problem; therefore, any feature which increases the consistency and stability of terms chosen would be extremely beneficial. We investigate how the provision of a tag cloud, a weighted list of terms commonly used to assist in browsing a folksonomy, during the tagging process itself influences the tags produced and how difficult the user perceived the task to be. We show that illustrating the most popular tags to users assists in the tagging process and encourages a stable and consistent folksonomy to form.

Submitted 21 November, 2009; originally announced November 2009.

Comments: SIGIR 2009 Workshop on Search in Social Media (SSM 2009)

ACM Class: H.5; H.3

Showing 1–9 of 9 results for author: Harvey, M