Search | arXiv e-print repository

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:1308.3657 [pdf, ps, other]

doi 10.1109/SocialCom.2013.17

Hoodsquare: Modeling and Recommending Neighborhoods in Location-based Social Networks

Authors: Amy X. Zhang, Anastasios Noulas, Salvatore Scellato, Cecilia Mascolo

Abstract: Information garnered from activity on location-based social networks can be harnessed to characterize urban spaces and organize them into neighborhoods. In this work, we adopt a data-driven approach to the identification and modeling of urban neighborhoods using location-based social networks. We represent geographic points in the city using spatio-temporal information about Foursquare user check-… ▽ More Information garnered from activity on location-based social networks can be harnessed to characterize urban spaces and organize them into neighborhoods. In this work, we adopt a data-driven approach to the identification and modeling of urban neighborhoods using location-based social networks. We represent geographic points in the city using spatio-temporal information about Foursquare user check-ins and semantic information about places, with the goal of develo** features to input into a novel neighborhood detection algorithm. The algorithm first employs a similarity metric that assesses the homogeneity of a geographic area, and then with a simple mechanism of geographic navigation, it detects the boundaries of a city's neighborhoods. The models and algorithms devised are subsequently integrated into a publicly available, map-based tool named Hoodsquare that allows users to explore activities and neighborhoods in cities around the world. Finally, we evaluate Hoodsquare in the context of a recommendation application where user profiles are matched to urban neighborhoods. By comparing with a number of baselines, we demonstrate how Hoodsquare can be used to accurately predict the home neighborhood of Twitter users. We also show that we are able to suggest neighborhoods geographically constrained in size, a desirable property in mobile recommendation scenarios for which geographical precision is key. △ Less

Submitted 16 August, 2013; originally announced August 2013.

Comments: ASE/IEEE SocialCom 2013

arXiv:1306.1704 [pdf, ps, other]

doi 10.1145/2487575.2487616

Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement

Authors: Dmytro Karamshuk, Anastasios Noulas, Salvatore Scellato, Vincenzo Nicosia, Cecilia Mascolo

Abstract: The problem of identifying the optimal location for a new retail store has been the focus of past research, especially in the field of land economy, due to its importance in the success of a business. Traditional approaches to the problem have factored in demographics, revenue and aggregated human flow statistics from nearby or remote areas. However, the acquisition of relevant data is usually exp… ▽ More The problem of identifying the optimal location for a new retail store has been the focus of past research, especially in the field of land economy, due to its importance in the success of a business. Traditional approaches to the problem have factored in demographics, revenue and aggregated human flow statistics from nearby or remote areas. However, the acquisition of relevant data is usually expensive. With the growth of location-based social networks, fine grained data describing user mobility and popularity of places has recently become attainable. In this paper we study the predictive power of various machine learning features on the popularity of retail stores in the city through the use of a dataset collected from Foursquare in New York. The features we mine are based on two general signals: geographic, where features are formulated according to the types and density of nearby places, and user mobility, which includes transitions between venues or the incoming flow of mobile users from distant areas. Our evaluation suggests that the best performing features are common across the three different commercial chains considered in the analysis, although variations may exist too, as explained by heterogeneities in the way retail facilities attract users. We also show that performance improves significantly when combining multiple features in supervised learning algorithms, suggesting that the retail success of a business may depend on multiple factors. △ Less

Submitted 25 February, 2014; v1 submitted 7 June, 2013; originally announced June 2013.

Comments: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, Chicago, 2013, Pages 793-801

ACM Class: H.2.8

arXiv:1305.6974 [pdf, other]

doi 10.1007/978-3-642-36461-7_7

Applications of Temporal Graph Metrics to Real-World Networks

Authors: John Tang, Ilias Leontiadis, Salvatore Scellato, Vincenzo Nicosia, Cecilia Mascolo, Mirco Musolesi, Vito Latora

Abstract: Real world networks exhibit rich temporal information: friends are added and removed over time in online social networks; the seasons dictate the predator-prey relationship in food webs; and the propagation of a virus depends on the network of human contacts throughout the day. Recent studies have demonstrated that static network analysis is perhaps unsuitable in the study of real world network si… ▽ More Real world networks exhibit rich temporal information: friends are added and removed over time in online social networks; the seasons dictate the predator-prey relationship in food webs; and the propagation of a virus depends on the network of human contacts throughout the day. Recent studies have demonstrated that static network analysis is perhaps unsuitable in the study of real world network since static paths ignore time order, which, in turn, results in static shortest paths overestimating available links and underestimating their true corresponding lengths. Temporal extensions to centrality and efficiency metrics based on temporal shortest paths have also been proposed. Firstly, we analyse the roles of key individuals of a corporate network ranked according to temporal centrality within the context of a bankruptcy scandal; secondly, we present how such temporal metrics can be used to study the robustness of temporal networks in presence of random errors and intelligent attacks; thirdly, we study containment schemes for mobile phone malware which can spread via short range radio, similar to biological viruses; finally, we study how the temporal network structure of human interactions can be exploited to effectively immunise human populations. Through these applications we demonstrate that temporal metrics provide a more accurate and effective analysis of real-world networks compared to their static counterparts. △ Less

Submitted 29 May, 2013; originally announced May 2013.

Comments: 25 pages

Journal ref: Chapter in Temporal Networks (Petter Holme and Jari Saramäki editors). Springer. Berlin, Heidelberg 2013

arXiv:1303.6460 [pdf, ps, other]

doi 10.1140/epjb/e2013-40253-6

Social and place-focused communities in location-based online social networks

Authors: Chloë Brown, Vincenzo Nicosia, Salvatore Scellato, Anastasios Noulas, Cecilia Mascolo

Abstract: Thanks to widely available, cheap Internet access and the ubiquity of smartphones, millions of people around the world now use online location-based social networking services. Understanding the structural properties of these systems and their dependence upon users' habits and mobility has many potential applications, including resource recommendation and link prediction. Here, we construct and ch… ▽ More Thanks to widely available, cheap Internet access and the ubiquity of smartphones, millions of people around the world now use online location-based social networking services. Understanding the structural properties of these systems and their dependence upon users' habits and mobility has many potential applications, including resource recommendation and link prediction. Here, we construct and characterise social and place-focused graphs by using longitudinal information about declared social relationships and about users' visits to physical places collected from a popular online location-based social service. We show that although the social and place-focused graphs are constructed from the same data set, they have quite different structural properties. We find that the social and location-focused graphs have different global and meso-scale structure, and in particular that social and place-focused communities have negligible overlap. Consequently, group inference based on community detection performed on the social graph alone fails to isolate place-focused groups, even though these do exist in the network. By studying the evolution of tie structure within communities, we show that the time period over which location data are aggregated has a substantial impact on the stability of place-focused communities, and that information about place-based groups may be more useful for user-centric applications than that obtained from the analysis of social communities alone. △ Less

Submitted 1 July, 2013; v1 submitted 26 March, 2013; originally announced March 2013.

Comments: 11 pages, 5 figures

Journal ref: C. Brown, V. Nicosia, S. Scellato, A. Noulas, C. Mascolo "Social and place-focused communities in location-based online social networks", Eur. Phys. J. B 86 (6), 290 (2013)

arXiv:1108.5355 [pdf, other]

doi 10.1371/journal.pone.0037027

A tale of many cities: universal patterns in human urban mobility

Authors: Anastasios Noulas, Salvatore Scellato, Renaud Lambiotte, Massimiliano Pontil, Cecilia Mascolo

Abstract: The advent of geographic online social networks such as Foursquare, where users voluntarily signal their current location, opens the door to powerful studies on human movement. In particular the fine granularity of the location data, with GPS accuracy down to 10 meters, and the worldwide scale of Foursquare adoption are unprecedented. In this paper we study urban mobility patterns of people in sev… ▽ More The advent of geographic online social networks such as Foursquare, where users voluntarily signal their current location, opens the door to powerful studies on human movement. In particular the fine granularity of the location data, with GPS accuracy down to 10 meters, and the worldwide scale of Foursquare adoption are unprecedented. In this paper we study urban mobility patterns of people in several metropolitan cities around the globe by analyzing a large set of Foursquare users. Surprisingly, while there are variations in human movement in different cities, our analysis shows that those are predominantly due to different distributions of places across different urban environments. Moreover, a universal law for human mobility is identified, which isolates as a key component the rank-distance, factoring in the number of places between origin and destination, rather than pure physical distance, as considered in some previous works. Building on our findings, we also show how a rank-based movement model accurately captures real human movements in different cities. Our results shed new light on the driving factors of urban human mobility, with potential applications for urban planning, location-based advertisement and even social studies. △ Less

Submitted 11 October, 2011; v1 submitted 24 August, 2011; originally announced August 2011.

arXiv:0711.2780 [pdf, ps, other]

Epcast: Controlled Dissemination in Human-based Wireless Networks by means of Epidemic Spreading Models

Authors: Salvatore Scellato, Cecilia Mascolo, Mirco Musolesi, Vito Latora

Abstract: Epidemics-inspired techniques have received huge attention in recent years from the distributed systems and networking communities. These algorithms and protocols rely on probabilistic message replication and redundancy to ensure reliable communication. Moreover, they have been successfully exploited to support group communication in distributed systems, broadcasting, multicasting and informatio… ▽ More Epidemics-inspired techniques have received huge attention in recent years from the distributed systems and networking communities. These algorithms and protocols rely on probabilistic message replication and redundancy to ensure reliable communication. Moreover, they have been successfully exploited to support group communication in distributed systems, broadcasting, multicasting and information dissemination in fixed and mobile networks. However, in most of the existing work, the probability of infection is determined heuristically, without relying on any analytical model. This often leads to unnecessarily high transmission overheads. In this paper we show that models of epidemic spreading in complex networks can be applied to the problem of tuning and controlling the dissemination of information in wireless ad hoc networks composed of devices carried by individuals, i.e., human-based networks. The novelty of our idea resides in the evaluation and exploitation of the structure of the underlying human network for the automatic tuning of the dissemination process in order to improve the protocol performance. We evaluate the results using synthetic mobility models and real human contacts traces. △ Less

Submitted 18 November, 2007; originally announced November 2007.

Showing 1–8 of 8 results for author: Scellato, S