-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
CLSE: Corpus of Linguistically Significant Entities
Authors:
Aleksandr Chuklin,
Justin Zhao,
Mihir Kale
Abstract:
One of the biggest challenges of natural language generation (NLG) is the proper handling of named entities. Named entities are a common source of grammar mistakes such as wrong prepositions, wrong article handling, or incorrect entity inflection. Without factoring linguistic representation, such errors are often underrepresented when evaluating on a small set of arbitrarily picked argument values, or when translating a dataset from a linguistically simpler language, like English, to a linguistically complex language, like Russian. However, for some applications, broadly precise grammatical correctness is critical -- native speakers may find entity-related grammar errors silly, jarring, or even offensive.
To enable the creation of more linguistically diverse NLG datasets, we release a Corpus of Linguistically Significant Entities (CLSE) annotated by linguist experts. The corpus includes 34 languages and covers 74 different semantic types to support various applications from airline ticketing to video games. To demonstrate one possible use of CLSE, we produce an augmented version of the Schema-Guided Dialog Dataset, SGD-CLSE. Using the CLSE's entities and a small number of human translations, we create a linguistically representative NLG evaluation benchmark in three languages: French (high-resource), Marathi (low-resource), and Russian (highly inflected language). We establish quality baselines for neural, template-based, and hybrid NLG systems and discuss the strengths and weaknesses of each approach.
Submitted 30 August, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Text Generation with Text-Editing Models
Authors:
Eric Malmi,
Yue Dong,
Jonathan Mallinson,
Aleksandr Chuklin,
Jakub Adamek,
Daniil Mirylenka,
Felix Stahlberg,
Sebastian Krause,
Shankar Kumar,
Aliaksei Severyn
Abstract:
Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, simplification, and style transfer. These tasks share a common trait: they exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this observation and learn to generate the output by predicting edit operations applied to the source sequence. In contrast, seq2seq models generate outputs word-by-word from scratch, which makes them slow at inference time. Text-editing models provide several benefits over seq2seq models, including faster inference speed, higher sample efficiency, and better control and interpretability of the outputs. This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches, and analyzes their pros and cons. We discuss challenges related to productionization and how these models can be used to mitigate hallucination and bias, both pressing challenges in the field of text generation.
Submitted 14 June, 2022;
originally announced June 2022.
-
Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions
Authors:
Mohammad Aliannejadi,
Julia Kiseleva,
Aleksandr Chuklin,
Jeffrey Dalton,
Mikhail Burtsev
Abstract:
Enabling open-domain dialogue systems to ask clarifying questions when appropriate is an important direction for improving the quality of the system response. Namely, for cases when a user request is not specific enough for a conversation system to provide an answer right away, it is desirable to ask a clarifying question to increase the chances of retrieving a satisfying answer. To address the problem of 'asking clarifying questions in open-domain dialogues': (1) we collect and release a new dataset focused on open-domain single- and multi-turn conversations, (2) we benchmark several state-of-the-art neural baselines, and (3) we propose a pipeline consisting of offline and online steps for evaluating the quality of clarifying questions in various dialogues. These contributions are suitable as a foundation for further research.
Submitted 13 September, 2021;
originally announced September 2021.
-
ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ)
Authors:
Mohammad Aliannejadi,
Julia Kiseleva,
Aleksandr Chuklin,
Jeff Dalton,
Mikhail Burtsev
Abstract:
This document presents a detailed description of the challenge on clarifying questions for dialogue systems (ClariQ). The challenge is organized as part of the Conversational AI challenge series (ConvAI3) at the Search Oriented Conversational AI (SCAI) EMNLP workshop in 2020. The main aim of conversational systems is to return an appropriate answer in response to user requests. However, some user requests might be ambiguous. In IR settings such a situation is handled mainly through the diversification of the search result page. It is, however, much more challenging in dialogue settings with limited bandwidth. Therefore, in this challenge, we provide a common evaluation framework to evaluate mixed-initiative conversations. Participants are asked to rank clarifying questions in information-seeking conversations. The challenge is organized in two stages, where in Stage 1 we evaluate the submissions in an offline setting and single-turn conversations. Top participants of Stage 1 get the chance to have their model tested by human annotators.
Submitted 23 September, 2020;
originally announced September 2020.
-
Prosody Modifications for Question-Answering in Voice-Only Settings
Authors:
Aleksandr Chuklin,
Aliaksei Severyn,
Johanne Trippas,
Enrique Alfonseca,
Hanna Silen,
Damiano Spina
Abstract:
Many popular form factors of digital assistants---such as Amazon Echo, Apple Homepod, or Google Home---enable the user to hold a conversation with these systems based only on the speech modality. The lack of a screen presents unique challenges. To satisfy the information need of a user, the presentation of the answer needs to be optimized for such voice-only interactions. In this paper, we propose a task of evaluating the usefulness of audio transformations (i.e., prosodic modifications) for voice-only question answering. We introduce a crowdsourcing setup where we evaluate the quality of our proposed modifications along multiple dimensions corresponding to the informativeness, naturalness, and ability of the user to identify key parts of the answer. We offer a set of prosodic modifications that highlight potentially important parts of the answer using various acoustic cues. Our experiments show that some of these prosodic modifications lead to better comprehension at the expense of only slightly degraded naturalness of the audio.
Submitted 2 October, 2019; v1 submitted 11 June, 2018;
originally announced June 2018.
-
Incorporating Clicks, Attention and Satisfaction into a Search Engine Result Page Evaluation Model
Authors:
Aleksandr Chuklin,
Maarten de Rijke
Abstract:
Modern search engine result pages often provide immediate value to users and organize information in such a way that it is easy to navigate. The core ranking function contributes to this and so do result snippets, smart organization of result blocks and extensive use of one-box answers or side panels. While they are useful to the user and help search engines to stand out, such features present two big challenges for evaluation. First, the presence of such elements on a search engine result page (SERP) may lead to the absence of clicks, which is, however, not related to dissatisfaction, so-called "good abandonments." Second, the non-linear layout and visual difference of SERP items may lead to non-trivial patterns of user attention, which is not captured by existing evaluation metrics.
In this paper we propose a model of user behavior on a SERP that jointly captures click behavior, user attention and satisfaction, the CAS model, and demonstrate that it gives more accurate predictions of user actions and self-reported satisfaction than existing models based on clicks alone. We use the CAS model to build a novel evaluation metric that can be applied to non-linear SERP layouts and that can account for the utility that users obtain directly on a SERP. We demonstrate that this metric shows better agreement with user-reported satisfaction than conventional evaluation metrics.
Submitted 2 September, 2016;
originally announced September 2016.
-
The Anatomy of Relevance: Topical, Snippet and Perceived Relevance in Search Result Evaluation
Authors:
Aleksandr Chuklin,
Maarten de Rijke
Abstract:
Currently, the quality of a search engine is often determined using so-called topical relevance, i.e., the match between the user intent (expressed as a query) and the content of the document. In this work we want to draw attention to two aspects of retrieval system performance affected by the presentation of results: result attractiveness ("perceived relevance") and immediate usefulness of the snippets ("snippet relevance"). Perceived relevance may influence discoverability of good topical documents and seemingly better rankings may in fact be less useful to the user if good-looking snippets lead to irrelevant documents or vice-versa. And result items on a search engine result page (SERP) with high snippet relevance may add towards the total utility gained by the user even without the need to click those items.
We start by motivating the need to collect different aspects of relevance (topical, perceived and snippet relevances) and how these aspects can improve evaluation measures. We then discuss possible ways to collect these relevance aspects using crowdsourcing and the challenges arising from that.
Submitted 26 January, 2015;
originally announced January 2015.
-
Effective protocols for low-distance file synchronization
Authors:
Aleksandr Chuklin
Abstract:
Suppose that we have two similar files stored on different computers. We need to send the file from the first computer to the second one, trying to minimize the number of bits transmitted. This article presents a survey of results known for this communication complexity problem in the case when files are "similar" in the sense of Hamming distance. We mainly systematize earlier results obtained by various authors in the 1990s and 2000s and discuss their connection with coding theory, hashing algorithms and other domains of computer science. In particular cases we propose some improvements of previous constructions.
Submitted 23 February, 2011;
originally announced February 2011.