Search | arXiv e-print repository

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2309.07638 [pdf, other]

doi 10.1016/j.cose.2024.103731

WASM-MUTATE: Fast and Effective Binary Diversification for WebAssembly

Authors: Javier Cabrera-Arteaga, Nicholas Fitzgerald, Martin Monperrus, Benoit Baudry

Abstract: WebAssembly is the fourth officially endorsed Web language. It is recognized because of its efficiency and design, focused on security. Yet, its swiftly expanding ecosystem lacks robust software diversification systems. We introduce WASM-MUTATE, a diversification engine specifically designed for WebAssembly. Our engine meets several essential criteria: 1) To quickly generate functionally identical… ▽ More WebAssembly is the fourth officially endorsed Web language. It is recognized because of its efficiency and design, focused on security. Yet, its swiftly expanding ecosystem lacks robust software diversification systems. We introduce WASM-MUTATE, a diversification engine specifically designed for WebAssembly. Our engine meets several essential criteria: 1) To quickly generate functionally identical, yet behaviorally diverse, WebAssembly variants, 2) To be universally applicable to any WebAssembly program, irrespective of the source programming language, and 3) Generated variants should counter side-channels. By leveraging an e-graph data structure, WASM-MUTATE is implemented to meet both speed and efficacy. We evaluate WASM-MUTATE by conducting experiments on 404 programs, which include real-world applications. Our results highlight that WASM-MUTATE can produce tens of thousands of unique and efficient WebAssembly variants within minutes. Significantly, WASM-MUTATE can safeguard WebAssembly binaries against timing side-channel attacks,especially those of the Spectre type. △ Less

Submitted 17 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Report number: volume 139

Journal ref: Computers & Security, 2024

arXiv:2306.10231 [pdf, other]

GLIMMER: generalized late-interaction memory reranker

Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie

Abstract: Memory-augmentation is a powerful approach for efficiently incorporating external information into language models, but leads to reduced performance relative to retrieving text. Recent work introduced LUMEN, a memory-retrieval hybrid that partially pre-computes memory and updates memory representations on the fly with a smaller live encoder. We propose GLIMMER, which improves on this approach th… ▽ More Memory-augmentation is a powerful approach for efficiently incorporating external information into language models, but leads to reduced performance relative to retrieving text. Recent work introduced LUMEN, a memory-retrieval hybrid that partially pre-computes memory and updates memory representations on the fly with a smaller live encoder. We propose GLIMMER, which improves on this approach through 1) exploiting free access to the powerful memory representations by applying a shallow reranker on top of memory to drastically improve retrieval quality at low cost, and 2) incorporating multi-task training to learn a general and higher quality memory and live encoder. GLIMMER achieves strong gains in performance at faster speeds compared to LUMEN and FiD on the KILT benchmark of knowledge-intensive tasks. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2304.02603 [pdf, other]

doi 10.1108/GKMC-06-2023-0195

A Checklist to Publish Collections as Data in GLAM Institutions

Authors: Gustavo Candela, Nele Gabriëls, Sally Chambers, Thuy-An Pham, Sarah Ames, Neil Fitzgerald, Katrine Hofmann, Victor Harbo, Abigail Potter, Meghan Ferriter, Eileen Manchester, Alba Irollo, Ellen Van Keer, Mahendra Mahey, Olga Holownia, Milena Dobreva

Abstract: Large-scale digitization in Galleries, Libraries, Archives and Museums (GLAM) created the conditions for providing access to collections as data. It opened new opportunities to explore, use and reuse digital collections. Strong proponents of collections as data are the Innovation Labs which provided numerous examples of publishing datasets under open licenses in order to reuse digital content in n… ▽ More Large-scale digitization in Galleries, Libraries, Archives and Museums (GLAM) created the conditions for providing access to collections as data. It opened new opportunities to explore, use and reuse digital collections. Strong proponents of collections as data are the Innovation Labs which provided numerous examples of publishing datasets under open licenses in order to reuse digital content in novel and creative ways. Within the current transition to the emerging data spaces, clouds for cultural heritage and open science, the need to identify practices which support more GLAM institutions to offer datasets becomes a priority, especially within the smaller and medium-sized institutions. This paper answers the need to support GLAM institutions in facilitating the transition into publishing their digital content and to introduce collections as data services; this will also help their future efficient contribution to data spaces and cultural heritage clouds. It offers a checklist that can be used for both creating and evaluating digital collections suitable for computational use. The main contributions of this paper are i) a methodology for devising a checklist to create and assess digital collections for computational use; ii) a checklist to create and assess digital collections suitable for use with computational methods; iii) the assessment of the checklist against the practice of institutions innovating in the Collections as data field; and iv) the results obtained after the application and recommendations for the use of the checklist in GLAM institutions. △ Less

Submitted 13 November, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

Comments: This is an original manuscript of an article published by Emerald Publishing Limited in Global Knowledge, Memory and Communication on 9 November 2023, available online: https://doi.org/10.1108/GKMC-06-2023-0195

arXiv:2301.10448 [pdf, other]

Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute

Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, William Cohen

Abstract: Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs… ▽ More Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget. Moreover, the advantage of LUMEN over FiD increases with model size. △ Less

Submitted 2 June, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: ICML 2023

arXiv:2212.08153 [pdf, other]

FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference

Authors: Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William Cohen

Abstract: Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while… ▽ More Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while the majority of inference time results from memory bandwidth constraints in the decoder. We propose two simple changes to the FiD architecture to alleviate memory bandwidth constraints, and speed up inference by 7x. This allows us to use a much larger decoder at modest cost. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large. △ Less

Submitted 2 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: ACL Findings 2023

arXiv:2110.06176 [pdf, other]

Mention Memory: incorporating textual knowledge into Transformers through entity mention attention

Authors: Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, William Cohen

Abstract: Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge with `mention memory', a tab… ▽ More Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge with `mention memory', a table of dense vector representations of every entity mention in a corpus. The proposed model - TOME - is a Transformer that accesses the information through internal memory layers in which each entity mention in the input passage attends to the mention memory. This approach enables synthesis of and reasoning over many disparate sources of information within a single Transformer model. In experiments using a memory of 150 million Wikipedia mentions, TOME achieves strong performance on several open-domain knowledge-intensive tasks, including the claim verification benchmarks HoVer and FEVER and several entity-based QA benchmarks. We also show that the model learns to attend to informative mentions without any direct supervision. Finally we demonstrate that the model can generalize to new unseen entities by updating the memory without retraining. △ Less

Submitted 19 April, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

arXiv:2106.07352 [pdf, other]

MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network

Authors: Nicholas FitzGerald, Jan A. Botha, Daniel Gillick, Daniel M. Bikel, Tom Kwiatkowski, Andrew McCallum

Abstract: We present an instance-based nearest neighbor approach to entity linking. In contrast to most prior entity retrieval systems which represent each entity with a single vector, we build a contextualized mention-encoder that learns to place similar mentions of the same entity closer in vector space than mentions of different entities. This approach allows all mentions of an entity to serve as "class… ▽ More We present an instance-based nearest neighbor approach to entity linking. In contrast to most prior entity retrieval systems which represent each entity with a single vector, we build a contextualized mention-encoder that learns to place similar mentions of the same entity closer in vector space than mentions of different entities. This approach allows all mentions of an entity to serve as "class prototypes" as inference involves retrieving from the full set of labeled entity mentions in the training set and applying the nearest mention neighbor's entity label. Our model is trained on a large multilingual corpus of mention pairs derived from Wikipedia hyperlinks, and performs nearest neighbor inference on an index of 700 million mentions. It is simpler to train, gives more interpretable predictions, and outperforms all other systems on two multilingual entity linking benchmarks. △ Less

Submitted 22 July, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted to ACL 2021, edit to add missing Turkish results in Tables 2 and 7

arXiv:2008.00920 [pdf, other]

On The Plurality of Graphs

Authors: Nicole Fitzgerald, Jacopo Tagliabue

Abstract: We conduct a series of experiments designed to empirically demonstrate the effects of varying the structural features of a multi-agent emergent communication game framework. Specifically, we model the interactions (edges) between individual agents (nodes)as the structure of a graph generated according to a series of known random graph generating algorithms. Confirming the hypothesis proposed in [1… ▽ More We conduct a series of experiments designed to empirically demonstrate the effects of varying the structural features of a multi-agent emergent communication game framework. Specifically, we model the interactions (edges) between individual agents (nodes)as the structure of a graph generated according to a series of known random graph generating algorithms. Confirming the hypothesis proposed in [10], we show that the two factors of variation induced in this work, namely 1) the graph-generating process and 2) the centrality measure according to which edges are sampled, in fact play a significant role in determining the dynamics of language emergence within the population at hand. △ Less

Submitted 3 August, 2020; originally announced August 2020.

Comments: Manuscript accepted at NETREASON @ ECAI2020

arXiv:2005.14253 [pdf, ps, other]

Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking

Authors: Thibault Févry, Nicholas FitzGerald, Livio Baldini Soares, Tom Kwiatkowski

Abstract: In this work, we present an entity linking model which combines a Transformer architecture with large scale pretraining from Wikipedia links. Our model achieves the state-of-the-art on two commonly used entity linking datasets: 96.7% on CoNLL and 94.9% on TAC-KBP. We present detailed analyses to understand what design choices are important for entity linking, including choices of negative entity c… ▽ More In this work, we present an entity linking model which combines a Transformer architecture with large scale pretraining from Wikipedia links. Our model achieves the state-of-the-art on two commonly used entity linking datasets: 96.7% on CoNLL and 94.9% on TAC-KBP. We present detailed analyses to understand what design choices are important for entity linking, including choices of negative entity candidates, Transformer architecture, and input perturbations. Lastly, we present promising results on more challenging settings such as end-to-end entity linking and entity linking without in-domain training data. △ Less

Submitted 28 May, 2020; originally announced May 2020.

Comments: 11 pages, 8 figures, appearing at AKBC 2020

arXiv:2004.07202 [pdf, other]

Entities as Experts: Sparse Memory Access with Entity Supervision

Authors: Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, Tom Kwiatkowski

Abstract: We focus on the problem of capturing declarative knowledge about entities in the learned parameters of a language model. We introduce a new model - Entities as Experts (EAE) - that can access distinct memories of the entities mentioned in a piece of text. Unlike previous efforts to integrate entity knowledge into sequence models, EAE's entity representations are learned directly from text. We show… ▽ More We focus on the problem of capturing declarative knowledge about entities in the learned parameters of a language model. We introduce a new model - Entities as Experts (EAE) - that can access distinct memories of the entities mentioned in a piece of text. Unlike previous efforts to integrate entity knowledge into sequence models, EAE's entity representations are learned directly from text. We show that EAE's learned representations capture sufficient knowledge to answer TriviaQA questions such as "Which Dr. Who villain has been played by Roger Delgado, Anthony Ainley, Eric Roberts?", outperforming an encoder-generator Transformer model with 10x the parameters. According to the LAMA knowledge probes, EAE contains more factual knowledge than a similarly sized BERT, as well as previous approaches that integrate external sources of entity knowledge. Because EAE associates parameters with specific entities, it only needs to access a fraction of its parameters at inference time, and we show that the correct identification and representation of entities is essential to EAE's performance. △ Less

Submitted 6 October, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:2001.03765 [pdf, other]

Learning Cross-Context Entity Representations from Text

Authors: Jeffrey Ling, Nicholas FitzGerald, Zifei Shan, Livio Baldini Soares, Thibault Févry, David Weiss, Tom Kwiatkowski

Abstract: Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine readable knowledge bases or human readable encyclopedias tend to be entity-centric, we investigate the use of a f… ▽ More Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine readable knowledge bases or human readable encyclopedias tend to be entity-centric, we investigate the use of a fill-in-the-blank task to learn context independent representations of entities from the text contexts in which those entities were mentioned. We show that large scale training of neural models allows us to learn high quality entity representations, and we demonstrate successful results on four domains: (1) existing entity-level ty** benchmarks, including a 64% error reduction over previous work on TypeNet (Murty et al., 2018); (2) a novel few-shot category reconstruction task; (3) existing entity linking benchmarks, where we match the state-of-the-art on CoNLL-Aida without linking-specific features and obtain a score of 89.8% on TAC-KBP 2010 without using any alias table, external knowledge base or in domain training data and (4) answering trivia questions, which uniquely identify entities. Our global entity representations encode fine-grained type categories, such as Scottish footballers, and can answer trivia questions such as: Who was the last inmate of Spandau jail in Berlin? △ Less

Submitted 11 January, 2020; originally announced January 2020.

arXiv:1911.04362 [pdf, other]

To Populate is To Regulate

Authors: Nicole Fitzgerald

Abstract: We examine the effects of instantiating Lewis signaling games within a population of speaker and listener agents with the aim of producing a set of general and robust representations of unstructured pixel data. Preliminary experiments suggest that the set of representations associated with languages generated within a population outperform those generated between a single speaker-listener pair on… ▽ More We examine the effects of instantiating Lewis signaling games within a population of speaker and listener agents with the aim of producing a set of general and robust representations of unstructured pixel data. Preliminary experiments suggest that the set of representations associated with languages generated within a population outperform those generated between a single speaker-listener pair on this objective, making a case for the adoption of population-based approaches in emergent communication studies. Furthermore, post-hoc analysis reveals that population-based learning induces a number of novel factors to the conventional emergent communication setup, inviting a wide range of future research questions regarding communication dynamics and the flow of information within them. △ Less

Submitted 6 November, 2019; originally announced November 2019.

Comments: EmeCom Neurips 2019

arXiv:1906.03158 [pdf, other]

Matching the Blanks: Distributional Similarity for Relation Learning

Authors: Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, Tom Kwiatkowski

Abstract: General purpose relation extractors, which can model arbitrary relations, are a core aspiration in information extraction. Efforts have been made to build general purpose extractors that represent relations with their surface forms, or which jointly embed surface forms with relations from an existing knowledge graph. However, both of these approaches are limited in their ability to generalize. In… ▽ More General purpose relation extractors, which can model arbitrary relations, are a core aspiration in information extraction. Efforts have been made to build general purpose extractors that represent relations with their surface forms, or which jointly embed surface forms with relations from an existing knowledge graph. However, both of these approaches are limited in their ability to generalize. In this paper, we build on extensions of Harris' distributional hypothesis to relations, as well as recent advances in learning text representations (specifically, BERT), to build task agnostic relation representations solely from entity-linked text. We show that these representations significantly outperform previous work on exemplar based relation extraction (FewRel) even without using any of that task's training data. We also show that models initialized with our task agnostic representations, and then tuned on supervised relation extraction datasets, significantly outperform the previous methods on SemEval 2010 Task 8, KBP37, and TACRED. △ Less

Submitted 7 June, 2019; originally announced June 2019.

Comments: To appear at ACL 2019

arXiv:1805.05377 [pdf, other]

Large-Scale QA-SRL Parsing

Authors: Nicholas FitzGerald, Julian Michael, Luheng He, Luke Zettlemoyer

Abstract: We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. Our corpus, QA-SRL Bank 2.0, consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that we show has high precision and good recall at modest cost. We also present ne… ▽ More We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. Our corpus, QA-SRL Bank 2.0, consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that we show has high precision and good recall at modest cost. We also present neural models for two QA-SRL subtasks: detecting argument spans for a predicate and generating questions to label the semantic relationship. The best models achieve question accuracy of 82.6% and span-level accuracy of 77.6% (under human evaluation) on the full pipelined QA-SRL prediction task. They can also, as we show, be used to gather additional annotations at low cost. △ Less

Submitted 14 May, 2018; originally announced May 2018.

Comments: 10 pages, 3 figures, 8 tables. Accepted to ACL 2018

arXiv:1805.03716 [pdf, other]

Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

Authors: Omer Levy, Kenton Lee, Nicholas FitzGerald, Luke Zettlemoyer

Abstract: LSTMs were introduced to combat vanishing gradients in simple RNNs by augmenting them with gated additive recurrent connections. We present an alternative view to explain the success of LSTMs: the gates themselves are versatile recurrent models that provide more representational power than previously appreciated. We do this by decoupling the LSTM's gates from the embedded simple RNN, producing a n… ▽ More LSTMs were introduced to combat vanishing gradients in simple RNNs by augmenting them with gated additive recurrent connections. We present an alternative view to explain the success of LSTMs: the gates themselves are versatile recurrent models that provide more representational power than previously appreciated. We do this by decoupling the LSTM's gates from the embedded simple RNN, producing a new class of RNNs where the recurrence computes an element-wise weighted sum of context-independent functions of the input. Ablations on a range of problems demonstrate that the gating mechanism alone performs as well as an LSTM in most settings, strongly suggesting that the gates are doing much more in practice than just alleviating vanishing gradients. △ Less

Submitted 9 May, 2018; originally announced May 2018.

Comments: ACL 2018

arXiv:1206.6423 [pdf]

A Joint Model of Language and Perception for Grounded Attribute Learning

Authors: Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, Dieter Fox

Abstract: As robots become more ubiquitous and capable, it becomes ever more important to enable untrained users to easily interact with them. Recently, this has led to study of the language grounding problem, where the goal is to extract representations of the meanings of natural language tied to perception and actuation in the physical world. In this paper, we present an approach for joint learning of lan… ▽ More As robots become more ubiquitous and capable, it becomes ever more important to enable untrained users to easily interact with them. Recently, this has led to study of the language grounding problem, where the goal is to extract representations of the meanings of natural language tied to perception and actuation in the physical world. In this paper, we present an approach for joint learning of language and perception models for grounded attribute induction. Our perception model includes attribute classifiers, for example to detect object color and shape, and the language model is based on a probabilistic categorial grammar that enables the construction of rich, compositional meaning representations. The approach is evaluated on the task of interpreting sentences that describe sets of objects in a physical workspace. We demonstrate accurate task performance and effective latent-variable concept induction in physical grounded scenes. △ Less

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

arXiv:1206.5344 [pdf, other]

Gain Scheduling Control of Gas Turbine Engines: Absolute Stability by Finding a Common Lyapunov Matrix

Authors: Mehrdad Pakmehr, Nathan Fitzgerald, Eric Feron, Jeff Shamma, Alireza Behbahani

Abstract: This manuscript aims to develop and describe gain scheduling control concept for a gas turbine engine which drives a variable pitch propeller. An architecture for gain-scheduling control is developed that controls the turboshaft engine for large thrust commands in stable fashion with good performance. Fuel ow and propeller pitch angle are the two control inputs of the system. New stability proof h… ▽ More This manuscript aims to develop and describe gain scheduling control concept for a gas turbine engine which drives a variable pitch propeller. An architecture for gain-scheduling control is developed that controls the turboshaft engine for large thrust commands in stable fashion with good performance. Fuel ow and propeller pitch angle are the two control inputs of the system. New stability proof has been developed for gain scheduling control of gas turbine engines using global linearization and LMI techniques. This approach guarantees absolute stability of the closed loop gas turbine engine systems with gain-scheduling controllers. △ Less

Submitted 22 June, 2012; originally announced June 2012.

Comments: 15 pages, 21 figures

Showing 1–19 of 19 results for author: Fitzgerald, N