
Showing 1–20 of 20 results for author: Vilnis, L

Searching in archive cs.
  1. arXiv:2406.11409

    cs.CL cs.AI

    CodeGemma: Open Code Models Based on Gemma

    Authors: CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec, et al. (2 additional authors not shown)

    Abstract: This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: v1: 11 pages, 4 figures, 5 tables. v2: Update metadata

  2. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2308.14903

    cs.CL

    MEMORY-VQ: Compression for Tractable Internet-Scale Memory

    Authors: Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie

    Abstract: Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce stor…

    Submitted 28 August, 2023; originally announced August 2023.
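    The storage problem described above can be made concrete with generic vector quantization (VQ): each pre-computed token vector is replaced by the index of its nearest codebook vector, so only small integer codes plus one shared codebook need to be stored. This is a minimal sketch of plain VQ, not the MEMORY-VQ method itself; the toy sizes, random data, and the `quantize`/`dequantize` helper names are assumptions for illustration.

```python
import numpy as np

def quantize(vectors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each row of `vectors` to the index of its nearest codebook row."""
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2, shape (n, k)
    dists = (
        (vectors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * vectors @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    # uint8 codes assume a codebook of at most 256 entries
    return dists.argmin(axis=1).astype(np.uint8)

def dequantize(codes: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Approximately reconstruct the original vectors by codebook lookup."""
    return codebook[codes]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 64)).astype(np.float32)  # toy: 256 codes of dimension 64
vectors = rng.normal(size=(1000, 64)).astype(np.float32)  # stand-in "pre-computed" token vectors

codes = quantize(vectors, codebook)
approx = dequantize(codes, codebook)
print(vectors.nbytes, codes.nbytes)  # 256000 1000
```

    In this toy setting each vector shrinks from 64 float32 values to one byte; the reconstruction error introduced by `dequantize` is the cost, and how the paper balances the two is not reproduced here.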

  4. arXiv:2212.10770

    cs.CL

    ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

    Authors: Luke Vilnis, Zach Fisher, Bhargav Kanagal, Patrick Murray, Sumit Sanghai

    Abstract: Large language models have ushered in a golden age of semantic parsing. The seq2seq paradigm allows for open-schema and abstractive attribute and relation extraction given only small amounts of finetuning data. Language model pretraining has simultaneously enabled great strides in natural language inference, reasoning about entailment and implication in free text. These advances motivate us to con…

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: 14 pages. Preprint

  5. arXiv:2210.15458

    cs.CL cs.LG stat.ML

    Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

    Authors: Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Passos, Sumit Sanghai

    Abstract: Decoding methods for large language models often trade off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and…

    Submitted 1 June, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: 17 pages, to appear at ICML 2023

  6. arXiv:2010.04831

    cs.LG cs.AI stat.ML

    Improving Local Identifiability in Probabilistic Box Embeddings

    Authors: Shib Sankar Dasgupta, Michael Boratko, Dongxu Zhang, Luke Vilnis, Xiang Lorraine Li, Andrew McCallum

    Abstract: Geometric embeddings have recently received attention for their natural ability to represent transitive asymmetric relations via containment. Box embeddings, where objects are represented by n-dimensional hyperrectangles, are a particularly promising example of such an embedding as they are closed under intersection and their volume can be calculated easily, allowing them to naturally represent ca…

    Submitted 28 October, 2020; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted at NeurIPS 2020
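    The two properties the abstract highlights, closure under intersection and easily computed volume, can be illustrated with axis-aligned boxes stored as (min corner, max corner) pairs. This minimal sketch uses hard (unsmoothed) boxes and does not implement the paper's proposed method; the `Box` class, the helpers, and the reading of a volume ratio as a conditional probability are illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    lo: tuple  # minimum corner, one value per dimension
    hi: tuple  # maximum corner, one value per dimension

    def volume(self) -> float:
        # Product of side lengths; a negative side means an empty box (volume 0)
        return math.prod(max(h - l, 0.0) for l, h in zip(self.lo, self.hi))

def intersect(a: Box, b: Box) -> Box:
    # The intersection of two axis-aligned boxes is again a box
    lo = tuple(max(x, y) for x, y in zip(a.lo, b.lo))
    hi = tuple(min(x, y) for x, y in zip(a.hi, b.hi))
    return Box(lo, hi)

def conditional_prob(a: Box, b: Box) -> float:
    # Illustrative reading of volume ratios: P(a | b) = vol(a ∩ b) / vol(b)
    return intersect(a, b).volume() / b.volume()

a = Box((0.0, 0.0), (1.0, 1.0))
b = Box((0.5, 0.5), (2.0, 2.0))
print(intersect(a, b).volume())  # 0.25
print(conditional_prob(b, a))    # 0.25, i.e. P(b | a)
print(conditional_prob(a, b))    # about 0.111, i.e. P(a | b)
```

    The two conditional probabilities differ, so one pair of boxes encodes an asymmetric relation. Note that with hard boxes, any two disjoint boxes have zero overlap volume regardless of how far apart they are.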

  7. arXiv:1809.10835

    cs.LG cs.CL stat.ML

    Embedded-State Latent Conditional Random Fields for Sequence Labeling

    Authors: Dung Thai, Sree Harsha Ramesh, Shikhar Murty, Luke Vilnis, Andrew McCallum

    Abstract: Complex textual information extraction tasks are often posed as sequence labeling or shallow parsing, where fields are extracted using local labels made consistent through probabilistic inference in a graphical model with constrained transitions. Recently, it has become common to locally parametrize these models using rich features extracted by recurrent neural networks (such as LSTM), whil…

    Submitted 27 September, 2018; originally announced September 2018.

  8. arXiv:1807.05127

    cs.CL

    Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking

    Authors: Shikhar Murty*, Patrick Verga*, Luke Vilnis, Irena Radovanovic, Andrew McCallum

    Abstract: Extraction from raw text to a knowledge base of entities and fine-grained types is often cast as prediction into a flat set of entity and type labels, neglecting the rich hierarchies over types and entities contained in curated ontologies. Previous attempts to incorporate hierarchical structure have yielded little benefit and are restricted to shallow ontologies. This paper presents new methods us…

    Submitted 13 July, 2018; originally announced July 2018.

    Comments: ACL 2018

  9. arXiv:1805.06627

    stat.ML cs.LG

    Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures

    Authors: Luke Vilnis, Xiang Li, Shikhar Murty, Andrew McCallum

    Abstract: Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE) (Vendrov et al., 2016), are a natural way to model transitive relational data (e.g. entailment graphs). However, OE learns a deterministic knowledge base, limiting expressiveness of queries and the ability to use uncertainty for both prediction and learning (e.g. learning from…

    Submitted 17 May, 2018; originally announced May 2018.

    Comments: ACL 2018 camera-ready version, 14 pages including appendices
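    The deterministic Order Embeddings (OE) baseline mentioned in the abstract can be sketched with the order-violation penalty of Vendrov et al. (2016), which is zero exactly when a pair satisfies the coordinate-wise order. This sketch assumes that convention (the more specific concept must dominate the more general one in every coordinate); the embeddings and the `order_violation` name are hypothetical.

```python
import numpy as np

def order_violation(x: np.ndarray, y: np.ndarray) -> float:
    """Order-violation penalty ||max(0, y - x)||^2 for the claim "x is-a y".

    It is zero exactly when x >= y in every coordinate, so more general
    concepts sit closer to the origin under this convention.
    """
    return float(np.sum(np.maximum(0.0, y - x) ** 2))

# Hypothetical 2-D embeddings
animal = np.array([0.5, 0.25])
dog = np.array([1.0, 0.75])  # dominates `animal` in every coordinate

print(order_violation(dog, animal))  # 0.0: "dog is-a animal" is satisfied
print(order_violation(animal, dog))  # 0.5: the reverse claim is penalized
```

    A pair either satisfies the coordinate-wise relation or it does not, so the embedding encodes a deterministic order; this is the limitation the abstract attributes to OE.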

  10. arXiv:1711.05851

    cs.CL cs.AI

    Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning

    Authors: Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum

    Abstract: Knowledge bases (KB), both automatically and manually constructed, are often incomplete: many valid facts can be inferred from the KB by synthesizing existing information. A popular approach to KB completion is to infer new relations by combinatory reasoning over the information found along other paths connecting a pair of entities. Given the enormous size of KBs and the exponential number of p…

    Submitted 30 December, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

    Comments: ICLR 2018

  11. arXiv:1711.05795

    cs.CL cs.NE

    Finer Grained Entity Typing with TypeNet

    Authors: Shikhar Murty, Patrick Verga, Luke Vilnis, Andrew McCallum

    Abstract: We consider the challenging problem of entity typing over an extremely fine grained set of types, wherein a single mention or entity can have many simultaneous and often hierarchically-structured types. Despite the importance of the problem, there is a relative lack of resources in the form of fine-grained, deep type hierarchies aligned to existing knowledge bases. In response, we introduce TypeNe…

    Submitted 15 November, 2017; originally announced November 2017.

    Comments: Accepted at 6th Workshop on Automated Knowledge Base Construction (AKBC) at NIPS 2017

  12. arXiv:1710.00880

    cs.CL

    Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection

    Authors: Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, Andrew McCallum

    Abstract: Modeling hypernymy, such as poodle is-a dog, is an important generalization aid to many NLP tasks, such as entailment, coreference, relation extraction, and question answering. Supervised learning from labeled hypernym sources, such as WordNet, limits the coverage of these models, which can be addressed by learning hypernyms from unlabeled text. Existing unsupervised methods either do not scale to…

    Submitted 29 May, 2018; v1 submitted 2 October, 2017; originally announced October 2017.

    Comments: NAACL 2018

  13. arXiv:1708.00553

    cs.CL

    Low-Rank Hidden State Embeddings for Viterbi Sequence Labeling

    Authors: Dung Thai, Shikhar Murty, Trapit Bansal, Luke Vilnis, David Belanger, Andrew McCallum

    Abstract: In textual information extraction and other sequence labeling tasks it is now common to use recurrent neural networks (such as LSTM) to form rich embedded representations of long-term input co-occurrence patterns. Representation of output co-occurrence patterns is typically limited to a hand-designed graphical model, such as a linear-chain CRF representing short-term Markov dependencies among succ…

    Submitted 1 August, 2017; originally announced August 2017.

    Comments: 4 pages, ICML 2017 DeepStruct Workshop

  14. arXiv:1708.00549

    cs.CL stat.ML

    Improved Representation Learning for Predicting Commonsense Ontologies

    Authors: Xiang Li, Luke Vilnis, Andrew McCallum

    Abstract: Recent work in learning ontologies (hierarchical and partially-ordered structures) has leveraged the intrinsic geometry of spaces of learned representations to make predictions that automatically obey complex structural constraints. We explore two extensions of one such model, the order-embedding model for hierarchical relation learning, with an aim towards improved performance on text data for co…

    Submitted 1 August, 2017; originally announced August 2017.

    Comments: 4 pages, ICML 2017 DeepStruct Workshop

  15. arXiv:1511.06807

    stat.ML cs.LG

    Adding Gradient Noise Improves Learning for Very Deep Networks

    Authors: Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

    Abstract: Deep feedforward and recurrent networks have achieved impressive results in many perception and language processing applications. This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks. The main motivation for these architectural innovations is that they capture better domain knowledge, and importantly are easier to optimize than…

    Submitted 20 November, 2015; originally announced November 2015.
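    The technique named in the title can be sketched as ordinary SGD in which Gaussian noise is added to each gradient before the update, with a noise variance that is annealed over training. The abstract excerpt above is truncated before the method is described, so the schedule σ_t² = η/(1+t)^γ, the values η = 0.01 and γ = 0.55, the toy quadratic objective, and the helper names are all assumptions made for this sketch.

```python
import numpy as np

def noise_std(step: int, eta: float = 0.01, gamma: float = 0.55) -> float:
    """Std. dev. of the assumed annealed noise: sigma_t^2 = eta / (1 + t)^gamma."""
    return float(np.sqrt(eta / (1.0 + step) ** gamma))

def noisy_sgd(grad_fn, w, lr=0.1, steps=200, seed=0):
    """Plain SGD where N(0, sigma_t^2) noise is added to every gradient component."""
    rng = np.random.default_rng(seed)
    for t in range(steps):
        g = grad_fn(w)
        g = g + rng.normal(0.0, noise_std(t), size=g.shape)
        w = w - lr * g
    return w

# Toy objective f(w) = ||w - 3||^2 / 2, whose gradient is (w - 3)
w_final = noisy_sgd(lambda w: w - 3.0, np.zeros(4))
print(w_final)  # close to [3, 3, 3, 3], perturbed slightly by the decaying noise
```

    Because the noise shrinks as t grows, early updates explore more while later updates settle near a minimum; on this convex toy problem the noise only adds small jitter.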

  16. arXiv:1511.06349

    cs.LG cs.CL

    Generating Sentences from a Continuous Space

    Authors: Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

    Abstract: The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties…

    Submitted 12 May, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: First two authors contributed equally. Work was done when all authors were at Google, Inc.

    Journal ref: SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2016

  17. arXiv:1505.06169

    cs.CL cs.LG

    Learning Dynamic Feature Selection for Fast Sequential Prediction

    Authors: Emma Strubell, Luke Vilnis, Kate Silverstein, Andrew McCallum

    Abstract: We present paired learning and inference algorithms for significantly reducing computation and increasing speed of the vector dot products in the classifiers that are at the heart of many NLP components. This is accomplished by partitioning the features into a sequence of templates which are ordered such that high confidence can often be reached using only a small fraction of all features. Paramet…

    Submitted 22 May, 2015; originally announced May 2015.

    Comments: Appears in The 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China, July 2015

  18. arXiv:1503.01397

    stat.ML cs.CL cs.LG

    Bethe Projections for Non-Local Inference

    Authors: Luke Vilnis, David Belanger, Daniel Sheldon, Andrew McCallum

    Abstract: Many inference problems in structured prediction are naturally solved by augmenting a tractable dependency structure with complex, non-local auxiliary objectives. This includes the mean field family of variational inference algorithms, soft- or hard-constrained inference using Lagrangian relaxation or linear programming, collective graphical models, and forms of semi-supervised learning such as po…

    Submitted 28 November, 2016; v1 submitted 4 March, 2015; originally announced March 2015.

    Comments: Minor bug fix to appendix. Appeared in UAI 2015

  19. arXiv:1412.6623

    cs.CL cs.LG

    Word Representations via Gaussian Embedding

    Authors: Luke Vilnis, Andrew McCallum

    Abstract: Current work in lexical distributed representations maps each word to a point vector in low-dimensional space. Mapping instead to a density provides many interesting advantages, including better capturing uncertainty about a representation and its relationships, expressing asymmetries more naturally than dot product or cosine similarity, and enabling more expressive parameterization of decision bo…

    Submitted 1 May, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: 12 pages, published as conference paper at ICLR 2015
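    The abstract's claim that densities express asymmetries more naturally than dot product or cosine similarity can be checked with the KL divergence between two Gaussians, which generally changes when its arguments are swapped, while cosine similarity never does. A minimal sketch, assuming diagonal covariances and hypothetical 2-D embeddings for a specific word lying inside a general one; `kl_diag_gaussians` is an illustrative name, not code from the paper.

```python
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1) -> float:
    """KL(N(mu0, diag(var0)) || N(mu1, diag(var1))) for diagonal Gaussians."""
    mu0, var0, mu1, var1 = map(np.asarray, (mu0, var0, mu1, var1))
    return float(0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0 + np.log(var1 / var0)))

# Hypothetical embeddings: a narrow "specific" density inside a broad "general" one
poodle = (np.zeros(2), np.full(2, 0.25))
dog = (np.zeros(2), np.ones(2))

print(round(kl_diag_gaussians(*poodle, *dog), 4))  # 0.6363
print(round(kl_diag_gaussians(*dog, *poodle), 4))  # 1.6137
```

    The divergence is smaller in the specific-to-general direction because the narrow density fits inside the broad one. Which direction, if any, the paper uses as its energy function is not shown in the excerpt above.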

  20. arXiv:1410.8498

    cs.CL cs.AI

    Training for Fast Sequential Prediction Using Dynamic Feature Selection

    Authors: Emma Strubell, Luke Vilnis, Andrew McCallum

    Abstract: We present paired learning and inference algorithms for significantly reducing computation and increasing speed of the vector dot products in the classifiers that are at the heart of many NLP components. This is accomplished by partitioning the features into a sequence of templates which are ordered such that high confidence can often be reached using only a small fraction of all features. Paramet…

    Submitted 19 December, 2014; v1 submitted 30 October, 2014; originally announced October 2014.

    Comments: 5 pages, NIPS Modern ML + NLP Workshop 2014
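    The two dynamic feature selection entries (arXiv:1505.06169 and arXiv:1410.8498) share the same core idea: compute the classifier's dot product one feature template at a time, in a fixed order, and stop once the prediction is confident. A minimal sketch of that early-exit loop follows, assuming a multiclass linear classifier (at least two classes) whose weights are split into per-template blocks and a simple score-margin threshold as the confidence test; the threshold, the toy weights, and the `predict_early` name are hypothetical, and the papers' training procedure is not reproduced.

```python
import numpy as np

def predict_early(templates, weights, margin=2.0):
    """Accumulate class scores one feature template at a time.

    templates: list of feature vectors, one per template, in the fixed order
    weights:   list of (num_classes, dim_t) weight blocks aligned with templates
    Returns (predicted class, number of templates actually used).
    """
    scores = np.zeros(weights[0].shape[0])
    for used, (x, w) in enumerate(zip(templates, weights), start=1):
        scores += w @ x  # partial dot product for this template only
        runner_up, best = np.sort(scores)[-2:]
        if best - runner_up >= margin:  # confident enough: skip remaining templates
            return int(scores.argmax()), used
    return int(scores.argmax()), len(templates)

# Toy example: 2 classes, a 2-dim first template and a 1-dim second template
templates = [np.array([1.0, 0.0]), np.array([1.0])]
weights = [np.array([[3.0, 0.0], [0.0, 1.0]]), np.array([[0.0], [1.0]])]

print(predict_early(templates, weights))              # (0, 1): exits after one template
print(predict_early(templates, weights, margin=5.0))  # (0, 2): needs every template
```

    With a looser margin more examples exit early and fewer templates are scored; ordering the templates so that the most informative ones come first is what makes early exits common.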