Showing 1–32 of 32 results for author: Santos, C N d

  1. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2311.10768

    cs.CL

    Memory Augmented Language Models through Mixture of Word Experts

    Authors: Cicero Nogueira dos Santos, James Lee-Thorp, Isaac Noble, Chung-Ching Chang, David Uthus

    Abstract: Scaling up the number of parameters of language models has proven to be an effective approach to improve performance. For dense models, increasing model size proportionally increases the model's computation footprint. In this work, we seek to aggressively decouple learning capacity and FLOPs through Mixture-of-Experts (MoE) style models with large knowledge-rich vocabulary based routing functions…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 14 pages

  3. arXiv:2306.04009

    cs.CL cs.AI

    Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks

    Authors: Kanishka Misra, Cicero Nogueira dos Santos, Siamak Shakeri

    Abstract: Despite readily memorizing world knowledge about entities, pre-trained language models (LMs) struggle to compose together two or more facts to perform multi-hop reasoning in question-answering tasks. In this work, we propose techniques that improve upon this limitation by relying on random walks over structured knowledge graphs. Specifically, we use soft prompts to guide LMs to chain together thei…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Findings of ACL 2023

  4. arXiv:2210.04726

    cs.CL cs.AI cs.LG

    Knowledge Prompts: Injecting World Knowledge into Language Models through Soft Prompts

    Authors: Cicero Nogueira dos Santos, Zhe Dong, Daniel Cer, John Nham, Siamak Shakeri, Jianmo Ni, Yun-hsuan Sung

    Abstract: Soft prompts have been recently proposed as a tool for adapting large frozen language models (LMs) to new tasks. In this work, we repurpose soft prompts to the task of injecting world knowledge into LMs. We introduce a method to train soft prompts via self-supervised learning on data from knowledge bases. The resulting soft knowledge prompts (KPs) are task independent and work as an external memor…

    Submitted 10 October, 2022; originally announced October 2022.

  5. arXiv:2205.12416

    cs.CL

    Counterfactual Data Augmentation improves Factuality of Abstractive Summarization

    Authors: Dheeraj Rajagopal, Siamak Shakeri, Cicero Nogueira dos Santos, Eduard Hovy, Chung-Ching Chang

    Abstract: Abstractive summarization systems based on pretrained language models often generate coherent but factually inconsistent sentences. In this paper, we present a counterfactual data augmentation approach where we augment data with perturbed summaries that increase the training data diversity. Specifically, we present three augmentation approaches based on replacing (i) entities from other and the sa…

    Submitted 24 May, 2022; originally announced May 2022.

  6. arXiv:2204.11458

    cs.CL cs.IR

    ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

    Authors: Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

    Abstract: State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms, however, are not without flaws, i.e., running the model on all query-document pairs at inference-time incurs a significant computational cost. This paper propo…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: Findings of ACL 2022

  7. arXiv:2105.12932

    cs.IR cs.CL

    Contrastive Fine-tuning Improves Robustness for Neural Rankers

    Authors: Xiaofei Ma, Cicero Nogueira dos Santos, Andrew O. Arnold

    Abstract: The performance of state-of-the-art neural rankers can deteriorate substantially when exposed to noisy inputs or applied to a new domain. In this paper, we present a novel method for fine-tuning neural rankers that can significantly improve their robustness to out-of-domain data and query perturbations. Specifically, a contrastive loss that compares data points in the representation space is combi…

    Submitted 27 May, 2021; originally announced May 2021.

    Journal ref: Findings of ACL 2021

  8. arXiv:2105.05052

    cs.CL cs.LG

    Joint Text and Label Generation for Spoken Language Understanding

    Authors: Yang Li, Ben Athiwaratkun, Cicero Nogueira dos Santos, Bing Xiang

    Abstract: Generalization is a central problem in machine learning, especially when data is limited. Using prior information to enforce constraints is the principled way of encouraging generalization. In this work, we propose to leverage the prior information embedded in pretrained language models (LM) to improve generalization for intent classification and slot labeling tasks with limited training data. Spe…

    Submitted 11 May, 2021; originally announced May 2021.

  9. arXiv:2105.04623

    cs.CL cs.AI

    Improving Factual Consistency of Abstractive Summarization via Question Answering

    Authors: Feng Nan, Cicero Nogueira dos Santos, Henghui Zhu, Patrick Ng, Kathleen McKeown, Ramesh Nallapati, Dejiao Zhang, Zhiguo Wang, Andrew O. Arnold, Bing Xiang

    Abstract: A commonly observed problem with the state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper we present an approach to address factual consistency in summari…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: ACL-IJCNLP 2021

  10. arXiv:2104.08744

    cs.CL

    Generative Context Pair Selection for Multi-hop Question Answering

    Authors: Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh

    Abstract: Compositional reasoning tasks, like multi-hop question answering, require making latent decisions to get the final answer, given a question. However, crowdsourced datasets often capture only a slice of the underlying task distribution, which can induce unanticipated biases in models performing compositional reasoning. Furthermore, discriminatively trained models exploit such biases to get a better…

    Submitted 18 April, 2021; originally announced April 2021.

  11. arXiv:2102.09130

    cs.CL cs.AI

    Entity-level Factual Consistency of Abstractive Text Summarization

    Authors: Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, Bing Xiang

    Abstract: A key challenge for abstractive summarization is ensuring factual consistency of the generated summary with respect to the original document. For example, state-of-the-art models trained on existing datasets exhibit entity hallucination, generating names of entities that are not present in the source document. We propose a set of new metrics to quantify the entity-level factual consistency of gene…

    Submitted 17 February, 2021; originally announced February 2021.

    Comments: EACL 2021

  12. arXiv:2101.05779

    cs.LG cs.CL

    Structured Prediction as Translation between Augmented Natural Languages

    Authors: Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto

    Abstract: We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discri…

    Submitted 2 December, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

    Journal ref: International Conference on Learning Representations (ICLR) 2021

  13. arXiv:2012.10309

    cs.CL

    Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

    Authors: Peng Shi, Patrick Ng, Zhiguo Wang, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Cicero Nogueira dos Santos, Bing Xiang

    Abstract: Most recently, there has been significant interest in learning contextual representations for various NLP tasks, by leveraging large scale text corpora to train large neural language models with self-supervised learning objectives, such as Masked Language Model (MLM). However, based on a pilot study, we observe three issues of existing general-purpose language models when they are applied to text-…

    Submitted 18 December, 2020; originally announced December 2020.

    Comments: Accepted to AAAI 2021

  14. arXiv:2011.13137

    cs.CL cs.AI cs.LG

    Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction

    Authors: Yifan Gao, Henghui Zhu, Patrick Ng, Cicero Nogueira dos Santos, Zhiguo Wang, Feng Nan, Dejiao Zhang, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang

    Abstract: In open-domain question answering, questions are highly likely to be ambiguous because users may not know the scope of relevant topics when formulating them. Therefore, a system needs to find possible interpretations of the question, and predict one or multiple plausible answers. When multiple plausible answers are found, the system should rewrite the question for each answer to resolve the ambigu…

    Submitted 30 May, 2021; v1 submitted 26 November, 2020; originally announced November 2020.

    Comments: ACL 2021 main conference, 14 pages, 7 figures. Code will be released at https://github.com/amzn/refuel-open-domain-qa

  15. arXiv:2010.14660

    cs.CL cs.LG

    DualTKB: A Dual Learning Bridge between Text and Knowledge Base

    Authors: Pierre L. Dognin, Igor Melnyk, Inkit Padhi, Cicero Nogueira dos Santos, Payel Das

    Abstract: In this work, we present a dual learning approach for unsupervised text to path and path to text transfers in Commonsense Knowledge Bases (KBs). We investigate the impact of weak supervision by creating a weakly supervised dataset and show that even a slight amount of supervision can significantly improve the model performance and enable better-quality transfers. We examine different model archite…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Equal Contributions of Authors Pierre L. Dognin, Igor Melnyk, and Inkit Padhi. Accepted at EMNLP'20

  16. arXiv:2010.06028

    cs.CL

    End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems

    Authors: Siamak Shakeri, Cicero Nogueira dos Santos, Henry Zhu, Patrick Ng, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

    Abstract: We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  17. arXiv:2010.03073

    cs.CL cs.IR

    Beyond [CLS] through Ranking by Generation

    Authors: Cicero Nogueira dos Santos, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, Bing Xiang

    Abstract: Generative models for Information Retrieval, where ranking of documents is viewed as the task of generating a query from a document's language model, were very successful in various IR tasks in the past. However, with the advent of modern deep neural networks, attention has shifted to discriminative ranking functions that model the semantic similarity of documents and queries instead. Recently, de…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  18. arXiv:2009.13272

    cs.CL cs.LG stat.ML

    Augmented Natural Language for Generative Sequence Labeling

    Authors: Ben Athiwaratkun, Cicero Nogueira dos Santos, Jason Krone, Bing Xiang

    Abstract: We propose a generative framework for joint sequence labeling and sentence-level classification. Our model performs multiple sequence labeling tasks at once using a single, shared natural language output space. Unlike prior discriminative methods, our model naturally incorporates label semantics and shares knowledge across tasks. Our framework is general purpose, performing well on few-shot, low-r…

    Submitted 15 September, 2020; originally announced September 2020.

    Comments: To appear at EMNLP 2020

  19. arXiv:2009.10270

    cs.IR

    Embedding-based Zero-shot Retrieval through Query Generation

    Authors: Davis Liang, Peng Xu, Siamak Shakeri, Cicero Nogueira dos Santos, Ramesh Nallapati, Zhiheng Huang, Bing Xiang

    Abstract: Passage retrieval addresses the problem of locating relevant passages, usually from a large corpus, given a query. In practice, lexical term-matching algorithms like BM25 are popular choices for retrieval owing to their efficiency. However, term-based matching algorithms often miss relevant passages that have no lexical overlap with the query and cannot be finetuned to downstream datasets. In this…

    Submitted 21 September, 2020; originally announced September 2020.

  20. arXiv:2005.03588

    cs.CL cs.LG

    Learning Implicit Text Generation via Feature Matching

    Authors: Inkit Padhi, Pierre Dognin, Ke Bai, Cicero Nogueira dos Santos, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das

    Abstract: Generative feature matching network (GFMN) is an approach for training implicit generative models for images by performing moment matching on features from pre-trained neural networks. In this paper, we present new GFMN formulations that are effective for sequential data. Our experimental results show the effectiveness of the proposed method, SeqGFMN, for three distinct generation tasks in English…

    Submitted 8 May, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  21. arXiv:2002.02369

    eess.IV cs.AI cs.CV

    Covering the News with (AI) Style

    Authors: Michele Merler, Cicero Nogueira dos Santos, Mauro Martino, Alfio M. Gliozzo, John R. Smith

    Abstract: We introduce a multi-modal discriminative and generative framework capable of assisting humans in producing visual content related to a given theme, starting from a collection of documents (textual, visual, or both). This framework can be used by editors to generate images for articles, as well as books or music album covers. Motivated by a request from The New York Times (NYT) seeking help t…

    Submitted 5 January, 2020; originally announced February 2020.

  22. arXiv:1904.02762

    cs.CV cs.LG

    Learning Implicit Generative Models by Matching Perceptual Features

    Authors: Cicero Nogueira dos Santos, Youssef Mroueh, Inkit Padhi, Pierre Dognin

    Abstract: Perceptual features (PFs) have been used with great success in tasks such as transfer learning, style transfer, and super-resolution. However, the efficacy of PFs as a key source of information for learning generative models is not well studied. We investigate here the use of PFs in the context of learning implicit generative models through moment matching (MM). More specifically, we propose a new e…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: 16 pages

    Journal ref: ICCV 2019

  23. arXiv:1805.07685

    cs.CL cs.LG

    Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer

    Authors: Cicero Nogueira dos Santos, Igor Melnyk, Inkit Padhi

    Abstract: We introduce a new approach to tackle the problem of offensive language in online social media. Our approach uses unsupervised text style transfer to translate offensive sentences into non-offensive ones. We propose a new method for training encoder-decoders using non-parallel data that combines a collaborative classifier, attention and the cycle consistency loss. Experimental results on data from…

    Submitted 19 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  24. arXiv:1805.04893

    cs.CL

    Neural Coreference Resolution with Deep Biaffine Attention by Joint Mention Detection and Mention Clustering

    Authors: Rui Zhang, Cicero Nogueira dos Santos, Michihiro Yasunaga, Bing Xiang, Dragomir Radev

    Abstract: Coreference resolution aims to identify in a text all mentions that refer to the same real-world entity. The state-of-the-art end-to-end neural coreference model considers all text spans in a document as potential mentions and learns to link an antecedent for each possible mention. In this paper, we propose to improve the end-to-end coreference resolution system by (1) using a biaffine attention m…

    Submitted 13 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  25. arXiv:1711.09395

    cs.CL cs.AI cs.LG

    Improved Neural Text Attribute Transfer with Non-parallel Data

    Authors: Igor Melnyk, Cicero Nogueira dos Santos, Kahini Wadhawan, Inkit Padhi, Abhishek Kumar

    Abstract: Text attribute transfer using non-parallel data requires methods that can perform disentanglement of content and linguistic attributes. In this work, we propose multiple improvements over the existing approaches that enable the encoder-decoder framework to cope with the text attribute transfer from non-parallel data. We perform experiments on the sentiment transfer task using two datasets. For bot…

    Submitted 4 December, 2017; v1 submitted 26 November, 2017; originally announced November 2017.

    Comments: NIPS 2017 Workshop on Learning Disentangled Representations: from Perception to Control

  26. arXiv:1708.04326

    cs.IR

    Improved Answer Selection with Pre-Trained Word Embeddings

    Authors: Rishav Chakravarti, Jiri Navratil, Cicero Nogueira dos Santos

    Abstract: This paper evaluates existing and newly proposed answer selection methods based on pre-trained word embeddings. Word embeddings are highly effective in various natural language processing tasks and their integration into traditional information retrieval (IR) systems allows for the capture of semantic relatedness between questions and answers. Empirical results on three publicly available data set…

    Submitted 14 August, 2017; originally announced August 2017.

  27. arXiv:1707.02198

    cs.LG

    Learning Loss Functions for Semi-supervised Learning via Discriminative Adversarial Networks

    Authors: Cicero Nogueira dos Santos, Kahini Wadhawan, Bowen Zhou

    Abstract: We propose discriminative adversarial networks (DAN) for semi-supervised learning and loss function learning. Our DAN approach builds upon generative adversarial networks (GANs) and conditional GANs but includes the key differentiator of using two discriminators instead of a generator and a discriminator. DAN can be seen as a framework to learn loss functions for predictors that also implements se…

    Submitted 7 July, 2017; originally announced July 2017.

    Comments: 11 pages

  28. arXiv:1703.03130

    cs.CL cs.AI cs.LG cs.NE

    A Structured Self-attentive Sentence Embedding

    Authors: Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio

    Abstract: This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with a…

    Submitted 8 March, 2017; originally announced March 2017.

    Comments: 15 pages with appendix, 7 figures, 4 tables. Conference paper in 5th International Conference on Learning Representations (ICLR 2017)

  29. arXiv:1602.06023

    cs.CL

    Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

    Authors: Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang

    Abstract: In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-…

    Submitted 26 August, 2016; v1 submitted 18 February, 2016; originally announced February 2016.

    Journal ref: The SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2016

  30. Dirac equation and the Melvin Metric

    Authors: Luis C. Nunes dos Santos, Celso C. Barros Jr

    Abstract: A relativistic wave equation for spin 1/2 particles in the Melvin space-time, a space-time where the metric is determined by a magnetic field, is obtained. The effects of very intense magnetic fields in the energy levels, as intense as the ones expected to be produced in ultra-relativistic heavy-ion collisions, are investigated.

    Submitted 21 January, 2017; v1 submitted 28 August, 2015; originally announced August 2015.

    Comments: 12 pages, 3 figures

  31. arXiv:1505.05008

    cs.CL

    Boosting Named Entity Recognition with Neural Character Embeddings

    Authors: Cicero Nogueira dos Santos, Victor Guimarães

    Abstract: Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level representati…

    Submitted 25 May, 2015; v1 submitted 19 May, 2015; originally announced May 2015.

    Comments: 9 pages

  32. arXiv:1504.06580

    cs.CL cs.LG cs.NE

    Classifying Relations by Ranking with Convolutional Neural Networks

    Authors: Cicero Nogueira dos Santos, Bing Xiang, Bowen Zhou

    Abstract: Relation classification is an important semantic processing task for which state-of-the-art systems still rely on costly handcrafted features. In this work we tackle the relation classification task using a convolutional neural network that performs classification by ranking (CR-CNN). We propose a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes. We p…

    Submitted 24 May, 2015; v1 submitted 24 April, 2015; originally announced April 2015.

    Comments: Accepted as a long paper in the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015)