Showing 1–50 of 115 results for author: Socher, R

  1. arXiv:2402.06196  [pdf, other]

    cs.CL cs.AI

    Large Language Models: A Survey

    Authors: Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao

    Abstract: Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffman…

    Submitted 20 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2401.14423

  2. arXiv:2203.12187  [pdf, other]

    cs.CL cs.AI

    Converse: A Tree-Based Modular Task-Oriented Dialogue System

    Authors: Tian Xie, Xinyi Yang, Angela S. Lin, Feihong Wu, Kazuma Hashimoto, ** Qu, Young Mo Kang, Wenpeng Yin, Huan Wang, Semih Yavuz, Gang Wu, Michael Jones, Richard Socher, Yingbo Zhou, Wenhao Liu, Caiming Xiong

    Abstract: Creating a system that can have meaningful conversations with humans to help accomplish tasks is one of the ultimate goals of Artificial Intelligence (AI). It has defined the meaning of AI since the beginning. A lot has been accomplished in this area recently, with voice assistant products entering our daily lives and chat bot systems becoming commonplace in customer service. At first glance there…

    Submitted 9 May, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

  3. arXiv:2108.02755  [pdf, other]

    cs.LG econ.GN

    The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning

    Authors: Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, Richard Socher

    Abstract: AI and reinforcement learning (RL) have improved many areas, but are not yet widely adopted in economic policy design, mechanism design, or economics at large. At the same time, current economic methodology is limited by a lack of counterfactual data, simplistic behavioral models, and limited opportunities to experiment with policies and evaluate behavioral responses. Here we show that machine-lea…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: Substantial Extension of arXiv:2004.13332. SZ and AT contributed equally

  4. arXiv:2106.03357  [pdf, other]

    stat.ML cs.LG

    Evaluating State-of-the-Art Classification Models Against Bayes Optimality

    Authors: Ryan Theisen, Huan Wang, Lav R. Varshney, Caiming Xiong, Richard Socher

    Abstract: Evaluating the inherent difficulty of a given data-driven classification problem is important for establishing absolute benchmarks and evaluating progress in the field. To this end, a natural quantity to consider is the \emph{Bayes error}, which measures the optimal classification error theoretically achievable for a given data distribution. While generally an intractable quantity, we show that we…

    Submitted 7 June, 2021; originally announced June 2021.

  5. arXiv:2012.14193  [pdf, other]

    cs.LG stat.ML

    Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization

    Authors: Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras

    Abstract: The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a tendency to steer towards regions of the loss surface with increasing local curvature. We ask whether this tendency is connected to the widely observed phenomen…

    Submitted 11 June, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: The last two authors contributed equally. Accepted to the International Conference on Machine Learning 2021

  6. arXiv:2012.12627  [pdf, other]

    cs.CL cs.AI cs.DB cs.LG

    Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing

    Authors: Xi Victoria Lin, Richard Socher, Caiming Xiong

    Abstract: We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing. BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. The hybrid sequence is encoded by BERT with minimal subsequent layers and the…

    Submitted 30 December, 2020; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: EMNLP Findings 2020 long paper extended; 23 pages

  7. arXiv:2010.13009  [pdf, other]

    cs.CL cs.AI

    Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference

    Authors: Jian-Guo Zhang, Kazuma Hashimoto, Wenhao Liu, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, Caiming Xiong

    Abstract: Intent detection is one of the core components of goal-oriented dialog systems, and detecting out-of-scope (OOS) intents is also a practically important skill. Few-shot learning is attracting much attention to mitigate data scarcity, but OOS detection becomes even more challenging. In this paper, we present a simple yet effective approach, discriminative nearest neighbor classification with deep s…

    Submitted 24 October, 2020; originally announced October 2020.

    Comments: 19 pages, accepted by EMNLP 2020 main conference as a long paper. Code will be available at https://github.com/salesforce/DNNC-few-shot-intent

  8. arXiv:2010.11545  [pdf, other]

    cs.LG

    Online Structured Meta-learning

    Authors: Huaxiu Yao, Yingbo Zhou, Mehrdad Mahdavi, Zhenhui Li, Richard Socher, Caiming Xiong

    Abstract: Learning quickly is of great importance for machine intelligence deployed in online platforms. With the capability of transferring knowledge from learned tasks, meta-learning has shown its effectiveness in online scenarios by continuously updating the model with the learned prior. However, current online meta-learning algorithms are limited to learn a globally-shared meta-learner, which may lead t…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted by NeurIPS 2020

  9. arXiv:2010.09030  [pdf, other]

    cs.CL cs.LG

    Explaining and Improving Model Behavior with k Nearest Neighbor Representations

    Authors: Nazneen Fatema Rajani, Ben Krause, Wenpeng Yin, Tong Niu, Richard Socher, Caiming Xiong

    Abstract: Interpretability techniques in NLP have mainly focused on understanding individual predictions using attention visualization or gradient-based saliency maps over tokens. We propose using k nearest neighbor (kNN) representations to identify training examples responsible for a model's predictions and obtain a corpus-level understanding of the model's behavior. Apart from interpretability, we show th…

    Submitted 18 October, 2020; originally announced October 2020.

  10. arXiv:2010.07126  [pdf]

    cs.AI

    Explaining Creative Artifacts

    Authors: Lav R. Varshney, Nazneen Fatema Rajani, Richard Socher

    Abstract: Human creativity is often described as the mental process of combining associative elements into a new form, but emerging computational creativity algorithms may not operate in this manner. Here we develop an inverse problem formulation to deconstruct the products of combinatorial and compositional creativity into associative chains as a form of post-hoc interpretation that matches the human creat…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 2020 Workshop on Human Interpretability in Machine Learning (WHI), at ICML 2020

  11. arXiv:2010.02584  [pdf, other]

    cs.CL

    Universal Natural Language Processing with Limited Annotations: Try Few-shot Textual Entailment as a Start

    Authors: Wenpeng Yin, Nazneen Fatema Rajani, Dragomir Radev, Richard Socher, Caiming Xiong

    Abstract: A standard way to address different NLP problems is by first constructing a problem-specific dataset, then building a model to fit this dataset. To build the ultimate artificial intelligence, we desire a single machine that can handle diverse new problems, for which task-specific annotations are limited. We bring up textual entailment as a unified solver for such NLP problems. However, current res…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: EMNLP2020 Long, camera-ready

  12. arXiv:2009.13845  [pdf, other]

    cs.CL cs.AI

    GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

    Authors: Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong

    Abstract: We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel te…

    Submitted 28 May, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: 16 pages; Accepted to ICLR 2021

  13. arXiv:2009.10056  [pdf, other]

    cs.CL

    Composed Variational Natural Language Generation for Few-shot Intents

    Authors: Congying Xia, Caiming Xiong, Philip Yu, Richard Socher

    Abstract: In this paper, we focus on generating training examples for few-shot intents in the realistic imbalanced scenario. To build connections between existing many-shot intents and few-shot intents, we consider an intent as a combination of a domain and an action, and propose a composed variational natural language generator (CLANG), a transformer-based conditional variational autoencoder. CLANG utilize…

    Submitted 21 September, 2020; originally announced September 2020.

    Comments: 10 pages, accepted to Findings of EMNLP 2020

  14. arXiv:2009.06367  [pdf, other]

    cs.CL cs.LG

    GeDi: Generative Discriminator Guided Sequence Generation

    Authors: Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, Nazneen Fatema Rajani

    Abstract: While large-scale language models (LMs) are able to imitate the distribution of natural language well enough to generate realistic text, it is difficult to control which regions of the distribution they generate. This is especially problematic because datasets used for training large LMs usually contain significant toxicity, hate, bias, and negativity. We propose GeDi as an efficient method for us…

    Submitted 22 October, 2020; v1 submitted 14 September, 2020; originally announced September 2020.

  15. arXiv:2009.04087  [pdf, other]

    cs.CL

    Central Yup'ik and Machine Translation of Low-Resource Polysynthetic Languages

    Authors: Christopher Liu, Laura Dominé, Kevin Chavez, Richard Socher

    Abstract: Machine translation tools do not yet exist for the Yup'ik language, a polysynthetic language spoken by around 8,000 people who live primarily in Southwest Alaska. We compiled a parallel text corpus for Yup'ik and English and developed a morphological parser for Yup'ik based on grammar rules. We trained a seq2seq neural machine translation model with attention to translate Yup'ik input into English…

    Submitted 8 September, 2020; originally announced September 2020.

  16. arXiv:2007.15280  [pdf, other]

    cs.CL cs.AI cs.DB

    Photon: A Robust Cross-Domain Text-to-SQL System

    Authors: Jichuan Zeng, Xi Victoria Lin, Caiming Xiong, Richard Socher, Michael R. Lyu, Irwin King, Steven C. H. Hoi

    Abstract: Natural language interfaces to databases (NLIDB) democratize end user access to relational data. Due to fundamental differences between natural language communication and programming, it is common for end users to issue questions that are ambiguous to the system or fall outside the semantic scope of its underlying query language. We present Photon, a robust, modular, cross-domain NLIDB that can fl…

    Submitted 3 August, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: ACL 2020 system demonstration paper extended. The first two authors contributed equally to this work

  17. arXiv:2007.12626  [pdf, other]

    cs.CL

    SummEval: Re-evaluating Summarization Evaluation

    Authors: Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, Dragomir Radev

    Abstract: The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization mode…

    Submitted 1 February, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: 11 pages, 4 tables, 2 figures; pre-MIT Press publication version

  18. arXiv:2007.02871  [pdf, other]

    cs.CL

    DART: Open-Domain Structured Data Record to Text Generation

    Authors: Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani

    Abstract: We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploi…

    Submitted 12 April, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: NAACL 2021

  19. arXiv:2006.16537  [pdf, other]

    cs.LG cs.CV math.OC stat.ML

    Theory-Inspired Path-Regularized Differential Network Architecture Search

    Authors: Pan Zhou, Caiming Xiong, Richard Socher, Steven C. H. Hoi

    Abstract: Despite its high search efficiency, differential architecture search (DARTS) often selects network architectures with dominated skip connections which lead to performance degradation. However, theoretical understandings on this issue remain absent, hindering the development of more advanced methods in a principled way. In this work, we solve this problem by theoretically analyzing the effects of v…

    Submitted 12 October, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 (oral)

  20. arXiv:2006.15222  [pdf, other]

    cs.CL cs.LG q-bio.BM

    BERTology Meets Biology: Interpreting Attention in Protein Language Models

    Authors: Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani

    Abstract: Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. In this work, we demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. We show that attention: (1) captures the folding structure of proteins, connecting amino aci…

    Submitted 28 March, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: To appear in ICLR 2021

    ACM Class: I.2

  21. arXiv:2006.13436  [pdf, ps, other]

    cs.LG stat.ML

    Towards Understanding Hierarchical Learning: Benefits of Neural Representations

    Authors: Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher

    Abstract: Deep neural networks can empirically perform efficient hierarchical learning, in which the layers learn useful representations of the data. However, how they make use of the intermediate representations are not explained by recent theories that relate them to "shallow learners" such as kernels. In this work, we demonstrate that intermediate neural representations add more flexibility to neural net…

    Submitted 5 March, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: 41 pages, published in NeurIPS 2020

  22. arXiv:2006.13425  [pdf, other]

    cs.CL

    A High-Quality Multilingual Dataset for Structured Documentation Translation

    Authors: Kazuma Hashimoto, Raffaella Buschiazzo, James Bradbury, Teresa Marshall, Richard Socher, Caiming Xiong

    Abstract: This paper presents a high-quality multilingual dataset for the documentation domain to advance research on localization of structured text. Unlike widely-used datasets for translation of plain text, we collect XML-structured parallel text segments from the online documentation for an enterprise software platform. These Web pages have been professionally translated from English into 16 languages a…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: Published at WMT2019; the draft has been updated with our dataset's URL: https://github.com/salesforce/localization-xml-mt

  23. arXiv:2006.09595  [pdf, other]

    cs.IR cs.AI cs.CL

    CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization

    Authors: Andre Esteva, Anuprit Kale, Romain Paulus, Kazuma Hashimoto, Wenpeng Yin, Dragomir Radev, Richard Socher

    Abstract: The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. As of May 2020, 128,000 coronavirus-related publications have been collected through the COVID-19 Open Research Dataset Challenge. Here we present CO-Search, a retriever-ranker…

    Submitted 16 June, 2020; originally announced June 2020.

  24. arXiv:2006.03732  [pdf, other]

    cs.CV

    WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos

    Authors: Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong

    Abstract: Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications. Previous methods rely on tedious annotations of temporal action boundaries for training, which hinders the scalability of online action detection systems. We propose WOAD, a weakly supervised framework that can be trained using only video-class labels. WOA…

    Submitted 18 May, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: CVPR2021

  25. arXiv:2005.12484  [pdf, other]

    cs.CL

    Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading

    Authors: Yifan Gao, Chien-Sheng Wu, Shafiq Joty, Caiming Xiong, Richard Socher, Irwin King, Michael R. Lyu, Steven C. H. Hoi

    Abstract: The goal of conversational machine reading is to answer user questions given a knowledge base text which may require asking clarification questions. Existing approaches are limited in their decision making due to struggles in extracting question-related rules and reasoning about them. In this paper, we present a new framework of conversational machine reading that comprises a novel Explicit Memory…

    Submitted 23 June, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

    Comments: ACL 2020, 11 pages, 3 figures

  26. arXiv:2005.04364  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG cs.NE

    It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations

    Authors: Samson Tan, Shafiq Joty, Min-Yen Kan, Richard Socher

    Abstract: Training on only perfect Standard English corpora predisposes pre-trained neural networks to discriminate against minorities from non-standard linguistic backgrounds (e.g., African American Vernacular English, Colloquial Singapore English, etc.). We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples that expose these biases in popular NLP…

    Submitted 9 May, 2020; originally announced May 2020.

    Comments: To appear in the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

  27. arXiv:2005.00796  [pdf, other]

    cs.CL

    A Simple Language Model for Task-Oriented Dialogue

    Authors: Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher

    Abstract: Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response. While such decomposition might suggest a dedicated model for each sub-task, we find a simple, unified approach leads to state-of-the-art performance on the MultiWOZ dataset. SimpleTOD is a simple approach to task-oriented dialogue that uses a single, causal language m…

    Submitted 12 April, 2022; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: 22 Pages, 2 figures, 16 tables

  28. arXiv:2005.00730  [pdf, other]

    cs.CL cs.LG

    ESPRIT: Explaining Solutions to Physical Reasoning Tasks

    Authors: Nazneen Fatema Rajani, Rui Zhang, Yi Chern Tan, Stephan Zheng, Jeremy Weiss, Aadit Vyas, Abhijit Gupta, Caiming Xiong, Richard Socher, Dragomir Radev

    Abstract: Neural networks lack the ability to reason about qualitative physics and so cannot generalize to scenarios and tasks unseen during training. We propose ESPRIT, a framework for commonsense reasoning about qualitative physics in natural language that generates interpretable descriptions of physical events. We use a two-step approach of first identifying the pivotal physical events in an environment…

    Submitted 13 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  29. arXiv:2004.13332  [pdf, other]

    econ.GN cs.LG stat.ML

    The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies

    Authors: Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C. Parkes, Richard Socher

    Abstract: Tackling real-world socio-economic challenges requires designing and testing economic policies. However, this is hard in practice, due to a lack of appropriate (micro-level) economic data and limited opportunity to experiment. In this work, we train social planners that discover tax policies in dynamic economies that can effectively trade-off economic equality and productivity. We propose a two-le…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: 46 pages, 21 figures

  30. arXiv:2004.06871  [pdf, other]

    cs.CL

    TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue

    Authors: Chien-Sheng Wu, Steven Hoi, Richard Socher, Caiming Xiong

    Abstract: The underlying difference of linguistic patterns between general text and task-oriented dialogue makes existing pre-trained language models less useful in practice. In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modelin…

    Submitted 1 October, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

    Comments: EMNLP 2020 camera-ready

  31. arXiv:2004.04290  [pdf, other]

    eess.AS

    An investigation of phone-based subword units for end-to-end speech recognition

    Authors: Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher

    Abstract: Phones and their context-dependent variants have been the standard modeling units for conventional speech recognition systems, while characters and subwords have demonstrated their effectiveness for end-to-end recognition systems. We investigate the use of phone-based subwords, in particular, byte pair encoder (BPE), as modeling units for end-to-end speech recognition. In addition, we also develop…

    Submitted 21 June, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: Interspeech 2020 final version. Implementation for reproducing the results can be found at: https://github.com/salesforce/transformerasr

  32. arXiv:2004.03497  [pdf, other]

    q-bio.BM cs.LG stat.ML

    ProGen: Language Modeling for Protein Generation

    Authors: Ali Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R. Eguchi, Po-Ssu Huang, Richard Socher

    Abstract: Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science. We pose protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lack costly, structural annotations. We train a 1.2B-parameter language model, ProGen, on ~280M protein sequences condit…

    Submitted 7 March, 2020; originally announced April 2020.

  33. arXiv:2003.13525  [pdf, other]

    cs.CV cs.LG

    Improving out-of-distribution generalization via multi-task self-supervised pretraining

    Authors: Isabela Albuquerque, Nikhil Naik, Junnan Li, Nitish Keskar, Richard Socher

    Abstract: Self-supervised feature representations have been shown to be useful for supervised classification, few-shot learning, and adversarial robustness. We show that features obtained using self-supervised learning are comparable to, or better than, supervised learning for domain generalization in computer vision. We introduce a new self-supervised pretext task of predicting responses to Gabor filter ba…

    Submitted 30 March, 2020; originally announced March 2020.

  34. arXiv:2003.01285  [pdf, other]

    cs.CV

    Towards Noise-resistant Object Detection with Noisy Annotations

    Authors: Junnan Li, Caiming Xiong, Richard Socher, Steven Hoi

    Abstract: Training deep object detectors requires significant amount of human-annotated images with accurate object labels and bounding box coordinates, which are extremely expensive to acquire. Noisy annotations are much more easily accessible, but they could be detrimental for learning. We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixtu…

    Submitted 2 March, 2020; originally announced March 2020.

  35. arXiv:2002.09046  [pdf, other]

    stat.ML cs.LG

    Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning

    Authors: Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

    Abstract: We introduce a parameterization method called Neural Bayes which allows computing statistical quantities that are in general difficult to compute and opens avenues for formulating new objectives for unsupervised representation learning. Specifically, given an observed random variable $\mathbf{x}$ and a latent discrete variable $z$, we can express $p(\mathbf{x}|z)$, $p(z|\mathbf{x})$ and $p(z)$ in…

    Submitted 20 February, 2020; originally announced February 2020.

  36. arXiv:2002.08046  [pdf, other]

    cs.LG cs.CL

    Tree-structured Attention with Hierarchical Accumulation

    Authors: Xuan-Phi Nguyen, Shafiq Joty, Steven C. H. Hoi, Richard Socher

    Abstract: Incorporating hierarchical structures like constituency trees has been shown to be effective for various natural language processing (NLP) tasks. However, it is evident that state-of-the-art (SOTA) sequence-based models like the Transformer struggle to encode such structures inherently. On the other hand, dedicated models like the Tree-LSTM, while explicitly modeling hierarchical structures, do no…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: ICLR 2020

  37. arXiv:2002.08024  [pdf, other]

    cs.CL cs.AI cs.LG

    Non-Autoregressive Dialog State Tracking

    Authors: Hung Le, Richard Socher, Steven C. H. Hoi

    Abstract: Recent efforts in Dialogue State Tracking (DST) for task-oriented dialogues have progressed toward open-vocabulary or generation-based approaches where the models can generate slot value candidates from the dialogue history itself. These approaches have shown good performance gain, especially in complicated dialogue domains with dynamic slot values. However, they fall short in two aspects: (1) the…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: Accepted at ICLR 2020. International Conference on Learning Representations (2020)

  38. arXiv:2002.07394  [pdf, other]

    cs.CV

    DivideMix: Learning with Noisy Labels as Semi-supervised Learning

    Authors: Junnan Li, Richard Socher, Steven C. H. Hoi

    Abstract: Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning…

    Submitted 18 February, 2020; originally announced February 2020.

    Journal ref: International Conference on Learning Representations, 2020

  39. arXiv:2002.04010  [pdf, other]

    cs.LG stat.ML

    Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width

    Authors: Yu Bai, Ben Krause, Huan Wang, Caiming Xiong, Richard Socher

    Abstract: We propose \emph{Taylorized training} as an initiative towards better understanding neural network training at finite width. Taylorized training involves training the $k$-th order Taylor expansion of the neural network at initialization, and is a principled extension of linearized training---a recently proposed theory for understanding the success of deep learning. We experiment with Taylorized…

    Submitted 24 February, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  40. arXiv:2002.03647  [pdf, other]

    cs.LG cs.AI stat.ML

    Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

    Authors: Víctor Campos, Alexander Trott, Caiming Xiong, Richard Socher, Xavier Giro-i-Nieto, Jordi Torres

    Abstract: Acquiring abilities in the absence of a task-oriented reward function is at the frontier of reinforcement learning research. This problem has been studied through the lens of empowerment, which draws a connection between option discovery and information theory. Information-theoretic skill discovery methods have garnered much interest from the community, but little research has been conducted in un…

    Submitted 3 August, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: 17 pages, 11 figures. Code is publicly available at https://github.com/victorcampos7/edl

  41. arXiv:2002.03438  [pdf, ps, other]

    cs.CL cs.CY cs.LG

    Limits of Detecting Text Generated by Large-Scale Language Models

    Authors: Lav R. Varshney, Nitish Shirish Keskar, Richard Socher

    Abstract: Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a s…

    Submitted 9 February, 2020; originally announced February 2020.

    Comments: ITA 2020

  42. arXiv:1912.05086  [pdf, other]

    cs.CV

    Learning from Noisy Anchors for One-stage Object Detection

    Authors: Hengduo Li, Zuxuan Wu, Chen Zhu, Caiming Xiong, Richard Socher, Larry S. Davis

    Abstract: State-of-the-art object detectors rely on regressing and classifying an extensive list of possible anchors, which are divided into positive and negative samples based on their intersection-over-union (IoU) with corresponding groundtruth objects. Such a harsh split conditioned on IoU results in binary labels that are potentially noisy and challenging for training. In this paper, we propose to mitig…

    Submitted 28 May, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: CVPR 2020 camera ready

  43. arXiv:1911.10470  [pdf, other]

    cs.CL

    Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering

    Authors: Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong

    Abstract: Answering questions that require multi-hop reasoning at web-scale necessitates retrieving multiple evidence documents, one of which often has little lexical or semantic relationship to the question. This paper introduces a new graph-based recurrent retrieval approach that learns to retrieve reasoning paths over the Wikipedia graph to answer multi-hop open-domain questions. Our retriever model trai…

    Submitted 14 February, 2020; v1 submitted 24 November, 2019; originally announced November 2019.

    Comments: Published as a conference paper at ICLR 2020. Code is available at https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths

  44. arXiv:1911.03588  [pdf, other]

    cs.CL cs.LG

    MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models

    Authors: Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong

    Abstract: Pretrained language models have led to significant performance gains in many NLP tasks. However, the intensive computing resources to train such models remain an issue. Knowledge distillation alleviates this problem by learning a light-weight student model. So far the distillation approaches are all task-specific. In this paper, we explore knowledge distillation under the multi-task learning setti…

    Submitted 30 April, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

  45. arXiv:1911.03429  [pdf, other]

    cs.CL cs.AI cs.LG

    ERASER: A Benchmark to Evaluate Rationalized NLP Models

    Authors: Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, Byron C. Wallace

    Abstract: State-of-the-art models in NLP are now predominantly based on deep neural networks that are opaque in terms of how they come to make predictions. This limitation has increased interest in designing more interpretable deep models for NLP that reveal the `reasoning' behind model outputs. But work in this direction has been conducted on different datasets and tasks with correspondingly unique aims an…

    Submitted 24 April, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: Accepted as a long paper to ACL 2020. Website and leaderboard available at http://www.eraserbenchmark.com/. Code available at https://github.com/jayded/eraserbenchmark

  46. arXiv:1911.01417  [pdf, other]

    cs.AI

    Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

    Authors: Alexander Trott, Stephan Zheng, Caiming Xiong, Richard Socher

    Abstract: While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to lea…

    Submitted 4 November, 2019; originally announced November 2019.

    Comments: NeurIPS 2019

  47. arXiv:1910.13008  [pdf, other]

    cs.CL

    Sketch-Fill-A-R: A Persona-Grounded Chit-Chat Generation Framework

    Authors: Michael Shum, Stephan Zheng, Wojciech Kryściński, Caiming Xiong, Richard Socher

    Abstract: Human-like chit-chat conversation requires agents to generate responses that are fluent, engaging and consistent. We propose Sketch-Fill-A-R, a framework that uses a persona-memory to generate chit-chat responses in three phases. First, it generates dynamic sketch responses with open slots. Second, it generates candidate responses by filling slots with parts of its stored persona traits. Lastly, i…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: 10 pages, 9 tables, 4 figures

  48. arXiv:1910.12840  [pdf, other]

    cs.CL

    Evaluating the Factual Consistency of Abstractive Text Summarization

    Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher

    Abstract: Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sente…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: 11 pages, 7 tables, 1 algorithm

  49. arXiv:1910.10245  [pdf, other]

    stat.ML cs.LG

    Global Capacity Measures for Deep ReLU Networks via Path Sampling

    Authors: Ryan Theisen, Jason M. Klusowski, Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

    Abstract: Classical results on the statistical complexity of linear models have commonly identified the norm of the weights $\|w\|$ as a fundamental capacity measure. Generalizations of this measure to the setting of deep networks have been varied, though a frequently identified quantity is the product of weight norms of each layer. In this work, we show that for a large class of networks possessing a posit…

    Submitted 22 October, 2019; originally announced October 2019.

  50. arXiv:1910.03544  [pdf, other]

    cs.CL cs.AI

    Find or Classify? Dual Strategy for Slot-Value Predictions on Multi-Domain Dialog State Tracking

    Authors: Jian-Guo Zhang, Kazuma Hashimoto, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, Caiming Xiong

    Abstract: Dialog state tracking (DST) is a core component in task-oriented dialog systems. Existing approaches for DST mainly fall into one of two categories, namely, ontology-based and ontology-free methods. An ontology-based method selects a value from a candidate-value list for each target slot, while an ontology-free method extracts spans from dialog contexts. Recent work introduced a BERT-based model t…

    Submitted 28 October, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: 14 pages, accepted at the 9th Joint Conference on Lexical and Computational Semantics (*SEM 2020). This version fixes small errors