Showing 51–100 of 152 results for author: Bowman, R

  1. arXiv:2110.08300  [pdf, other]

    cs.CL cs.AI

    The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail

    Authors: Samuel R. Bowman

    Abstract: Researchers in NLP often frame and discuss research results in ways that serve to deemphasize the field's successes, often in response to the field's widespread hype. Though well-meaning, this has yielded many misleading or false claims about the limits of our best technology. This is a problem, and it may be more serious than it looks: It harms our credibility in ways that can make it harder to m…

    Submitted 10 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Proceedings of ACL 2022

  2. arXiv:2110.08193  [pdf, other]

    cs.CL

    BBQ: A Hand-Built Bias Benchmark for Question Answering

    Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Samuel R. Bowman

    Abstract: It is well documented that NLP models learn social biases, but little work has been done on how these biases manifest in model outputs for applied tasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions rele…

    Submitted 15 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted to ACL 2022 Findings. 20 pages, 10 figures

  3. arXiv:2109.08406  [pdf, other]

    cs.CL

    Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers

    Authors: Jason Phang, Haokun Liu, Samuel R. Bowman

    Abstract: Despite the success of fine-tuning pretrained language encoders like BERT for downstream natural language understanding (NLU) tasks, it is still poorly understood how neural networks change after fine-tuning. In this work, we use centered kernel alignment (CKA), a method for comparing learned representations, to measure the similarity of representations in task-tuned models across layers. In exper…

    Submitted 20 September, 2021; v1 submitted 17 September, 2021; originally announced September 2021.

    Comments: BlackboxNLP 2021

  4. arXiv:2109.06987  [pdf, other]

    cs.CL

    NOPE: A Corpus of Naturally-Occurring Presuppositions in English

    Authors: Alicia Parrish, Sebastian Schuster, Alex Warstadt, Omar Agha, Soo-Hwan Lee, Zhuoye Zhao, Samuel R. Bowman, Tal Linzen

    Abstract: Understanding language requires grasping not only the overtly stated content, but also making inferences about things that were left unsaid. These inferences include presuppositions, a phenomenon by which a listener learns about new information through reasoning about what a speaker takes as given. Presuppositions require complex understanding of the lexical and syntactic properties that trigger t…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: CoNLL 2021. Data and code available at https://github.com/nyu-mll/nope

  5. arXiv:2109.06842  [pdf, other]

    physics.ins-det

    Fast, high precision autofocus on a motorised microscope: automating blood sample imaging on the OpenFlexure Microscope

    Authors: Joe Knapper, Joel T. Collins, Julian Stirling, Samuel McDermott, William Wadsworth, Richard Bowman

    Abstract: The OpenFlexure Microscope is a 3D printed, low-cost microscope capable of automated image acquisition through the use of a motorised translation stage and a Raspberry Pi imaging system. This automation has applications in research and healthcare, including in supporting the diagnosis of malaria in low resource settings. The plasmodium parasites which cause malaria require high magnification imagi…

    Submitted 15 September, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

  6. arXiv:2106.00840  [pdf, other]

    cs.CL

    Comparing Test Sets with Item Response Theory

    Authors: Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman

    Abstract: Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks. Recent results from large pretrained models, though, show that many of these datasets are largely saturated and unlikely to be able to detect further progress. What kind of datasets are still effective at discriminating among strong models, and what kind…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  7. arXiv:2106.00794  [pdf, other]

    cs.CL cs.AI cs.HC

    What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?

    Authors: Nikita Nangia, Saku Sugawara, Harsh Trivedi, Alex Warstadt, Clara Vania, Samuel R. Bowman

    Abstract: Crowdsourcing is widely used to create data for common natural language understanding tasks. Despite the importance of these datasets for measuring and refining model understanding of language, there has been little focus on the crowdsourcing methods used for collecting the datasets. In this paper, we compare the efficacy of interventions that have been proposed in prior work as ways of improving…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  8. arXiv:2104.07179  [pdf, other]

    cs.CL

    Does Putting a Linguist in the Loop Improve NLU Data Collection?

    Authors: Alicia Parrish, William Huang, Omar Agha, Soo-Hwan Lee, Nikita Nangia, Alex Warstadt, Karmanya Aggarwal, Emily Allaway, Tal Linzen, Samuel R. Bowman

    Abstract: Many crowdsourced NLP datasets contain systematic gaps and biases that are identified only after data collection is complete. Identifying these issues from early data samples during crowdsourcing should make mitigation more efficient, especially when done iteratively. We take natural language inference as a test case and ask whether it is beneficial to put a linguist 'in the loop' during data coll…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: 14 pages, 10 figures

  9. arXiv:2104.02145  [pdf, other]

    cs.CL

    What Will it Take to Fix Benchmarking in Natural Language Understanding?

    Authors: Samuel R. Bowman, George E. Dahl

    Abstract: Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and biased systems score so highly on standard benchmarks that there is little room for researchers who develop better systems to demonstrate their improvements. The recent trend to abandon IID benchmarks in favor of adversarially-constructed, out-of-distribution test sets ensures that current models will perform…

    Submitted 15 October, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

    Comments: Proceedings of NAACL 2021. This revision adds a missing acknowledgment

  10. arXiv:2103.12888  [pdf]

    physics.app-ph cond-mat.mtrl-sci

    Searching for refractory plasmonic materials: the structural and optical properties of Au$_{3}$Zr intermetallic thin films

    Authors: Hugh Littlehailes, William R. Hendren, Stacey Drakeley, Robert M. Bowman, Fumin Huang

    Abstract: Optical properties of refractory intermetallic thin films of Au$_{3}$Zr were experimentally investigated for the first time, which show distinctive plasmonic properties in the visible and near infrared region. The films were fabricated through DC magnetron sputtering at various deposition temperatures ranging from room temperature to 427$^{\circ}$C and annealed at different vacuum levels. Both the stru…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 12 pages, plus 3 pages in attached supplementary information. 7 figures in main text, plus 1 table and 1 figure in attached supplementary information

  11. arXiv:2101.00933  [pdf, other]

    physics.ins-det

    Modern Microscopy with the Web of Things: The OpenFlexure Microscope Software Stack

    Authors: Joel T. Collins, Joe Knapper, Julian Stirling, Samuel McDermott, Richard Bowman

    Abstract: Automated and computerised control of scientific instrumentation is almost ubiquitous in the modern laboratory. Most instrumentation is controlled over decades-old communication busses or is accessed via proprietary system libraries. This limits which languages and operating systems can be used to control instruments, and poses a significant problem when interfacing multiple instruments into the s…

    Submitted 28 September, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

  12. arXiv:2012.00624  [pdf]

    cond-mat.mtrl-sci physics.app-ph

    Relaxed current matching requirements in highly luminescent perovskite tandem solar cells and their fundamental efficiency limits

    Authors: Alan R. Bowman, Felix Lang, Yu-Hsien Chiang, Alberto Jiménez-Solano, Kyle Frohna, Giles E. Eperon, Edoardo Ruggeri, Mojtaba Abdi-Jalebi, Miguel Anaya, Bettina V. Lotsch, Samuel D. Stranks

    Abstract: Here we use time-resolved and steady-state optical spectroscopy on state-of-the-art low- and high-bandgap perovskite films for tandems to quantify intrinsic recombination rates and absorption coefficients. We apply these data to calculate the limiting efficiency of perovskite-silicon and all-perovskite two-terminal tandems employing currently available bandgap materials as 42.0 % and 40.8 % respec…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 20 pages, 5 figures

  13. arXiv:2011.04946  [pdf, other]

    cs.CL

    When Do You Need Billions of Words of Pretraining Data?

    Authors: Yian Zhang, Alex Warstadt, Haau-Sing Li, Samuel R. Bowman

    Abstract: NLP is currently dominated by general-purpose pretrained language models like RoBERTa, which achieve strong performance on NLU tasks through pretraining on billions of words. But what exact knowledge or skills do Transformer LMs learn from large-scale pretraining that they cannot learn from less data? We adopt four probing methods---classifier probing, information-theoretic probing, unsupervised r…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: 10 pages, 6 figures

  14. arXiv:2010.06122  [pdf, other]

    cs.CL

    Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options

    Authors: Clara Vania, Ruijie Chen, Samuel R. Bowman

    Abstract: Large-scale natural language inference (NLI) datasets such as SNLI or MNLI have been created by asking crowdworkers to read a premise and write three new hypotheses, one for each possible semantic relationship (entailment, contradiction, and neutral). While this protocol has been used to create useful benchmark data, it remains unclear whether the writing-based annotation protocol is optimal for…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: AACL 2020

  15. arXiv:2010.05358  [pdf, other]

    cs.CL

    Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

    Authors: Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman

    Abstract: One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding. However, we want pretrained models to learn not only to represent linguistic features, but also to use those features preferentially during fine-tuning. With this goal in mind, we introduce a new English-language diagnostic set called MSGS (the Mi…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: accepted at EMNLP 2020

  16. arXiv:2010.04762  [pdf, other]

    cs.CL

    Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data

    Authors: William Huang, Haokun Liu, Samuel R. Bowman

    Abstract: A growing body of work shows that models exploit annotation artifacts to achieve state-of-the-art performance on standard crowdsourced benchmarks---datasets collected from crowdworkers to create an evaluation task---while still failing on out-of-domain examples for the same task. Recent work has explored the use of counterfactually-augmented data---data built by minimally editing a set of seed exa…

    Submitted 9 October, 2020; originally announced October 2020.

  17. arXiv:2010.04043  [pdf, other]

    cs.CL

    Precise Task Formalization Matters in Winograd Schema Evaluations

    Authors: Haokun Liu, William Huang, Dhara A. Mungra, Samuel R. Bowman

    Abstract: Performance on the Winograd Schema Challenge (WSC), a respected English commonsense reasoning benchmark, recently rocketed from chance accuracy to 89% on the SuperGLUE leaderboard, with relatively little corroborating evidence of a correspondingly large improvement in reasoning ability. We hypothesize that much of this improvement comes from recent changes in task formalization---the combination o…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: Accepted to the EMNLP 2020 conference

  18. arXiv:2010.00133  [pdf, other]

    cs.CL cs.AI

    CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

    Authors: Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman

    Abstract: Pretrained language models, especially masked language models (MLMs), have seen success across many NLP tasks. However, there is ample evidence that they use the cultural biases that are undoubtedly present in the corpora they are trained on, implicitly creating harm with biased representations. To measure some forms of social bias in language models against protected demographic groups in the US,…

    Submitted 30 September, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  19. arXiv:2007.06761  [pdf, other]

    cs.CL

    Can neural networks acquire a structural bias from raw linguistic data?

    Authors: Alex Warstadt, Samuel R. Bowman

    Abstract: We evaluate whether BERT, a widely used neural network for sentence processing, acquires an inductive bias towards forming structural generalizations through pretraining on raw data. We conduct four experiments testing its preference for structural vs. linear generalizations in different structure-dependent phenomena. We find that BERT makes a structural generalization in 3 out of 4 empirical doma…

    Submitted 23 September, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: To appear in Proceedings of 42nd Annual Meeting of the Cognitive Science Society

  20. arXiv:2005.13455  [pdf, other]

    cs.CL

    Self-Training for Unsupervised Parsing with PRPN

    Authors: Anhad Mohananey, Katharina Kann, Samuel R. Bowman

    Abstract: Neural unsupervised parsing (UP) models learn to parse without access to syntactic annotations, while being optimized for another task like language modeling. In this work, we propose self-training for neural UP models: we leverage aggregated annotations predicted by copies of our model as supervision for future copies. To be able to use our model's predictions during training, we extend a recent…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: Accepted for publication at the 16th International Conference on Parsing Technologies (IWPT), 2020

  21. arXiv:2005.13013  [pdf, other]

    cs.CL

    English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too

    Authors: Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman

    Abstract: Intermediate-task training---fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task---often improves model performance substantially on language understanding tasks in monolingual English settings. We investigate whether English intermediate-task training is still helpful on non-English target tasks. Using nine intermediate language-understanding tasks,…

    Submitted 30 September, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  22. arXiv:2005.03410  [pdf]

    cond-mat.mtrl-sci cond-mat.other

    Materials for hydrogen-based energy storage: Past, recent progress and future outlook

    Authors: Volodymyr A. Yartys, Marcello Baricco, Jose Bellosta von Colbe, Didier Blanchard, Robert C. Bowman Jr., Darren P. Broom, Craig E. Buckley, Fei Chang, Ping Chen, Young Whan Cho, Jean-Claude Crivello, Fermin Cuevas, William I. F. David, Petra E. de Jongh, Roman V. Denys, Martin Dornheim, Michael Felderhoff, Yaroslav Filinchuk, George E. Froudakis, David M. Grant, Bjørn C. Hauback, Ladislav Havela, Teng He, Michael Hirscher, Terry D. Humphries, et al. (23 additional authors not shown)

    Abstract: Magnesium hydride owns the largest share of publications on solid materials for hydrogen storage. The Magnesium group of international experts contributing to IEA Task 32 Hydrogen Based Energy Storage recently published two review papers presenting the activities of the group focused on magnesium hydride based materials and on Mg based compounds for hydrogen and energy storage. This review article…

    Submitted 7 May, 2020; originally announced May 2020.

    Journal ref: Journal of Alloys and Compounds Volume 827, 25 June 2020, 153548

  23. arXiv:2005.00628  [pdf, other]

    cs.CL

    Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?

    Authors: Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

    Abstract: While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large…

    Submitted 9 May, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  24. arXiv:2004.13304  [pdf, other]

    cs.CL

    Learning to Learn Morphological Inflection for Resource-Poor Languages

    Authors: Katharina Kann, Samuel R. Bowman, Kyunghyun Cho

    Abstract: We propose to cast the task of morphological inflection - mapping a lemma to an indicated inflected form - for resource-poor languages as a meta-learning problem. Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters that can serve as a strong initialization point for fine-tuning on a resource-poor target language. Experiments…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: AAAI 2020

  25. arXiv:2004.11997  [pdf, other]

    cs.CL

    New Protocols and Negative Results for Textual Entailment Data Collection

    Authors: Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler

    Abstract: Natural language inference (NLI) data has proven useful in benchmarking and, especially, as pretraining data for tasks requiring language understanding. However, the crowdsourcing protocol that was used to collect this data has known issues and was not explicitly optimized for either of these purposes, so it is likely far from ideal. We propose four alternative protocols, each aimed at improving e…

    Submitted 29 September, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: To appear at EMNLP 2020

  26. arXiv:2003.02249  [pdf, other]

    cs.CL

    jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models

    Authors: Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman

    Abstract: We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks. jiant enables modular and configuration-driven experimentation with state-of-the-art models and implements a broad set of tasks for probing, transfer learning, and multitask training experiments. jiant implements over 50 NLU tasks, including all GLUE and SuperGLUE benchmark t…

    Submitted 13 May, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

  27. arXiv:1912.00582  [pdf, other]

    cs.CL

    BLiMP: The Benchmark of Linguistic Minimal Pairs for English

    Authors: Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel R. Bowman

    Abstract: We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and…

    Submitted 14 February, 2023; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 2020: Published in TACL. Feb 2023: Corrected erroneous GPT-2 results

  28. arXiv:1911.13295  [pdf, other]

    physics.ins-det eess.IV

    Flat-field and colour correction for the Raspberry Pi camera module

    Authors: Richard Bowman, Boyko Vodenicharski, Joel Collins, Julian Stirling

    Abstract: The Raspberry Pi camera module is widely used in open source hardware projects as a low cost camera sensor. However, when the stock lens is removed and replaced with other custom optics the sensor will return a non-uniform background and colour response which hampers the use of this excellent and popular image sensor. This effect is found to be due to the sensor's optical design as well as due to…

    Submitted 29 November, 2019; originally announced November 2019.

  29. arXiv:1911.12246  [pdf, other]

    cs.CL

    Do Attention Heads in BERT Track Syntactic Dependencies?

    Authors: Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman

    Abstract: We investigate the extent to which individual attention heads in pretrained transformer language models, such as BERT and RoBERTa, implicitly capture syntactic dependency relations. We employ two methods---taking the maximum attention weight and computing the maximum spanning tree---to extract implicit dependency relations from the attention weights of each layer/head, and compare them to the grou…

    Submitted 27 November, 2019; originally announced November 2019.

  30. arXiv:1911.09986  [pdf, other]

    physics.ins-det

    The OpenFlexure Block Stage: Sub-100 nm fibre alignment with a monolithic plastic flexure stage

    Authors: Qingxin Meng, Kerrianne Harrington, Julian Stirling, Richard Bowman

    Abstract: As 3D printers become more widely available, researchers are able to rapidly produce components that may have previously taken weeks to have machined. The resulting plastic components, having high surface roughness, are often not suitable for high-precision optomechanics. However, by playing to the strengths of 3D printing---namely the ability to print complex internal geometries---it is possible…

    Submitted 22 November, 2019; originally announced November 2019.

  31. arXiv:1909.10056  [pdf, other]

    cs.CL

    Inducing Constituency Trees through Neural Machine Translation

    Authors: Phu Mon Htut, Kyunghyun Cho, Samuel R. Bowman

    Abstract: Latent tree learning (LTL) methods learn to parse sentences using only indirect supervision from a downstream task. Recent advances in latent tree learning have made it possible to recover moderately high quality tree structures by training with language modeling or auto-encoding objectives. In this work, we explore the hypothesis that decoding in machine translation, as a conditional language mode…

    Submitted 22 September, 2019; originally announced September 2019.

  32. arXiv:1909.02597  [pdf, other]

    cs.CL

    Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs

    Authors: Alex Warstadt, Yu Cao, Ioana Grosu, Wei Peng, Hagen Blix, Yining Nie, Anna Alsop, Shikha Bordia, Haokun Liu, Alicia Parrish, Sheng-Fu Wang, Jason Phang, Anhad Mohananey, Phu Mon Htut, Paloma Jeretič, Samuel R. Bowman

    Abstract: Though state-of-the-art sentence representation models can perform tasks requiring significant knowledge of grammar, it is an open question how best to evaluate their grammatical knowledge. We explore five experimental methods inspired by prior work evaluating pretrained sentence representation models. We use a single linguistic phenomenon, negative polarity item (NPI) licensing in English, as a c…

    Submitted 19 September, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: Accepted to EMNLP 2019; Added link to code+dataset

  33. arXiv:1909.01522  [pdf, other]

    cs.CL

    Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

    Authors: Katharina Kann, Kyunghyun Cho, Samuel R. Bowman

    Abstract: Development sets are impractical to obtain for real low-resource languages, since using all available data for training is often more effective. However, development sets are widely used in research papers that purport to deal with low-resource natural language processing (NLP). Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource sett…

    Submitted 14 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  34. arXiv:1907.04944  [pdf, other]

    cs.CL cs.LG

    Can Unconditional Language Models Recover Arbitrary Sentences?

    Authors: Nishant Subramani, Samuel R. Bowman, Kyunghyun Cho

    Abstract: Neural network-based generative language models like ELMo and BERT can work effectively as general purpose sentence encoders in text classification without further fine-tuning. Is it possible to adapt them in a similar way for use as general-purpose decoders? For this to be possible, it would need to be the case that for any target sentence of interest, there is some continuous representation that…

    Submitted 9 January, 2020; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: NeurIPS 2019

  35. arXiv:1905.10425  [pdf, other]

    cs.CL cs.AI

    Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

    Authors: Nikita Nangia, Samuel R. Bowman

    Abstract: The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state of the art at the time of writing (May 24, 2019). Here, we measure human performance on the benchmark, in order to learn whether significant headroom remains for further progress. We provide a conser…

    Submitted 1 June, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

    Journal ref: ACL 2019

  36. arXiv:1905.06316  [pdf, other]

    cs.CL

    What do you learn from context? Probing for sentence structure in contextualized word representations

    Authors: Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick

    Abstract: Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: ICLR 2019 camera-ready version, 17 pages including appendices

  37. arXiv:1905.00537  [pdf, other]

    cs.CL cs.AI

    SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

    Authors: Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

    Abstract: In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert h…

    Submitted 12 February, 2020; v1 submitted 1 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019, super.gluebenchmark.com updating acknowledgments

  38. arXiv:1904.11544  [pdf, other]

    cs.CL

    Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

    Authors: Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, R. Thomas McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, Ellie Pavlick

    Abstract: We introduce a set of nine challenge tasks that test for the understanding of function words. These tasks are created by structurally mutating sentences from existing datasets to target the comprehension of specific types of function words (e.g., prepositions, wh-words). Using these probing tasks, we explore the effects of various pretraining objectives for sentence encoders (e.g., language modeli…

    Submitted 7 August, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: Accepted to *SEM 2019 (revised submission). Corresponding authors: Najoung Kim, Ellie Pavlick

  39. arXiv:1904.03035  [pdf, other]

    cs.CL

    Identifying and Reducing Gender Bias in Word-Level Language Models

    Authors: Shikha Bordia, Samuel R. Bowman

    Abstract: Many text corpora exhibit socially problematic biases, which can be propagated or amplified in the models trained on such data. For example, doctor cooccurs more frequently with male pronouns than female pronouns. In this study we (i) propose a metric to measure gender bias; (ii) measure bias in a text corpus and the text generated from a recurrent neural network language model trained on the text…

    Submitted 5 April, 2019; originally announced April 2019.

    Comments: 12 pages with 8 tables and 1 figure; Published at NAACL SRW 2019

  40. arXiv:1903.10561  [pdf, other]

    cs.CL cs.CY

    On Measuring Social Biases in Sentence Encoders

    Authors: Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger

    Abstract: The Word Embedding Association Test shows that GloVe and word2vec word embeddings exhibit human-like implicit biases based on gender, race, and other social constructs (Caliskan et al., 2017). Meanwhile, research on learning reusable text representations has begun to explore sentence-level texts, with some sentence encoders seeing enthusiastic adoption. Accordingly, we extend the Word Embedding As…

    Submitted 25 March, 2019; originally announced March 2019.

    Comments: NAACL 2019

  41. arXiv:1901.03438  [pdf, other]

    cs.CL

    Linguistic Analysis of Pretrained Sentence Encoders with Acceptability Judgments

    Authors: Alex Warstadt, Samuel R. Bowman

    Abstract: Recent work on evaluating grammatical knowledge in pretrained sentence encoders gives a fine-grained view of a small number of phenomena. We introduce a new analysis dataset that also has broad coverage of linguistic phenomena. We annotate the development set of the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018) for the presence of 13 classes of syntactic phenomena including vari…

    Submitted 21 May, 2020; v1 submitted 10 January, 2019; originally announced January 2019.

  42. arXiv:1812.10860  [pdf, other]

    cs.CL

    Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

    Authors: Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman

    Abstract: Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. Our prim…

    Submitted 22 July, 2019; v1 submitted 27 December, 2018; originally announced December 2018.

    Comments: ACL 2019. This paper supersedes "Looking for ELMo's Friends: Sentence-Level Pretraining Beyond Language Modeling", an earlier version of this work by the same authors

  43. arXiv:1812.04943  [pdf, other]

    eess.IV physics.optics

    Long-range depth imaging using a single-photon detector array and non-local data fusion

    Authors: Susan Chan, Abderrahim Halimi, Feng Zhu, Istvan Gyongy, Robert K. Henderson, Richard Bowman, Steve McLaughlin, Gerald S. Buller, Jonathan Leach

    Abstract: The ability to measure and record high-resolution depth images at long stand-off distances is important for a wide range of applications, including connected and automotive vehicles, defense and security, and agriculture and mining. In LIDAR (light detection and ranging) applications, single-photon sensitive detection is an emerging approach, offering high sensitivity to light and picosecond tempo…

    Submitted 11 December, 2018; originally announced December 2018.

  44. arXiv:1811.10773  [pdf, other]

    cs.CL

    Verb Argument Structure Alternations in Word and Sentence Embeddings

    Authors: Katharina Kann, Alex Warstadt, Adina Williams, Samuel R. Bowman

    Abstract: Verbs occur in different syntactic environments, or frames. We investigate whether artificial neural networks encode grammatical distinctions necessary for inferring the idiosyncratic frame-selectional properties of verbs. We introduce five datasets, collectively called FAVA, containing in aggregate nearly 10k sentences labeled for grammatical acceptability, illustrating different verbal argument…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: Accepted to SCiL 2019

  45. arXiv:1811.01088  [pdf, other]

    cs.CL

    Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

    Authors: Jason Phang, Thibault Févry, Samuel R. Bowman

    Abstract: Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks. By supplementing language model-style pretraining with further training on data-rich supervised tasks, such as natural language inference, we obtain additional performance improvements on the GLUE benchmark. Applying supplementary traini…

    Submitted 27 February, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

  46. arXiv:1809.10040  [pdf, other]

    cs.CL

    Language Modeling Teaches You More Syntax than Translation Does: Lessons Learned Through Auxiliary Task Analysis

    Authors: Kelly W. Zhang, Samuel R. Bowman

    Abstract: Recent work using auxiliary prediction task classifiers to investigate the properties of LSTM representations has begun to shed light on why pretrained representations, like ELMo (Peters et al., 2018) and CoVe (McCann et al., 2017), are so beneficial for neural language understanding models. We still, though, do not yet have a clear understanding of how the choice of pretraining objective affects…

    Submitted 7 January, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

    Journal ref: Blackbox NLP Workshop, EMNLP 2018

  47. arXiv:1809.05053  [pdf, other]

    cs.CL cs.AI cs.LG

    XNLI: Evaluating Cross-lingual Sentence Representations

    Authors: Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov

    Abstract: State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in cross-lingual language understanding (XLU)…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018

  48. arXiv:1808.10000  [pdf, ps, other]

    cs.CL

    Grammar Induction with Neural Language Models: An Unusual Replication

    Authors: Phu Mon Htut, Kyunghyun Cho, Samuel R. Bowman

    Abstract: A substantial thread of recent work on latent tree learning has attempted to develop neural network models with parse-valued latent variables and train them on non-parsing tasks, in the hope of having them discover interpretable tree structure. In a recent paper, Shen et al. (2018) introduce such a model and report near-state-of-the-art results on the target task of language modeling, and the firs…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: To appear in Proceedings of EMNLP 2018 (short paper)

  49. arXiv:1805.12471  [pdf, other]

    cs.CL

    Neural Network Acceptability Judgments

    Authors: Alex Warstadt, Amanpreet Singh, Samuel R. Bowman

    Abstract: This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence. We introduce the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. As baselines, we train several recurrent neural netwo…

    Submitted 1 October, 2019; v1 submitted 31 May, 2018; originally announced May 2018.

  50. arXiv:1805.04616  [pdf]

    q-bio.BM

    Choice of adaptive sampling strategy impacts state discovery, transition probabilities, and the apparent mechanism of conformational changes

    Authors: Maxwell I. Zimmerman, Justin R. Porter, Xianqiang Sun, Roseane R. Silva, Gregory R. Bowman

    Abstract: Interest in equilibrium-based sampling methods has grown with recent advances in computational hardware and Markov state modeling (MSM) methods, yet outstanding questions remain that hinder widespread adoption. Namely, how do sampling strategies explore conformational space and how might this influence predictions? Here, we seek to answer these questions for four commonly used sampling methods: 1)…

    Submitted 11 May, 2018; originally announced May 2018.