Discrete Latent Structure in Neural Networks

Authors: Vlad Niculae, Caio F. Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins

Abstract: Many types of data from fields including natural language processing, computer vision, and bioinformatics, are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions. However, effective training is challenging, as neural networks are typically designed for continuous computation. This text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. Our presentation relies on consistent notations for a wide range of models. As such, we reveal many new connections between latent structure learning strategies, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.

Submitted 18 January, 2023; originally announced January 2023.

ACM Class: I.2.6

arXiv:2210.10860 [pdf, other]

Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions

Authors: Alicia Parrish, Harsh Trivedi, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Amanpreet Singh Saimbhi, Samuel R. Bowman

Abstract: The use of language-model-based question-answering systems to aid humans in completing difficult tasks is limited, in part, by the unreliability of the text these systems generate. Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer options, where one is correct and the other is incorrect, allows human judges to perform more accurately, even when one of the arguments is unreliable and deceptive. If this is helpful, we may be able to increase our justified trust in language-model-based systems by asking them to produce these arguments where needed. Previous research has shown that just a single turn of arguments in this format is not helpful to humans. However, as debate settings are characterized by a back-and-forth dialogue, we follow up on previous results to test whether adding a second round of counter-arguments is helpful to humans. We find that, regardless of whether they have access to arguments or not, humans perform similarly on our task. These findings suggest that, in the case of answering reading comprehension questions, debate is not a helpful format.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 12 pages, 6 figures, 7 tables

arXiv:2208.12852 [pdf, other]

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey

Authors: Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman

Abstract: We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: For example, respondents are split almost exactly in half on questions about the importance of artificial general intelligence, whether language models understand language, and the necessity of linguistic structure and inductive bias for solving NLP problems. In addition, the survey posed meta-questions, asking respondents to predict the distribution of survey responses. This allows us not only to gain insight on the spectrum of beliefs held by NLP researchers, but also to uncover false sociological beliefs where the community's predictions don't match reality. We find such mismatches on a wide range of issues. Among other results, the community greatly overestimates its own belief in the usefulness of benchmarks and the potential for scaling to solve real-world problems, while underestimating its own belief in the importance of linguistic structure, inductive bias, and interdisciplinary science.

Submitted 26 August, 2022; originally announced August 2022.

Comments: 31 pages, 19 figures, 3 tables; more information at https://nlpsurvey.net

ACM Class: I.2.7

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2204.05212 [pdf, other]

Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

Authors: Alicia Parrish, Harsh Trivedi, Ethan Perez, Angelica Chen, Nikita Nangia, Jason Phang, Samuel R. Bowman

Abstract: Current QA systems can generate reasonable-sounding yet false answers without explanation or evidence for the generated answer, which is especially problematic when humans cannot readily check the model's answers. This presents a challenge for building trust in machine learning systems. We take inspiration from real-world situations where difficult questions are answered by considering opposing sides (see Irving et al., 2018). For multiple-choice QA examples, we build a dataset of single arguments for both a correct and incorrect answer option in a debate-style set-up as an initial step in training models to produce explanations for two candidate answers. We use long contexts -- humans familiar with the context write convincing explanations for pre-selected correct and incorrect answers, and we test if those explanations allow humans who have not read the full context to more accurately determine the correct answer. We do not find that explanations in our set-up improve human accuracy, but a baseline condition shows that providing human-selected text snippets does improve accuracy. We use these findings to suggest ways of improving the debate set up for future data collection efforts.

Submitted 13 April, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted to the 2022 ACL Workshop on Learning with Natural Language Supervision. 12 pages total, 9 figures, 2 tables

arXiv:2203.06342 [pdf, other]

What Makes Reading Comprehension Questions Difficult?

Authors: Saku Sugawara, Nikita Nangia, Alex Warstadt, Samuel R. Bowman

Abstract: For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems. However, we do not yet know how best to select text sources to collect a variety of challenging examples. In this study, we crowdsource multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources, analyzing what attributes of passages contribute to the difficulty and question types of the collected examples. To our surprise, we find that passage source, length, and readability measures do not significantly affect question difficulty. Through our manual annotation of seven reasoning types, we observe several trends between passage sources and reasoning types, e.g., logical reasoning is more often required in questions written for technical passages. These results suggest that when creating a new benchmark dataset, selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.

Submitted 11 March, 2022; originally announced March 2022.

Comments: ACL 2022

arXiv:2112.08608 [pdf, other]

QuALITY: Question Answering with Long Input Texts, Yes!

Authors: Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman

Abstract: To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than relying on summaries or excerpts. In addition, only half of the questions are answerable by annotators working under tight time constraints, indicating that skimming and simple search are not enough to consistently perform well. Our baseline models perform poorly on this task (55.4%) and significantly lag behind human performance (93.5%).

Submitted 11 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: NAACL 2022

arXiv:2110.08193 [pdf, other]

BBQ: A Hand-Built Bias Benchmark for Question Answering

Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Samuel R. Bowman

Abstract: It is well documented that NLP models learn social biases, but little work has been done on how these biases manifest in model outputs for applied tasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant for U.S. English-speaking contexts. Our task evaluates model responses at two levels: (i) given an under-informative context, we test how strongly responses reflect social biases, and (ii) given an adequately informative context, we test whether the model's biases override a correct answer choice. We find that models often rely on stereotypes when the context is under-informative, meaning the model's outputs consistently reproduce harmful biases in this setting. Though models are more accurate when the context provides an informative answer, they still rely on stereotypes and average up to 3.4 percentage points higher accuracy when the correct answer aligns with a social bias than when it conflicts, with this difference widening to over 5 points on examples targeting gender for most models tested.

Submitted 15 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: Accepted to ACL 2022 Findings. 20 pages, 10 figures

arXiv:2106.00794 [pdf, other]

What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?

Authors: Nikita Nangia, Saku Sugawara, Harsh Trivedi, Alex Warstadt, Clara Vania, Samuel R. Bowman

Abstract: Crowdsourcing is widely used to create data for common natural language understanding tasks. Despite the importance of these datasets for measuring and refining model understanding of language, there has been little focus on the crowdsourcing methods used for collecting the datasets. In this paper, we compare the efficacy of interventions that have been proposed in prior work as ways of improving data quality. We use multiple-choice question answering as a testbed and run a randomized trial by assigning crowdworkers to write questions under one of four different data collection protocols. We find that asking workers to write explanations for their examples is an ineffective stand-alone strategy for boosting NLU example difficulty. However, we find that training crowdworkers, and then using an iterative process of collecting data, sending feedback, and qualifying workers based on expert judgments is an effective means of collecting challenging data. But using crowdsourced, instead of expert judgments, to qualify workers and send feedback does not prove to be effective. We observe that the data from the iterative protocol with expert assessments is more challenging by several measures. Notably, the human-model gap on the unanimous agreement portion of this data is, on average, twice as large as the gap for the baseline protocol data.

Submitted 1 June, 2021; originally announced June 2021.

Comments: ACL 2021

arXiv:2104.07179 [pdf, other]

Does Putting a Linguist in the Loop Improve NLU Data Collection?

Authors: Alicia Parrish, William Huang, Omar Agha, Soo-Hwan Lee, Nikita Nangia, Alex Warstadt, Karmanya Aggarwal, Emily Allaway, Tal Linzen, Samuel R. Bowman

Abstract: Many crowdsourced NLP datasets contain systematic gaps and biases that are identified only after data collection is complete. Identifying these issues from early data samples during crowdsourcing should make mitigation more efficient, especially when done iteratively. We take natural language inference as a test case and ask whether it is beneficial to put a linguist 'in the loop' during data collection to dynamically identify and address gaps in the data by introducing novel constraints on the task. We directly compare three data collection protocols: (i) a baseline protocol, (ii) a linguist-in-the-loop intervention with iteratively-updated constraints on the task, and (iii) an extension of linguist-in-the-loop that provides direct interaction between linguists and crowdworkers via a chatroom. The datasets collected with linguist involvement are more reliably challenging than baseline, without loss of quality. But we see no evidence that using this data in training leads to better out-of-domain model performance, and the addition of a chat platform has no measurable effect on the resulting dataset. We suggest integrating expert analysis during data collection so that the expert can dynamically address gaps and biases in the dataset.

Submitted 14 April, 2021; originally announced April 2021.

Comments: 14 pages, 10 figures

arXiv:2101.12458 [pdf, other]

doi 10.1016/j.jcp.2021.110163

Critique on "Volume penalization for inhomogeneous Neumann boundary conditions modeling scalar flux in complicated geometry"

Authors: Ramakrishnan Thirumalaisamy, Nishant Nangia, Amneet Pal Singh Bhalla

Abstract: In this letter, we provide counter-examples to demonstrate that it is possible to retain second-order accuracy using Sakurai et al.'s method, even when different flux boundary conditions are imposed on multiple interfaces that do not conform to the Cartesian grid. We consider both continuous and discontinuous indicator functions in our test problems. Both indicator functions yield a similar convergence rate for the problems considered here. We also find that the order of accuracy results for some of the cases presented in Sakurai et al. are not reproducible. This is demonstrated by re-considering the same one- and two-dimensional Poisson problems solved in Sakurai et al. in this letter. The results shown in this letter demonstrate that the spatial order of accuracy of the flux-based VP approach of Sakurai et al. is between $\mathcal{O}$(1) and $\mathcal{O}$(2), and it depends on the underlying problem/model. The spatial order of accuracy cannot simply be deduced a priori based on the imposed flux values, shapes, or grid-conformity of the interfaces, as concluded in Sakurai et al. Further analysis is required to understand the spatial convergence rate of the flux-based VP method.

Submitted 29 January, 2021; originally announced January 2021.

arXiv:2010.00133 [pdf, other]

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

Authors: Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman

Abstract: Pretrained language models, especially masked language models (MLMs) have seen success across many NLP tasks. However, there is ample evidence that they use the cultural biases that are undoubtedly present in the corpora they are trained on, implicitly creating harm with biased representations. To measure some forms of social bias in language models against protected demographic groups in the US, we introduce the Crowdsourced Stereotype Pairs benchmark (CrowS-Pairs). CrowS-Pairs has 1508 examples that cover stereotypes dealing with nine types of bias, like race, religion, and age. In CrowS-Pairs a model is presented with two sentences: one that is more stereotyping and another that is less stereotyping. The data focuses on stereotypes about historically disadvantaged groups and contrasts them with advantaged groups. We find that all three of the widely-used MLMs we evaluate substantially favor sentences that express stereotypes in every category in CrowS-Pairs. As work on building less biased models advances, this dataset can be used as a benchmark to evaluate progress.

Submitted 30 September, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:2005.06108 [pdf, other]

doi 10.1016/j.oceaneng.2021.108879

The inertial sea wave energy converter (ISWEC) technology: device-physics, multiphase modeling and simulations

Authors: Kaustubh Khedkar, Nishant Nangia, Ramakrishnan Thirumalaisamy, Amneet Pal Singh Bhalla

Abstract: In this paper we investigate the dynamics of the inertial wave energy converter (ISWEC) device using fully-resolved computational fluid dynamics (CFD) simulations. Originally prototyped by Polytechnic University of Turin, the device consists of a floating, boat-shaped hull that is slack-moored to the sea bed. Internally, a gyroscopic power take off (PTO) unit converts the wave-induced pitch motion of the hull into electrical energy. The CFD model is based on the incompressible Navier-Stokes equations and utilizes the fictitious domain Brinkman penalization technique to couple the device physics and water wave dynamics. A numerical wave tank is used to emulate realistic sea operating conditions. A Froude scaling analysis is performed to enable two- and three-dimensional simulations for a scaled-down (1:20) ISWEC model. It is demonstrated that the scaled-down 2D model is sufficient to accurately simulate the hull's pitching motion and to predict the power generation capability of the converter. A systematic parameter study of the ISWEC is conducted, and its optimal performance in terms of power generation is determined based on the hull and gyroscope control parameters. It is demonstrated that the device achieves peak performance when the gyroscope specifications are chosen based on reactive control theory. It is shown that a proportional control of the PTO control torque is required to generate continuous gyroscope precession effects, without which the device generates no power.
In an inertial reference frame, it is demonstrated that the yaw and pitch torques acting on the hull are of the same order of magnitude, informing future design investigations of the ISWEC technology. Further, an energy transfer pathway from the water waves to the hull, the hull to the gyroscope, and the gyroscope to the PTO unit is analytically described and numerically verified.

Submitted 12 May, 2020; originally announced May 2020.

Comments: Figures are compressed to comply with arXiv size requirements

arXiv:1907.01041 [pdf]

Natural Language Understanding with the Quora Question Pairs Dataset

Authors: Lakshay Sharma, Laura Graesser, Nikita Nangia, Utku Evci

Abstract: This paper explores the task Natural Language Understanding (NLU) by looking at duplicate question detection in the Quora dataset. We conducted extensive exploration of the dataset and used various machine learning models, including linear and tree-based models. Our final finding was that a simple Continuous Bag of Words neural network model had the best performance, outdoing more complicated recurrent and attention based models. We also conducted error analysis and found some subjectivity in the labeling of the dataset.

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1905.10425 [pdf, other]

Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

Authors: Nikita Nangia, Samuel R. Bowman

Abstract: The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state of the art at the time of writing (May 24, 2019). Here, we measure human performance on the benchmark, in order to learn whether significant headroom remains for further progress. We provide a conservative estimate of human performance on the benchmark through crowdsourcing: Our annotators are non-experts who must learn each task from a brief set of instructions and 20 examples. In spite of limited training, these annotators robustly outperform the state of the art on six of the nine GLUE tasks and achieve an average score of 87.1. Given the fast pace of progress however, the headroom we observe is quite limited. To reproduce the data-poor setting that our annotators must learn in, we also train the BERT model (Devlin et al., 2019) in limited-data regimes, and conclude that low-resource sentence classification remains a challenge for modern neural network approaches to text understanding.

Submitted 1 June, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

Journal ref: ACL 2019

arXiv:1905.00537 [pdf, other]

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Authors: Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

Abstract: In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.

Submitted 12 February, 2020; v1 submitted 1 May, 2019; originally announced May 2019.

Comments: NeurIPS 2019, super.gluebenchmark.com updating acknowledgments

arXiv:1904.04078 [pdf, other]

doi 10.1016/j.apor.2019.101932

Simulating water-entry/exit problems using Eulerian-Lagrangian and fully-Eulerian fictitious domain methods within the open-source IBAMR library

Authors: Amneet Pal Singh Bhalla, Nishant Nangia, Panagiotis Dafnakis, Giovanni Bracco, Giuliana Mattiazzo

Abstract: In this paper we employ two implementations of the fictitious domain (FD) method to simulate water-entry and water-exit problems and demonstrate their ability to simulate practical marine engineering problems. In FD methods, the fluid momentum equation is extended within the solid domain using an additional body force that constrains the structure velocity to be that of a rigid body. Using this formulation, a single set of equations is solved over the entire computational domain. The constraint force is calculated in two distinct ways: one using an Eulerian-Lagrangian framework of the immersed boundary (IB) method and another using a fully-Eulerian approach of the Brinkman penalization (BP) method. Both FSI strategies use the same multiphase flow algorithm that solves the discrete incompressible Navier-Stokes system in conservative form. A consistent transport scheme is employed to advect mass and momentum in the domain, which ensures numerical stability of high density ratio multiphase flows involved in practical marine engineering applications. Example cases of a free falling wedge (straight and inclined) and cylinder are simulated, and the numerical results are compared against benchmark cases in literature.

Submitted 3 July, 2019; v1 submitted 4 April, 2019; originally announced April 2019.

Comments: The current paper builds on arXiv:1901.07892 and re-explains some parts of it for the reader's convenience

arXiv:1901.07892 [pdf, other]

doi 10.1016/j.jcp.2019.07.004

A DLM immersed boundary method based wave-structure interaction solver for high density ratio multiphase flows

Authors: Nishant Nangia, Neelesh A. Patankar, Amneet Pal Singh Bhalla

Abstract: We present a robust immersed boundary (IB) method for high density ratio multiphase flows that is capable of modeling complex wave-structure interaction (WSI) problems arising in marine and coastal engineering applications. The IB/WSI methodology is enabled by combining the distributed Lagrange multiplier (DLM) method of Sharma and Patankar (J Comp Phys, 2005) with a robust level set method based multiphase flow solver. The fluid solver integrates the conservative form of the variable-coefficient incompressible Navier-Stokes equations using a hybrid preconditioner and ensures consistent transport of mass and momentum at a discrete level. The consistent transport scheme preserves the numerical stability of the method in the presence of large density ratios found in problems involving air, water, and an immersed structure. The air-water interface is captured by the level set method on an Eulerian grid, whereas the free-surface piercing immersed structure is represented on a Lagrangian mesh. The fluid-structure interaction (FSI) coupling is mediated via Peskin's regularized delta functions in an implicit manner, which obviates the need to integrate the hydrodynamic stress tensor on the complex surface of the immersed structure. The IB/WSI numerical scheme is implemented within an adaptive mesh refinement (AMR) framework, in which the Lagrangian structure and the air-water interface are embedded on the finest mesh level to capture the thin boundary layers and the vortical structures arising from WSI. We use a well-balanced force discretization for gravity force that eliminates spurious velocity currents in the hydrostatic limit due to density variation in the three phases (air, water and solid). An effective wave generation and absorption technique for a numerical wave tank is presented and used to simulate a benchmark case of water wave distortion due to a submerged structure.

Submitted 1 September, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

Comments: Figures are compressed to comply with arXiv size requirements

arXiv:1809.01008 [pdf, other]

doi 10.1016/j.jcp.2019.03.042

A robust incompressible Navier-Stokes solver for high density ratio multiphase flows

Authors: Nishant Nangia, Boyce E. Griffith, Neelesh A. Patankar, Amneet Pal Singh Bhalla

Abstract: This paper presents a robust, adaptive numerical scheme for simulating high density ratio and high shear multiphase flows on locally refined Cartesian grids that adapt to the evolving interfaces and track regions of high vorticity. The algorithm combines the interface capturing level set method with a variable-coefficient incompressible Navier-Stokes solver that is demonstrated to stably resolve material contrast ratios of up to six orders of magnitude. The discretization approach ensures second-order pointwise accuracy for both velocity and pressure with several physical boundary treatments, including velocity and traction boundary conditions. The paper includes several test cases that demonstrate the order of accuracy and algorithmic scalability of the flow solver. To ensure the stability of the numerical scheme in the presence of high density and viscosity ratios, we employ a consistent treatment of mass and momentum transport in the conservative form of discrete equations. This consistency is achieved by solving an additional mass balance equation, which we approximate via a strong stability preserving Runge-Kutta time integrator and by employing the same mass flux (obtained from the mass equation) in the discrete momentum equation. The scheme uses higher-order total variation diminishing (TVD) and convection-boundedness criterion (CBC) satisfying limiter to avoid numerical fluctuations in the transported density field. The high-order bounded convective transport is done on a dimension-by-dimension basis, which makes the scheme simple to implement. We also demonstrate through several test cases that the lack of consistent mass and momentum transport in non-conservative formulations, which are commonly used in practice, or the use of non-CBC satisfying limiters can yield very large numerical error and very poor accuracy for convection-dominant high density ratio flows.

Submitted 1 September, 2019; v1 submitted 31 August, 2018; originally announced September 2018.

Comments: Figures are compressed to comply with arXiv size requirements

arXiv:1804.06028 [pdf, other]

ListOps: A Diagnostic Dataset for Latent Tree Learning

Authors: Nikita Nangia, Samuel R. Bowman

Abstract: Latent tree learning models learn to parse a sentence without syntactic supervision, and use that parse to build the sentence representation. Existing work on such models has shown that, while they perform well on tasks like sentence classification, they do not learn grammars that conform to any plausible semantic or syntactic formalism (Williams et al., 2018a). Studying the parsing ability of such models in natural language can be challenging due to the inherent complexities of natural language, like having several valid parses for a single sentence. In this paper we introduce ListOps, a toy dataset created to study the parsing ability of latent tree models. ListOps sequences are in the style of prefix arithmetic. The dataset is designed to have a single correct parsing strategy that a system needs to learn to succeed at the task. We show that the current leading latent tree models are unable to learn to parse and succeed at ListOps. These models achieve accuracies worse than purely sequential RNNs.

Submitted 16 April, 2018; originally announced April 2018.

Comments: 8 pages, 4 figures, 3 tables, NAACL-SRW (2018)

arXiv:1707.08172 [pdf, other]

The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations

Authors: Nikita Nangia, Adina Williams, Angeliki Lazaridou, Samuel R. Bowman

Abstract: This paper presents the results of the RepEval 2017 Shared Task, which evaluated neural network sentence representation learning models on the Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by Williams et al. (2017). All of the five participating teams beat the bidirectional LSTM (BiLSTM) and continuous bag of words baselines reported in Williams et al. The best single model used stacked BiLSTMs with residual connections to extract sentence features and reached 74.5% accuracy on the genre-matched test set. Surprisingly, the results of the competition were fairly consistent across the genre-matched and genre-mismatched test sets, and across subsets of the test data representing a variety of linguistic phenomena, suggesting that all of the submitted systems learned reasonably domain-independent representations for sentence meaning.

Submitted 25 July, 2017; originally announced July 2017.

Comments: 10 pages, 1 figure, 6 tables, in Proceedings of The Second Workshop on Evaluating Vector Space Representations for NLP (RepEval 2017)

arXiv:1704.05426 [pdf, ps, other]

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Authors: Adina Williams, Nikita Nangia, Samuel R. Bowman

Abstract: This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English--making it possible to evaluate systems on nearly the full complexity of the language--and it offers an explicit setting for the evaluation of cross-genre domain adaptation.

Submitted 19 February, 2018; v1 submitted 18 April, 2017; originally announced April 2017.

Comments: 10 pages, 1 figure, 5 tables. v2 corrects a misreported accuracy number for the CBOW model in the 'matched' setting. v3 adds a discussion of the difficulty of the corpus to the analysis section. v4 is the version that was accepted to NAACL2018

arXiv:1704.00239 [pdf, other]

doi 10.1016/j.jcp.2017.06.047

A moving control volume approach to computing hydrodynamic forces and torques on immersed bodies

Authors: Nishant Nangia, Hans Johansen, Neelesh A. Patankar, Amneet Pal Singh Bhalla

Abstract: We present a moving control volume (CV) approach to computing hydrodynamic forces and torques on complex geometries. The method requires surface and volumetric integrals over a simple and regular Cartesian box that moves with an arbitrary velocity to enclose the body at all times. The moving box is aligned with Cartesian grid faces, which makes the integral evaluation straightforward in an immersed boundary (IB) framework. Discontinuous and noisy derivatives of velocity and pressure at the fluid-structure interface are avoided and far-field (smooth) velocity and pressure information is used. We re-visit the approach to compute hydrodynamic forces and torques through force/torque balance equation in a Lagrangian frame that some of us took in a prior work (Bhalla et al., J Comp Phys, 2013). We prove the equivalence of the two approaches for IB methods, thanks to the use of Peskin's delta functions. Both approaches are able to suppress spurious force oscillations and are in excellent agreement, as expected theoretically. Test cases ranging from Stokes to high Reynolds number regimes are considered. We discuss regridding issues for the moving CV method in an adaptive mesh refinement (AMR) context. The proposed moving CV method is not limited to a specific IB method and can also be used, for example, with embedded boundary methods.

Submitted 15 June, 2017; v1 submitted 1 April, 2017; originally announced April 2017.

Showing 1–23 of 23 results for author: Nangia, N