Showing 1–14 of 14 results for author: Schuster, S

Searching in archive cs.

  1. arXiv:2405.21068

    cs.CL cs.AI

    Code Pretraining Improves Entity Tracking Abilities of Language Models

    Authors: Najoung Kim, Sebastian Schuster, Shubham Toshniwal

    Abstract: Recent work has provided indirect evidence that pretraining language models on code improves the ability of models to track state changes of discourse entities expressed in natural language. In this work, we systematically test this claim by comparing pairs of language models on their entity tracking performance. Critically, the pairs consist of base models and models trained on top of these base…

    Submitted 31 May, 2024; originally announced May 2024.

  2. Scope Ambiguities in Large Language Models

    Authors: Gaurav Kamath, Sebastian Schuster, Sowmya Vajjala, Siva Reddy

    Abstract: Sentences containing multiple semantic operators with overlapping scope often create ambiguities in interpretation, known as scope ambiguities. These ambiguities offer rich insights into the interaction between semantic structure and world knowledge in language processing. Despite this, there has been little research into how modern large language models treat them. In this paper, we investigate h…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: To be published in Transactions of the Association for Computational Linguistics

  3. arXiv:2305.02363

    cs.CL

    Entity Tracking in Language Models

    Authors: Najoung Kim, Sebastian Schuster

    Abstract: Keeping track of how states of entities change as a text or dialog unfolds is a key prerequisite to discourse understanding. Yet, there have been few systematic investigations into the ability of large language models (LLMs) to track discourse entities. In this work, we present a task probing to what extent a language model can infer the final state of an entity given an English description of the…

    Submitted 8 September, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Camera-ready

  4. arXiv:2304.04758

    cs.CL

    Expectations over Unspoken Alternatives Predict Pragmatic Inferences

    Authors: Jennifer Hu, Roger Levy, Judith Degen, Sebastian Schuster

    Abstract: Scalar inferences (SI) are a signature example of how humans interpret language based on unspoken alternatives. While empirical studies have demonstrated that human SI rates are highly variable -- both within instances of a single scale, and across different scales -- there have been few proposals that quantitatively explain both cross- and within-scale variation. Furthermore, while it is generall…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: To appear in TACL (pre-MIT Press publication version)

  5. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  6. arXiv:2205.03472

    cs.CL

    When a sentence does not introduce a discourse entity, Transformer-based models still sometimes refer to it

    Authors: Sebastian Schuster, Tal Linzen

    Abstract: Understanding longer narratives or participating in conversations requires tracking of discourse entities that have been mentioned. Indefinite noun phrases (NPs), such as 'a dog', frequently introduce discourse entities but this behavior is modulated by sentential operators such as negation. For example, 'a dog' in 'Arthur doesn't own a dog' does not introduce a discourse entity due to the presenc…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: To appear at NAACL 2022

  7. arXiv:2203.13094

    cs.NI

    Six Insights into 6G: Orientation and Input for Developing Your Strategic 6G Research Plan

    Authors: Kimberley Parsons Trommler, Matthias Hafner, Wolfgang Kellerer, Peter Merz, Sigurd Schuster, Josef Urban, Uwe Baeder, Bertram Gunzelmann, Andreas Kornbichler

    Abstract: This paper is a summary of the findings from a series of workshops which were held by Thinknet 6G and MUENCHNER KREIS in 2021, with the goal to provide orientation and input for developing a strategic 6G research plan. The topics selected for the workshops are aspects of 6G that we expect will have a significant impact on other industries and on society: - 6G as both a communication infrastructu…

    Submitted 20 May, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: 20 pages, 1 figure

    Report number: Thinknet-6G-2022-May-01 ACM Class: C.2.0; C.2.1; C.2.6

  8. arXiv:2203.09397

    cs.CL

    Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models

    Authors: Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster

    Abstract: Relations between words are governed by hierarchical structure rather than linear ordering. Sequence-to-sequence (seq2seq) models, despite their success in downstream NLP applications, often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations - for example, transforming declarative sentences into questions. However, syntactic evaluations of seq2seq models h…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: Accepted to Findings of ACL 2022

  9. arXiv:2201.12266

    cs.NI

    Six Questions about 6G

    Authors: Kimberley Parsons Trommler, Matthias Hafner, Wolfgang Kellerer, Peter Merz, Sigurd Schuster, Josef Urban, Uwe Baeder, Bertram Gunzelmann, Andreas Kornbichler

    Abstract: Although 5G (Fifth Generation) mobile technology is still in the rollout phase, research and development of 6G (Sixth Generation) wireless have already begun. This paper is an introduction to 6G wireless networks, covering the main drivers for 6G, some of the expected use cases, some of the technical challenges in 6G, example areas that will require research and new technologies, the expected time…

    Submitted 7 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: 6 pages, 3 figures, document also available in German, document available in a more attractive format, here: www.thinknet-6g.de

    ACM Class: C.2.0; C.2.1; C.2.6

  10. arXiv:2109.06987

    cs.CL

    NOPE: A Corpus of Naturally-Occurring Presuppositions in English

    Authors: Alicia Parrish, Sebastian Schuster, Alex Warstadt, Omar Agha, Soo-Hwan Lee, Zhuoye Zhao, Samuel R. Bowman, Tal Linzen

    Abstract: Understanding language requires grasping not only the overtly stated content, but also making inferences about things that were left unsaid. These inferences include presuppositions, a phenomenon by which a listener learns about new information through reasoning about what a speaker takes as given. Presuppositions require complex understanding of the lexical and syntactic properties that trigger t…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: CoNLL 2021. Data and code available at https://github.com/nyu-mll/nope

  11. arXiv:2004.10643

    cs.CL

    Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection

    Authors: Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, Daniel Zeman

    Abstract: Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: LREC 2020

  12. arXiv:1910.14254

    cs.CL

    Harnessing the linguistic signal to predict scalar inferences

    Authors: Sebastian Schuster, Yuxing Chen, Judith Degen

    Abstract: Pragmatic inferences often subtly depend on the presence or absence of linguistic features. For example, the presence of a partitive construction (of the) increases the strength of a so-called scalar inference: listeners perceive the inference that Chris did not eat all of the cookies to be stronger after hearing "Chris ate some of the cookies" than after hearing the same utterance without a parti…

    Submitted 22 April, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: ACL 2020; 16 pages, 8 figures

  13. arXiv:1810.13327

    cs.CL

    Cross-Lingual Transfer Learning for Multilingual Task Oriented Dialog

    Authors: Sebastian Schuster, Sonal Gupta, Rushin Shah, Mike Lewis

    Abstract: One of the first steps in the utterance interpretation pipeline of many task-oriented conversational AI systems is to identify user intents and the corresponding slots. Since data collection for machine learning models for this task is time-consuming, it is desirable to make use of existing data in a high-resource language to train models in low-resource languages. However, development of such mod…

    Submitted 1 April, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

    Comments: 11 pages, to be presented at NAACL 2019

  14. arXiv:1804.06922

    cs.CL

    Sentences with Gapping: Parsing and Reconstructing Elided Predicates

    Authors: Sebastian Schuster, Joakim Nivre, Christopher D. Manning

    Abstract: Sentences with gapping, such as Paul likes coffee and Mary tea, lack an overt predicate to indicate the relation between two or more arguments. Surface syntax representations of such sentences are often produced poorly by parsers, and even if correct, not well suited to downstream natural language understanding tasks such as relation extraction that are typically designed to extract information fr…

    Submitted 18 April, 2018; originally announced April 2018.

    Comments: To be presented at NAACL 2018

    Journal ref: Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2018)