Showing 1–20 of 20 results for author: Wintner, S

  1. arXiv:2405.18115  [pdf, other]

    cs.CL

    The Knesset Corpus: An Annotated Corpus of Hebrew Parliamentary Proceedings

    Authors: Gili Goldin, Nick Howell, Noam Ordan, Ella Rabinovich, Shuly Wintner

    Abstract: We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament between 1998 and 2022. Sentences are annotated with morpho-syntactic information and are associated with detailed meta-information reflecting demographic and political properties of t…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 28 pages, 7 figures

    MSC Class: 68T50; ACM Class: I.2.7

  2. arXiv:2308.15209  [pdf, other]

    cs.CL cs.AI

    Shared Lexical Items as Triggers of Code Switching

    Authors: Shuly Wintner, Safaa Shehadi, Yuli Zeira, Doreen Osmelak, Yuval Nov

    Abstract: Why do bilingual speakers code-switch (mix their two languages)? Among the several theories that attempt to explain this natural and ubiquitous phenomenon, the Triggering Hypothesis relates code-switching to the presence of lexical triggers, specifically cognates and proper names, adjacent to the switch point. We provide a fuller, more nuanced and refined exploration of the triggering hypothesis,…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: This is the author's final version; the article has been accepted for publication in the Transactions of the Association for Computational Linguistics (TACL)

  3. arXiv:2203.08979  [pdf, other]

    cs.CL

    Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching

    Authors: Alissa Ostapenko, Shuly Wintner, Melinda Fricke, Yulia Tsvetkov

    Abstract: Natural language processing (NLP) models trained on people-generated data can be unreliable because, without any constraints, they can learn from spurious correlations that are not relevant to the task. We hypothesize that enriching models with speaker information in a controlled, educated way can guide them to pick up on relevant inductive biases. For the speaker-driven task of predicting code-sw…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: To appear in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)

  4. arXiv:2106.06797  [pdf, other]

    cs.CL

    Machine Translation into Low-resource Language Varieties

    Authors: Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, Yulia Tsvetkov

    Abstract: State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language. Such varieties are often low-resource, and hence do not benefit from contemporary NLP solutions, MT included. We propose a g…

    Submitted 15 October, 2021; v1 submitted 12 June, 2021; originally announced June 2021.

    Comments: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

  5. arXiv:1909.00453  [pdf, other]

    cs.LG cs.CL stat.ML

    Topics to Avoid: Demoting Latent Confounds in Text Classification

    Authors: Sachin Kumar, Shuly Wintner, Noah A. Smith, Yulia Tsvetkov

    Abstract: Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation with respect to the task of native language identification. We find that standard text classifiers which perform well on the test set end up learning topi…

    Submitted 14 June, 2021; v1 submitted 1 September, 2019; originally announced September 2019.

    Comments: 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019)

  6. arXiv:1808.09386  [pdf, other]

    cs.CL

    Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies

    Authors: Anjalie Field, Doron Kliger, Shuly Wintner, Jennifer Pan, Dan Jurafsky, Yulia Tsvetkov

    Abstract: Amidst growing concern over media manipulation, NLP attention has focused on overt strategies like censorship and "fake news". Here, we draw on two concepts from the political science literature to explore subtler strategies for government media manipulation: agenda-setting (selecting what topics to cover) and framing (deciding how topics are covered). We analyze 13 years (100K articles) of the R…

    Submitted 29 October, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

    Comments: Accepted as a full paper at EMNLP 2018

  7. arXiv:1805.09590  [pdf, other]

    cs.CL

    Native Language Cognate Effects on Second Language Lexical Choice

    Authors: Ella Rabinovich, Yulia Tsvetkov, Shuly Wintner

    Abstract: We present a computational analysis of cognate effects on the spontaneous linguistic productions of advanced non-native speakers. Introducing a large corpus of highly competent non-native English speakers, and using a set of carefully selected lexical items, we show that the lexical choices of non-natives are affected by cognates in their native language. This effect is so powerful that we are abl…

    Submitted 24 May, 2018; originally announced May 2018.

    Comments: Transactions of the Association for Computational Linguistics (TACL), 2018; 14 pages

  8. arXiv:1805.07697  [pdf]

    cs.CL

    The UN Parallel Corpus Annotated for Translation Direction

    Authors: Elad Tolochinsky, Ohad Mosafi, Ella Rabinovich, Shuly Wintner

    Abstract: This work distinguishes between translated and original text in the UN protocol corpus. By modeling the problem as a classification problem, we can achieve up to 95% classification accuracy. We begin by deriving a parallel corpus for different language-pairs annotated for translation direction, and then classify the data by using various feature extraction methods. We compare the different methods a…

    Submitted 19 May, 2018; originally announced May 2018.

  9. arXiv:1704.07146  [pdf, other]

    cs.CL

    Found in Translation: Reconstructing Phylogenetic Language Trees from Translations

    Authors: Ella Rabinovich, Noam Ordan, Shuly Wintner

    Abstract: Translation has played an important role in trade, law, commerce, politics, and literature for thousands of years. Translators have always tried to be invisible; ideal translations should look as if they were written originally in the target language. We show that traces of the source language remain in the translation product to the extent that it is possible to uncover the history of the source…

    Submitted 24 April, 2017; originally announced April 2017.

    Comments: ACL2017, 11 pages

  10. arXiv:1610.05461  [pdf, other]

    cs.CL

    Personalized Machine Translation: Preserving Original Author Traits

    Authors: Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia, Shuly Wintner

    Abstract: The language that we produce reflects our personality, and various personal and demographic characteristics can be detected in natural language texts. We focus on one particular personal trait of the author, gender, and study how it is manifested in original texts and in translations. We show that author's gender has a powerful, clear signal in original texts, but this signal is obfuscated in hum…

    Submitted 12 January, 2017; v1 submitted 18 October, 2016; originally announced October 2016.

    Comments: EACL 2017, 11 pages

  11. arXiv:1609.03205  [pdf, ps, other]

    cs.CL

    Unsupervised Identification of Translationese

    Authors: Ella Rabinovich, Shuly Wintner

    Abstract: Translated texts are distinctively different from original ones, to the extent that supervised text classification methods can distinguish between them with high accuracy. These differences were proven useful for statistical machine translation. However, it has been suggested that the accuracy of translation detection deteriorates when the classifier is evaluated outside the domain it was trained…

    Submitted 11 September, 2016; originally announced September 2016.

    Comments: TACL2015, 14 pages

  12. arXiv:1609.03204  [pdf, other]

    cs.CL

    On the Similarities Between Native, Non-native and Translated Texts

    Authors: Ella Rabinovich, Sergiu Nisioi, Noam Ordan, Shuly Wintner

    Abstract: We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods we establish three main results: (1) the three types of texts are easily distinguishable…

    Submitted 11 September, 2016; originally announced September 2016.

    Comments: ACL2016, 12 pages

  13. arXiv:1509.03611  [pdf, ps, other]

    cs.CL

    A Parallel Corpus of Translationese

    Authors: Ella Rabinovich, Shuly Wintner, Ofek Luis Lewinsohn

    Abstract: We describe a set of bilingual English–French and English–German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research of translationese and its applications to (human and machine) tra…

    Submitted 6 March, 2016; v1 submitted 11 September, 2015; originally announced September 2015.

  14. Amalia -- A Unified Platform for Parsing and Generation

    Authors: Shuly Wintner, Evgeniy Gabrilovich, Nissim Francez

    Abstract: Contemporary linguistic theories (in particular, HPSG) are declarative in nature: they specify constraints on permissible structures, not how such structures are to be computed. Grammars designed under such theories are, therefore, suitable for both parsing and generation. However, practical implementations of such theories don't usually support bidirectional processing of grammars. We present a…

    Submitted 24 September, 1997; originally announced September 1997.

    Comments: 8 pages postscript

    Journal ref: Proceedings of Recent Advances in Natural Language Processing (RANLP97), Tzigov Chark, Bulgaria, 11-13 September 1997, pp. 135-142

  15. An Abstract Machine for Unification Grammars

    Authors: Shuly Wintner

    Abstract: This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, au…

    Submitted 23 September, 1997; originally announced September 1997.

    Comments: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks, pst-node, psfig, fullname and a macros file

  16. Off-line Parsability and the Well-foundedness of Subsumption

    Authors: Shuly Wintner, Nissim Francez

    Abstract: Typed feature structures are used extensively for the specification of linguistic information in many formalisms. The subsumption relation orders TFSs by their information content. We prove that subsumption of acyclic TFSs is well-founded, whereas in the presence of cycles general TFS subsumption is not well-founded. We show an application of this result for parsing, where the well-foundedness o…

    Submitted 23 September, 1997; originally announced September 1997.

    Comments: 19 pages, 1 postscript figure, uses fullname.sty

  17. arXiv:cmp-lg/9601011  [pdf, ps]

    cs.CL

    Parsing with Typed Feature Structures

    Authors: Shuly Wintner, Nissim Francez

    Abstract: In this paper we provide for parsing with respect to grammars expressed in a general TFS-based formalism, a restriction of ALE. Our motivation being the design of an abstract (WAM-like) machine for the formalism, we consider parsing as a computational process and use it as an operational semantics to guide the design of the control structures for the abstract machine. We emphasize the notion of…

    Submitted 31 January, 1996; originally announced January 1996.

    Comments: PostScript, 29 pages

    Report number: Laboratory for Computational Linguistics TR #LCL 95-1

  18. arXiv:cmp-lg/9601010  [pdf, ps]

    cs.CL

    Parsing with Typed Feature Structures

    Authors: Shuly Wintner, Nissim Francez

    Abstract: In this paper we provide for parsing with respect to grammars expressed in a general TFS-based formalism, a restriction of ALE. Our motivation being the design of an abstract (WAM-like) machine for the formalism, we consider parsing as a computational process and use it as an operational semantics to guide the design of the control structures for the abstract machine. We emphasize the notion of…

    Submitted 31 January, 1996; originally announced January 1996.

    Comments: PostScript, 15 pages; Proc. 4th Intl. Workshop on Parsing Technologies, Prague, September 1995

  19. Abstract Machine for Typed Feature Structures

    Authors: Shuly Wintner, Nissim Francez

    Abstract: This paper describes an abstract machine for linguistic formalisms that are based on typed feature structures, such as HPSG. The core design of the abstract machine is given in detail, including the compilation process from a high-level language to the abstract machine language and the implementation of the abstract instructions. The machine's engine supports the unification of typed, possibly c…

    Submitted 13 April, 1995; originally announced April 1995.

    Comments: Self-contained LaTeX, 15 pages, to appear in NLULP95

    Journal ref: Proc. 5th Int. Workshop on Natural Language Understanding and Logic Programming, Lisbon, May 1995

  20. arXiv:cmp-lg/9407014  [pdf, ps]

    cs.CL

    Abstract Machine for Typed Feature Structures

    Authors: Shuly Wintner, Nissim Francez

    Abstract: This paper describes a first step towards the definition of an abstract machine for linguistic formalisms that are based on typed feature structures, such as HPSG. The core design of the abstract machine is given in detail, including the compilation process from a high-level specification language to the abstract machine language and the implementation of the abstract instructions. We thus apply…

    Submitted 17 July, 1994; originally announced July 1994.

    Comments: 38 pages, uuencoded compressed postscript

    Report number: TR #LCL 94-8, Laboratory for Computational Linguistics, Technion