Showing 1–24 of 24 results for author: Rabinovich, E

  1. arXiv:2405.20833  [pdf, other]

    cs.CL

    That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses

    Authors: Ella Rabinovich

    Abstract: The Uniform Information Density (UID) hypothesis posits that speakers optimize the communicative properties of their utterances by avoiding spikes in information, thereby maintaining a relatively uniform information profile over time. This paper investigates the impact of UID principles on syntactic reduction, specifically focusing on the optional omission of the connector "that" in English subord…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: ACL2024 (main conference), 8 pages

  2. arXiv:2405.18115  [pdf, other]

    cs.CL

    The Knesset Corpus: An Annotated Corpus of Hebrew Parliamentary Proceedings

    Authors: Gili Goldin, Nick Howell, Noam Ordan, Ella Rabinovich, Shuly Wintner

    Abstract: We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament between 1998 and 2022. Sentences are annotated with morpho-syntactic information and are associated with detailed meta-information reflecting demographic and political properties of t…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 28 pages, 7 figures

    MSC Class: 68T50; ACM Class: I.2.7

  3. arXiv:2311.01152  [pdf, other]

    cs.CL

    Predicting Question-Answering Performance of Large Language Models through Semantic Consistency

    Authors: Ella Rabinovich, Samuel Ackerman, Orna Raz, Eitan Farchi, Ateret Anaby-Tavor

    Abstract: Semantic consistency of a language model is broadly defined as the model's ability to produce semantically-equivalent outputs, given semantically-equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases for factual questions, and release the da…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: EMNLP2023 GEM workshop, 17 pages

  4. arXiv:2305.17750  [pdf, other]

    cs.CL

    Reliable and Interpretable Drift Detection in Streams of Short Texts

    Authors: Ella Rabinovich, Matan Vetzler, Samuel Ackerman, Ateret Anaby-Tavor

    Abstract: Data drift is the change in model input data that is one of the key factors leading to machine learning models performance degradation over time. Monitoring drift helps detecting these issues and preventing their harmful consequences. Meaningful drift interpretation is a fundamental step towards effective re-training of the model. In this study we propose an end-to-end framework for reliable model…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ACL2023 industry track (9 pages)

  5. arXiv:2210.11905  [pdf, other]

    cs.CL

    Exploration of the Usage of Color Terms by Color-blind Participants in Online Discussion Platforms

    Authors: Ella Rabinovich, Boaz Carmeli

    Abstract: Prominent questions about the role of sensory vs. linguistic input in the way we acquire and use language have been extensively studied in the psycholinguistic literature. However, the relative effect of various factors in a person's overall experience on their linguistic system remains unclear. We study this question by making a step forward towards a better understanding of the conceptual percep…

    Submitted 30 October, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022 (main conference), 13 pages

  6. arXiv:2204.13043  [pdf, other]

    cs.HC stat.AP

    High-quality Conversational Systems

    Authors: Samuel Ackerman, Ateret Anaby-Tavor, Eitan Farchi, Esther Goldbraich, George Kour, Ella Rabinovich, Orna Raz, Saritha Route, Marcel Zalmanovici, Naama Zwerdling

    Abstract: Conversational systems or chatbots are an example of AI-Infused Applications (AIIA). Chatbots are especially important as they are often the first interaction of clients with a business and are the entry point of a business into the AI (Artificial Intelligence) world. The quality of the chatbot is, therefore, key. However, as is the case in general with AIIAs, it is especially challenging to asses…

    Submitted 28 April, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

  7. arXiv:2204.05158  [pdf, other]

    cs.CL

    Gaining Insights into Unrecognized User Utterances in Task-Oriented Dialog Systems

    Authors: Ella Rabinovich, Matan Vetzler, David Boaz, Vineet Kumar, Gaurav Pandey, Ateret Anaby-Tavor

    Abstract: The rapidly growing market demand for automatic dialogue agents capable of goal-oriented behavior has caused many tech-industry leaders to invest considerable efforts into task-oriented dialog systems. The success of these systems is highly dependent on the accuracy of their intent identification -- the process of deducing the goal or meaning of the user's request and mapping it to one of the know…

    Submitted 24 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted at EMNLP 2022 (industry track), 8 pages

  8. arXiv:2110.05780  [pdf, other]

    cs.CL

    We've had this conversation before: A Novel Approach to Measuring Dialog Similarity

    Authors: Ofer Lavi, Ella Rabinovich, Segev Shlomov, David Boaz, Inbal Ronen, Ateret Anaby-Tavor

    Abstract: Dialog is a core building block of human natural language interactions. It contains multi-party utterances used to convey information from one party to another in a dynamic and evolving manner. The ability to compare dialogs is beneficial in many real world use cases, such as conversation analytics for contact center calls and virtual agent design. We propose a novel adaptation of the edit dista…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: EMNLP 2021, 9 pages

  9. arXiv:2110.05775  [pdf, other]

    cs.CL

    Quantifying Cognitive Factors in Lexical Decline

    Authors: David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, Suzanne Stevenson

    Abstract: We adopt an evolutionary view on language change in which cognitive factors (in addition to social ones) affect the fitness of words and their success in the linguistic ecosystem. Specifically, we propose a variety of psycholinguistic factors -- semantic, distributional, and phonological -- that we hypothesize are predictive of lexical decline, in which words greatly decrease in frequency over tim…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Transactions of the Association for Computational Linguistics (TACL) 2021, 16 pages

  10. arXiv:2011.00335  [pdf, other]

    cs.CL

    Pick a Fight or Bite your Tongue: Investigation of Gender Differences in Idiomatic Language Usage

    Authors: Ella Rabinovich, Hila Gonen, Suzanne Stevenson

    Abstract: A large body of research on gender-linked language has established foundations regarding cross-gender differences in lexical, emotional, and topical preferences, along with their sociological underpinnings. We compile a novel, large and diverse corpus of spontaneous linguistic productions annotated with speakers' gender, and perform a first large-scale empirical study of distinctions in the usage…

    Submitted 31 October, 2020; originally announced November 2020.

    Comments: COLING'2020, 12 pages

  11. arXiv:2008.05713  [pdf, other]

    cs.CL cs.SI

    Exploration of Gender Differences in COVID-19 Discourse on Reddit

    Authors: Jai Aggarwal, Ella Rabinovich, Suzanne Stevenson

    Abstract: Decades of research on differences in the language of men and women have established postulates about preferences in lexical, topical, and emotional expression between the two genders, along with their sociological underpinnings. Using a novel dataset of male and female linguistic productions collected from the Reddit discussion platform, we further confirm existing assumptions about gender-linked…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: Proceedings of the 1st Workshop on NLP for COVID-19 (ACL 2020)

  12. arXiv:2006.01966  [pdf, other]

    cs.CL

    The Typology of Polysemy: A Multilingual Distributional Framework

    Authors: Ella Rabinovich, Yang Xu, Suzanne Stevenson

    Abstract: Lexical semantic typology has identified important cross-linguistic generalizations about the variation and commonalities in polysemy patterns---how languages package up meanings into words. Recent computational research has enabled investigation of lexical semantics at a much larger scale, but little work has explored lexical typology across semantic domains, nor the factors that influence cross-…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: CogSci 2020 (Annual Meeting of the Cognitive Science Society)

  13. Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods

    Authors: Maria Ryskina, Ella Rabinovich, Taylor Berg-Kirkpatrick, David R. Mortensen, Yulia Tsvetkov

    Abstract: We perform statistical analysis of the phenomenon of neology, the process by which new words emerge in a language, using large diachronic corpora of English. We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm. We show that both factors are predictive of word emergence although we find…

    Submitted 21 January, 2020; originally announced January 2020.

    Comments: SCiL 2020

    Journal ref: Proceedings of the Society for Computation in Linguistics 3.1 (2020): 43-52

  14. arXiv:1909.07928  [pdf, other]

    cs.CL

    Say Anything: Automatic Semantic Infelicity Detection in L2 English Indefinite Pronouns

    Authors: Ella Rabinovich, Julia Watson, Barend Beekhuizen, Suzanne Stevenson

    Abstract: Computational research on error detection in second language speakers has mainly addressed clear grammatical anomalies typical to learners at the beginner-to-intermediate level. We focus instead on acquisition of subtle semantic nuances of English indefinite pronouns by non-native speakers at varying levels of proficiency. We first lay out theoretical, linguistically motivated hypotheses, and supp…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: 10 pages, CoNLL2019

  15. arXiv:1908.11841  [pdf, other]

    cs.CL

    CodeSwitch-Reddit: Exploration of Written Multilingual Discourse in Online Discussion Forums

    Authors: Ella Rabinovich, Masih Sultani, Suzanne Stevenson

    Abstract: In contrast to many decades of research on oral code-switching, the study of written multilingual productions has only recently enjoyed a surge of interest. Many open questions remain regarding the sociolinguistic underpinnings of written code-switching, and progress has been limited by a lack of suitable resources. We introduce a novel, large, and diverse dataset of written code-switched producti…

    Submitted 30 August, 2019; originally announced August 2019.

    Comments: EMNLP2019, 11 pages

  16. arXiv:1908.07491  [pdf, ps, other]

    cs.CL

    Controversy in Context

    Authors: Benjamin Sznajder, Ariel Gera, Yonatan Bilu, Dafna Sheinwald, Ella Rabinovich, Ranit Aharonov, David Konopnicki, Noam Slonim

    Abstract: With the growing interest in social applications of Natural Language Processing and Computational Argumentation, a natural question is how controversial a given concept is. Prior works relied on Wikipedia's metadata and on content analysis of the articles pertaining to a concept in question. Here we show that the immediate textual context of a concept is strongly indicative of this property, and,…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: 5 pages

  17. arXiv:1809.01285  [pdf, ps, other]

    cs.CL

    Learning Concept Abstractness Using Weak Supervision

    Authors: Ella Rabinovich, Benjamin Sznajder, Artem Spector, Ilya Shnayderman, Ranit Aharonov, David Konopnicki, Noam Slonim

    Abstract: We introduce a weakly supervised approach for inferring the property of abstractness of words and expressions in the complete absence of labeled data. Exploiting only minimal linguistic clues and the contextual usage of a concept as manifested in textual data, we train sufficiently powerful classifiers, obtaining high correlation with human labels. The results imply the applicability of this appro…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: 6 pages, EMNLP 2018

  18. arXiv:1805.09590  [pdf, other]

    cs.CL

    Native Language Cognate Effects on Second Language Lexical Choice

    Authors: Ella Rabinovich, Yulia Tsvetkov, Shuly Wintner

    Abstract: We present a computational analysis of cognate effects on the spontaneous linguistic productions of advanced non-native speakers. Introducing a large corpus of highly competent non-native English speakers, and using a set of carefully selected lexical items, we show that the lexical choices of non-natives are affected by cognates in their native language. This effect is so powerful that we are abl…

    Submitted 24 May, 2018; originally announced May 2018.

    Comments: Transactions of the Association for Computational Linguistics (TACL), 2018; 14 pages

  19. arXiv:1805.07697  [pdf]

    cs.CL

    The UN Parallel Corpus Annotated for Translation Direction

    Authors: Elad Tolochinsky, Ohad Mosafi, Ella Rabinovich, Shuly Wintner

    Abstract: This work distinguishes between translated and original text in the UN protocol corpus. By modeling the problem as classification problem, we can achieve up to 95% classification accuracy. We begin by deriving a parallel corpus for different language-pairs annotated for translation direction, and then classify the data by using various feature extraction methods. We compare the different methods a…

    Submitted 19 May, 2018; originally announced May 2018.

  20. arXiv:1704.07146  [pdf, other]

    cs.CL

    Found in Translation: Reconstructing Phylogenetic Language Trees from Translations

    Authors: Ella Rabinovich, Noam Ordan, Shuly Wintner

    Abstract: Translation has played an important role in trade, law, commerce, politics, and literature for thousands of years. Translators have always tried to be invisible; ideal translations should look as if they were written originally in the target language. We show that traces of the source language remain in the translation product to the extent that it is possible to uncover the history of the source…

    Submitted 24 April, 2017; originally announced April 2017.

    Comments: ACL2017, 11 pages

  21. arXiv:1610.05461  [pdf, other]

    cs.CL

    Personalized Machine Translation: Preserving Original Author Traits

    Authors: Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia, Shuly Wintner

    Abstract: The language that we produce reflects our personality, and various personal and demographic characteristics can be detected in natural language texts. We focus on one particular personal trait of the author, gender, and study how it is manifested in original texts and in translations. We show that author's gender has a powerful, clear signal in originals texts, but this signal is obfuscated in hum…

    Submitted 12 January, 2017; v1 submitted 18 October, 2016; originally announced October 2016.

    Comments: EACL 2017, 11 pages

  22. arXiv:1609.03205  [pdf, ps, other]

    cs.CL

    Unsupervised Identification of Translationese

    Authors: Ella Rabinovich, Shuly Wintner

    Abstract: Translated texts are distinctively different from original ones, to the extent that supervised text classification methods can distinguish between them with high accuracy. These differences were proven useful for statistical machine translation. However, it has been suggested that the accuracy of translation detection deteriorates when the classifier is evaluated outside the domain it was trained…

    Submitted 11 September, 2016; originally announced September 2016.

    Comments: TACL2015, 14 pages

  23. arXiv:1609.03204  [pdf, other]

    cs.CL

    On the Similarities Between Native, Non-native and Translated Texts

    Authors: Ella Rabinovich, Sergiu Nisioi, Noam Ordan, Shuly Wintner

    Abstract: We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods we establish three main results: (1) the three types of texts are easily distinguishable…

    Submitted 11 September, 2016; originally announced September 2016.

    Comments: ACL2016, 12 pages

  24. arXiv:1509.03611  [pdf, ps, other]

    cs.CL

    A Parallel Corpus of Translationese

    Authors: Ella Rabinovich, Shuly Wintner, Ofek Luis Lewinsohn

    Abstract: We describe a set of bilingual English--French and English--German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research of translationese and its applications to (human and machine) tra…

    Submitted 6 March, 2016; v1 submitted 11 September, 2015; originally announced September 2015.