Showing 1–7 of 7 results for author: Mirkin, S

  1. arXiv:2211.05100

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  2. arXiv:1909.00393

    cs.CL cs.AI cs.LG

    A Dataset of General-Purpose Rebuttal

    Authors: Matan Orbach, Yonatan Bilu, Ariel Gera, Yoav Kantor, Lena Dankin, Tamar Lavee, Lili Kotlerman, Shachar Mirkin, Michal Jacovi, Ranit Aharonov, Noam Slonim

    Abstract: In Natural Language Understanding, the task of response generation is usually focused on responses to short texts, such as tweets or a turn in a dialog. Here we present a novel task of producing a critical response to a long argumentative text, and suggest a method based on general rebuttal arguments to address it. We do this in the context of the recently-suggested task of listening comprehension…

    Submitted 1 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  3. arXiv:1907.11889

    cs.CL cs.AI cs.LG

    Towards Effective Rebuttal: Listening Comprehension using Corpus-Wide Claim Mining

    Authors: Tamar Lavee, Matan Orbach, Lili Kotlerman, Yoav Kantor, Shai Gretz, Lena Dankin, Shachar Mirkin, Michal Jacovi, Yonatan Bilu, Ranit Aharonov, Noam Slonim

    Abstract: Engaging in a live debate requires, among other things, the ability to effectively rebut arguments claimed by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indee…

    Submitted 27 July, 2019; originally announced July 2019.

    Comments: 6th Argument Mining Workshop @ ACL 2019

  4. arXiv:1801.07507

    cs.CL

    What did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text

    Authors: Yosi Mass, Lili Kotlerman, Shachar Mirkin, Elad Venezian, Gera Witzling, Noam Slonim

    Abstract: We describe a large, high-quality benchmark for the evaluation of Mention Detection tools. The benchmark contains annotations of both named entities as well as other types of entities, annotated on different types of text, ranging from clean text taken from Wikipedia, to noisy spoken data. The benchmark was built through a highly controlled crowd sourcing process to ensure its quality. We describe…

    Submitted 25 January, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

  5. arXiv:1709.06438

    cs.CL

    A Recorded Debating Dataset

    Authors: Shachar Mirkin, Michal Jacovi, Tamar Lavee, Hong-Kwang Kuo, Samuel Thomas, Leslie Sager, Lili Kotlerman, Elad Venezian, Noam Slonim

    Abstract: This paper describes an English audio and textual dataset of debating speeches, a unique resource for the growing research field of computational argumentation and debating technologies. We detail the process of speech recording by professional debaters, the transcription of the speeches with an Automatic Speech Recognition (ASR) system, their consequent automatic processing to produce a text that…

    Submitted 27 March, 2018; v1 submitted 19 September, 2017; originally announced September 2017.

  6. arXiv:1703.04650

    cs.CL

    Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks

    Authors: Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas Raykar, Lili Kotlerman, Guy Lev

    Abstract: The stream of words produced by Automatic Speech Recognition (ASR) systems is typically devoid of punctuation and formatting. Most natural language processing applications expect segmented and well-formatted texts as input, which are not available in ASR output. This paper proposes a novel technique of jointly modeling multiple correlated tasks such as punctuation and capitalization using bidirect…

    Submitted 18 July, 2017; v1 submitted 14 March, 2017; originally announced March 2017.

    Comments: Accepted in Interspeech 2017

  7. arXiv:1610.05461

    cs.CL

    Personalized Machine Translation: Preserving Original Author Traits

    Authors: Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia, Shuly Wintner

    Abstract: The language that we produce reflects our personality, and various personal and demographic characteristics can be detected in natural language texts. We focus on one particular personal trait of the author, gender, and study how it is manifested in original texts and in translations. We show that the author's gender has a powerful, clear signal in original texts, but this signal is obfuscated in hum…

    Submitted 12 January, 2017; v1 submitted 18 October, 2016; originally announced October 2016.

    Comments: EACL 2017, 11 pages