Showing 1–8 of 8 results for author: Ciosici, M R

  1. arXiv:2311.01468

    cs.CL cs.LG

    Remember what you did so you know what to do next

    Authors: Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, Ralph Weischedel

    Abstract: We explore using a moderately sized large language model (GPT-J 6B parameters) to create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments. Previously published empirical work claimed that large language models (LLMs) are a poor fit (Wang et al., 2022) compared to reinforcement learning. Using the Markov assumption…

    Submitted 30 October, 2023; originally announced November 2023.

    Comments: Identical to EMNLP 2023 Findings

  2. arXiv:2209.00099

    cs.CL

    Efficient Methods for Natural Language Processing: A Survey

    Authors: Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

    Abstract: Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require few…

    Submitted 24 March, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: Accepted at TACL, pre-publication version

  3. arXiv:2208.12097

    cs.CL

    Training a T5 Using Lab-sized Resources

    Authors: Manuel R. Ciosici, Leon Derczynski

    Abstract: Training large neural language models on large datasets is resource- and time-intensive. These requirements create a barrier to entry, where those with fewer resources cannot build competitive models. This paper presents various techniques for making it possible to (a) train a large language model using resources that a modest research lab might have, and (b) train it in a reasonable amount of tim…

    Submitted 25 August, 2022; originally announced August 2022.

  4. arXiv:2110.01552

    cs.CL cs.AI cs.LG

    Perhaps PTLMs Should Go to School -- A Task to Assess Open Book and Closed Book QA

    Authors: Manuel R. Ciosici, Joe Cecil, Alex Hedges, Dong-Ho Lee, Marjorie Freedman, Ralph Weischedel

    Abstract: Our goal is to deliver a new task and leaderboard to stimulate research on question answering and pre-trained language models (PTLMs) to understand a significant instructional document, e.g., an introductory college textbook or a manual. PTLMs have shown great success in many question-answering tasks, given significant supervised training, but much less so in zero-shot settings. We propose a new t…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: Identical to the EMNLP 2021 version

  5. arXiv:2102.06282

    cs.CL cs.LG

    A reproduction of Apple's bi-directional LSTM models for language identification in short strings

    Authors: Mads Toftrup, Søren Asger Sørensen, Manuel R. Ciosici, Ira Assent

    Abstract: Language Identification is the task of identifying a document's language. For applications like automatic spell checker selection, language identification must use very short strings such as text message fragments. In this work, we reproduce a language identification architecture that Apple briefly sketched in a blog post. We confirm the bi-LSTM model's performance and find that it outperforms cur…

    Submitted 11 February, 2021; originally announced February 2021.

    Comments: Will be presented at EACL 2021 SRW

  6. arXiv:2101.05400

    cs.CL cs.AI cs.LG

    Machine-Assisted Script Curation

    Authors: Manuel R. Ciosici, Joseph Cummings, Mitchell DeHaven, Alex Hedges, Yash Kankanampati, Dong-Ho Lee, Ralph Weischedel, Marjorie Freedman

    Abstract: We describe Machine-Aided Script Curator (MASC), a system for human-machine collaborative script authoring. Scripts produced with MASC include (1) English descriptions of sub-events that comprise a larger, complex event; (2) event types for each of those events; (3) a record of entities expected to participate in multiple sub-events; and (4) temporal sequencing between the sub-events. MASC automat…

    Submitted 4 May, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: Identical to the NAACL 2021 Demo version

  7. arXiv:2005.03521

    cs.CL

    The Danish Gigaword Project

    Authors: Leon Strømberg-Derczynski, Manuel R. Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm, Daniel Varab

    Abstract: Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion word corpus of Danish text. The Danish Gigaword corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialect…

    Submitted 12 May, 2021; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: Identical to the NoDaLiDa 2021 version

  8. arXiv:1608.01238

    cs.CL cs.LG

    Improving Quality of Hierarchical Clustering for Large Data Series

    Authors: Manuel R. Ciosici

    Abstract: Brown clustering is a hard, hierarchical, bottom-up clustering of words in a vocabulary. Words are assigned to clusters based on their usage pattern in a given corpus. The resulting clusters and hierarchical structure can be used in constructing class-based language models and for generating features to be used in NLP tasks. Because of its high computational cost, the most-used version of Brown cl…

    Submitted 3 August, 2016; originally announced August 2016.