Showing 1–40 of 40 results for author: Rumshisky, A

Searching in archive cs.
  1. arXiv:2404.02204  [pdf, other]

    cs.CL cs.LG

    Emergent Abilities in Reduced-Scale Generative Language Models

    Authors: Sherin Muckatira, Vijeta Deshpande, Vladislav Lialin, Anna Rumshisky

    Abstract: Large language models can solve new tasks without task-specific fine-tuning. This ability, also known as in-context learning (ICL), is considered an emergent ability and is primarily seen in large language models with billions of parameters. This study investigates if such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data. To…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 16 pages, 4 figures. Accepted to NAACL 2024 Findings

  2. arXiv:2404.02054  [pdf, other]

    cs.CL

    Deconstructing In-Context Learning: Understanding Prompts via Corruption

    Authors: Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky

    Abstract: The ability of large language models (LLMs) to "learn in context" based on the provided prompt has led to an explosive growth in their use, culminating in the proliferation of AI assistants such as ChatGPT, Claude, and Bard. These AI assistants are known to be robust to minor prompt modifications, mostly due to alignment techniques that use human feedback. In contrast, the underlying pre-trai…

    Submitted 29 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024 main conference. The code is available at https://github.com/text-machine-lab/Understanding_prompts_via_corruption

  3. arXiv:2402.15833  [pdf, other]

    cs.CL cs.LG

    Prompt Perturbation Consistency Learning for Robust Language Models

    Authors: Yao Qiang, Subhrangshu Nandi, Ninareh Mehrabi, Greg Ver Steeg, Anoop Kumar, Anna Rumshisky, Aram Galstyan

    Abstract: Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks, such as question answering and text summarization. However, their performance on sequence labeling tasks such as intent classification and slot filling (IC-SF), which is a central component in personal assistant systems, lags significantly behind discriminative models. Furthermor…

    Submitted 24 February, 2024; originally announced February 2024.

  4. arXiv:2311.05821  [pdf, other]

    cs.CL

    Let's Reinforce Step by Step

    Authors: Sarah Pan, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky

    Abstract: While recent advances have boosted LM proficiency in linguistic benchmarks, LMs consistently struggle to reason correctly on complex tasks like mathematics. We turn to Reinforcement Learning from Human Feedback (RLHF) as a method with which to shape model reasoning processes. In particular, we explore two reward schemes, outcome-supervised reward models (ORMs) and process-supervised reward models…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following

  5. arXiv:2307.05695  [pdf, other]

    cs.CL cs.LG

    ReLoRA: High-Rank Training Through Low-Rank Updates

    Authors: Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky

    Abstract: Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparameterized models remains poorly understood, while training costs grow exponentially. In this paper, we explore parameter-efficient training techniques as an approach to training large neural networks. We introduce a novel method called ReLoRA, whic…

    Submitted 10 December, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

  6. arXiv:2306.08756  [pdf, other]

    cs.CL cs.AI cs.LG

    Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models

    Authors: Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza

    Abstract: Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages; however, training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from the other. (1) Extracting the encoder from a seq2seq model, we show it under-performs a Masked Language Modeling (MLM) encoder, particularly on seque…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: ACL Findings 2023 and SustaiNLP Workshop 2023

  7. arXiv:2305.17266  [pdf, other]

    cs.CL

    Honey, I Shrunk the Language: Language Model Behavior at Reduced Scale

    Authors: Vijeta Deshpande, Dan Pechi, Shree Thatte, Vladislav Lialin, Anna Rumshisky

    Abstract: In recent years, language models have drastically grown in size, and the abilities of these models have been shown to improve with scale. The majority of recent scaling laws studies focused on high-compute high-parameter count settings, leaving the question of when these abilities begin to emerge largely unanswered. In this paper, we investigate whether the effects of pre-training can be observed…

    Submitted 30 May, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Findings

  8. Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data

    Authors: Vladislav Lialin, Stephen Rawls, David Chan, Shalini Ghosh, Anna Rumshisky, Wael Hamza

    Abstract: Scaling up weakly-supervised datasets has been shown to be highly effective in the image-text domain and has contributed to most of the recent state-of-the-art computer vision and multimodal neural networks. However, existing large-scale video-text datasets and mining techniques suffer from several limitations, such as the scarcity of aligned data, the lack of diversity in the data, and the difficulty…

    Submitted 4 April, 2023; originally announced April 2023.

    Journal ref: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)

  9. arXiv:2303.16445  [pdf, other]

    cs.CL

    Larger Probes Tell a Different Story: Extending Psycholinguistic Datasets Via In-Context Learning

    Authors: Namrata Shivagunde, Vladislav Lialin, Anna Rumshisky

    Abstract: Language model probing is often used to test specific capabilities of models. However, conclusions from such studies may be limited when the probing benchmarks are small and lack statistical power. In this work, we introduce new, larger datasets for negation (NEG-1500-SIMP) and role reversal (ROLE-1500) inspired by psycholinguistic studies. We dramatically extend existing NEG-136 and ROLE-88 bench…

    Submitted 14 November, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: 14 pages, 6 figures. Published as a conference paper at EMNLP 2023 (short). The datasets and code are available at https://github.com/text-machine-lab/extending_psycholinguistic_dataset

  10. arXiv:2303.15647  [pdf, other]

    cs.CL

    Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning

    Authors: Vladislav Lialin, Vijeta Deshpande, Anna Rumshisky

    Abstract: This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023. These methods aim to resolve the infeasibility and impracticality of fine-tuning large language models by only training a small set of parameters. We provide a taxonomy that covers a broad range of methods and present a detai…

    Submitted 27 March, 2023; originally announced March 2023.

  11. arXiv:2211.08466  [pdf, other]

    cs.CL

    Reasoning Circuits: Few-shot Multihop Question Generation with Structured Rationales

    Authors: Saurabh Kulshreshtha, Anna Rumshisky

    Abstract: Multi-hop Question Generation is the task of generating questions which require the reader to reason over and combine information spread across multiple passages using several reasoning steps. Chain-of-thought rationale generation has been shown to improve performance on multi-step reasoning tasks and make model predictions more interpretable. However, few-shot performance gains from including rat…

    Submitted 15 November, 2022; originally announced November 2022.

  12. arXiv:2210.04073  [pdf, other]

    cs.CL

    On Task-Adaptive Pretraining for Dialogue Response Selection

    Authors: Tzu-Hsiang Lin, Ta-Chung Chi, Anna Rumshisky

    Abstract: Recent advancements in dialogue response selection (DRS) are based on the task-adaptive pre-training (TAP) approach, which first initializes the model with BERT (Devlin et al., 2019) and then adapts it to dialogue data with dialogue-specific or fine-grained pre-training tasks. However, it is uncertain whether BERT is the best initialization choice, or whether the proposed dialogue-specific…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: 6 pages, 4 figures

  13. arXiv:2208.01448  [pdf, other]

    cs.CL cs.LG

    AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

    Authors: Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, Chandana Satya Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan

    Abstract: In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20 billion parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves s…

    Submitted 3 August, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

  14. arXiv:2206.02696  [pdf, other]

    cs.CL

    Learning to Ask Like a Physician

    Authors: Eric Lehman, Vladislav Lialin, Katelyn Y. Legaspi, Anne Janelle R. Sy, Patricia Therese S. Pile, Nicole Rose I. Alberto, Richard Raymund R. Ragasa, Corinna Victoria M. Puyat, Isabelle Rose I. Alberto, Pia Gabrielle I. Alfonso, Marianne Taliño, Dana Moukheiber, Byron C. Wallace, Anna Rumshisky, Jenifer J. Liang, Preethi Raghavan, Leo Anthony Celi, Peter Szolovits

    Abstract: Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and consequently fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are gene…

    Submitted 6 June, 2022; originally announced June 2022.

  15. Life after BERT: What do Other Muppets Understand about Language?

    Authors: Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky

    Abstract: Existing pre-trained transformer analysis works usually focus only on one or two model families at a time, overlooking the variability of the architecture and pre-training objectives. In our work, we utilize the oLMpics benchmark and psycholinguistic probing datasets for a diverse set of 29 models including T5, BART, and ALBERT. Additionally, we adapt the oLMpics zero-shot setup for autoregressive…

    Submitted 29 September, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

    Journal ref: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers, pages 3180-3193, 2022

  16. arXiv:2205.10442  [pdf, other]

    cs.CL cs.AI

    Down and Across: Introducing Crossword-Solving as a New NLP Benchmark

    Authors: Saurabh Kulshreshtha, Olga Kovaleva, Namrata Shivagunde, Anna Rumshisky

    Abstract: Solving crossword puzzles requires diverse reasoning capabilities, access to a vast amount of knowledge about language and the world, and the ability to satisfy the constraints imposed by the structure of the puzzle. In this work, we introduce solving crossword puzzles as a new natural language understanding task. We release the specification of a corpus of crossword puzzles collected from the New…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: Accepted as long paper at ACL 2022

  17. arXiv:2205.03092  [pdf, other]

    cs.LG cs.CL cs.HC

    Federated Learning with Noisy User Feedback

    Authors: Rahul Sharma, Anil Ramakrishna, Ansel MacLaughlin, Anna Rumshisky, Jimit Majmudar, Clement Chung, Salman Avestimehr, Rahul Gupta

    Abstract: Machine Learning (ML) systems are getting increasingly popular, and drive more and more applications and services in our daily life. This has led to growing concerns over user privacy, since human interaction data typically needs to be transmitted to the cloud in order to train and improve such systems. Federated learning (FL) has recently emerged as a method for training ML models on edge devices…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: Accepted to appear in NAACL 2022

  18. arXiv:2107.14586  [pdf, ps, other]

    cs.CL cs.CR cs.LG

    An Efficient DP-SGD Mechanism for Large Scale NLP Models

    Authors: Christophe Dupuy, Radhika Arava, Rahul Gupta, Anna Rumshisky

    Abstract: Recent advances in deep learning have drastically improved performance on many Natural Language Understanding (NLU) tasks. However, the data used to train NLU models may contain private information such as addresses or phone numbers, particularly when drawn from human subjects. It is desirable that underlying models do not expose private information contained in the training data. Differentially P…

    Submitted 2 March, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

  19. arXiv:2107.10342  [pdf, other]

    cs.CL cs.NE

    Multi-Stream Transformers

    Authors: Mikhail Burtsev, Anna Rumshisky

    Abstract: Transformer-based encoder-decoder models produce a fused token-wise representation after every encoder layer. We investigate the effects of allowing the encoder to preserve and explore alternative hypotheses, combined at the end of the encoding process. To that end, we design and examine a Multi-stream Transformer architecture and find that splitting the Transformer encoder into multipl…

    Submitted 21 July, 2021; originally announced July 2021.

  20. arXiv:2105.06990  [pdf, other]

    cs.CL

    BERT Busters: Outlier Dimensions that Disrupt Transformers

    Authors: Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, Anna Rumshisky

    Abstract: Multiple studies have shown that Transformers are remarkably robust to pruning. Contrary to this received wisdom, we demonstrate that pre-trained Transformer encoders are surprisingly fragile to the removal of a very small number of features in the layer outputs (<0.0001% of model weights). In case of BERT and other pre-trained encoder Transformers, the affected component is the scaling factors an…

    Submitted 2 June, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted as long paper at Findings of ACL 2021

  21. arXiv:2010.07865  [pdf, other]

    cs.CL cs.LG

    Update Frequently, Update Fast: Retraining Semantic Parsing Systems in a Fraction of Time

    Authors: Vladislav Lialin, Rahul Goel, Andrey Simanovsky, Anna Rumshisky, Rushin Shah

    Abstract: Currently used semantic parsing systems deployed in voice assistants can require weeks to train. Datasets for these models often receive small and frequent updates, data patches. Each patch requires training a new model. To reduce training time, one can fine-tune the previously trained model on each patch, but naive fine-tuning exhibits catastrophic forgetting - degradation of the model performanc…

    Submitted 22 March, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

  22. arXiv:2005.00561  [pdf, other]

    cs.CL cs.LG

    When BERT Plays the Lottery, All Tickets Are Winning

    Authors: Sai Prasanna, Anna Rogers, Anna Rumshisky

    Abstract: Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similar…

    Submitted 24 October, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: EMNLP 2020 camera-ready

  23. arXiv:2002.12327  [pdf, other]

    cs.CL

    A Primer in BERTology: What we know about how BERT works

    Authors: Anna Rogers, Olga Kovaleva, Anna Rumshisky

    Abstract: Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives a…

    Submitted 9 November, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Comments: Accepted to TACL. Please note that the multilingual BERT section is only available in version 1

  24. arXiv:1910.10488  [pdf, other]

    cs.LG cs.CL stat.ML

    Injecting Hierarchy with U-Net Transformers

    Authors: David Donahue, Vladislav Lialin, Anna Rumshisky

    Abstract: The Transformer architecture has become increasingly popular over the past two years, owing to its impressive performance on a number of natural language processing (NLP) tasks. However, all Transformer computations occur at the level of word representations and therefore, it may be argued that Transformer models do not explicitly attempt to learn hierarchical structure which is widely assumed to…

    Submitted 1 April, 2021; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: 10 pages

  25. arXiv:1910.10487  [pdf, other]

    cs.CL cs.LG stat.ML

    Memory-Augmented Recurrent Networks for Dialogue Coherence

    Authors: David Donahue, Yuanliang Meng, Anna Rumshisky

    Abstract: Recent dialogue approaches operate by reading each word in a conversation history, and aggregating accrued dialogue information into a single state. This fixed-size vector is not expandable and must maintain a consistent format over time. Other recent approaches exploit an attention mechanism to extract useful information from past conversational utterances, but this introduces an increased comput…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: Honors project, 12 pages

  26. arXiv:1908.11443  [pdf, other]

    cs.CL

    NarrativeTime: Dense Temporal Annotation on a Timeline

    Authors: Anna Rogers, Marzena Karpinska, Ankita Gupta, Vladislav Lialin, Gregory Smelkov, Anna Rumshisky

    Abstract: For the past decade, temporal annotation has been sparse: only a small portion of event pairs in a text was annotated. We present NarrativeTime, the first timeline-based annotation framework that achieves full coverage of all possible TLinks. To compare with the previous SOTA in dense temporal annotation, we perform full re-annotation of TimeBankDense corpus, which shows comparable agreement with…

    Submitted 22 December, 2022; v1 submitted 29 August, 2019; originally announced August 2019.

  27. arXiv:1908.10924  [pdf, other]

    cs.LG cs.CL

    Solving Math Word Problems with Double-Decoder Transformer

    Authors: Yuanliang Meng, Anna Rumshisky

    Abstract: This paper proposes a Transformer-based model to generate equations for math word problems. It achieves much better results than RNN models when copy and align mechanisms are not used, and can outperform complex copy and align RNN models. We also show that training a Transformer jointly in a generation task with two decoders, left-to-right and right-to-left, is beneficial. Such a Transformer perfo…

    Submitted 28 August, 2019; originally announced August 2019.

  28. arXiv:1908.08593  [pdf, other]

    cs.CL cs.LG stat.ML

    Revealing the Dark Secrets of BERT

    Authors: Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky

    Abstract: BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose the methodol…

    Submitted 11 September, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: Accepted to EMNLP 2019

  29. arXiv:1904.05233  [pdf, other]

    cs.LG cs.CL stat.ML

    What's in a Name? Reducing Bias in Bios without Access to Protected Attributes

    Authors: Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, Adam Tauman Kalai

    Abstract: There is a growing body of work that proposes methods for mitigating bias in machine learning systems. These methods typically rely on access to protected attributes such as race, gender, or age. However, this raises two significant challenges: (1) protected attributes may not be available or it may not be legal to use them, and (2) it is often desirable to simultaneously consider multiple protect…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: Accepted at NAACL 2019; Best Thematic Paper

  30. arXiv:1810.06640  [pdf, other]

    cs.CL cs.LG stat.ML

    Adversarial Text Generation Without Reinforcement Learning

    Authors: David Donahue, Anna Rumshisky

    Abstract: Generative Adversarial Networks (GANs) have experienced a recent surge in popularity, performing competitively in a variety of tasks, especially in computer vision. However, GAN training has shown limited success in natural language processing. This is largely because sequences of text are discrete, and thus gradients cannot propagate from the discriminator to the generator. Recent solutions use r…

    Submitted 1 January, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: Four pages without references. ACL latex style. Four figures

  31. arXiv:1809.06491  [pdf, other]

    cs.IR cs.CL

    Triad-based Neural Network for Coreference Resolution

    Authors: Yuanliang Meng, Anna Rumshisky

    Abstract: We propose a triad-based neural network system that generates affinity scores between entity mentions for coreference resolution. The system simultaneously accepts three mentions as input, taking mutual dependency and logical constraints of all three mentions into account, and thus makes more accurate predictions than the traditional pairwise approach. Depending on system choices, the affinity sco…

    Submitted 17 September, 2018; originally announced September 2018.

    Journal ref: Proceedings of 27th International Conference on Computational Linguistics (2018) 35-43

  32. arXiv:1808.09042  [pdf, other]

    cs.CL

    Adversarial Decomposition of Text Representation

    Authors: Alexey Romanov, Anna Rumshisky, Anna Rogers, David Donahue

    Abstract: In this paper, we present a method for adversarial decomposition of text representation. This method can be used to decompose a representation of an input sentence into several independent vectors, each of them responsible for a specific aspect of the input sentence. We evaluate the proposed method on two case studies: the conversion between different social registers and diachronic language chang…

    Submitted 10 April, 2019; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: Accepted at NAACL 2019

  33. arXiv:1803.02245  [pdf, other]

    cs.CL

    CliNER 2.0: Accessible and Accurate Clinical Concept Extraction

    Authors: Willie Boag, Elena Sergeeva, Saurabh Kulshreshtha, Peter Szolovits, Anna Rumshisky, Tristan Naumann

    Abstract: Clinical notes often describe important aspects of a patient's stay and are therefore critical to medical research. Clinical concept extraction (CCE) of named entities - such as problems, tests, and treatments - aids in forming an understanding of notes and provides a foundation for many downstream clinical decision-making tasks. Historically, this task has been posed as a standard named entity re…

    Submitted 6 March, 2018; originally announced March 2018.

  34. arXiv:1705.00574  [pdf, other]

    cs.LG cs.NE

    Forced to Learn: Discovering Disentangled Representations Without Exhaustive Labels

    Authors: Alexey Romanov, Anna Rumshisky

    Abstract: Learning a better representation with neural networks is a challenging problem, which was tackled extensively from different perspectives in the past few years. In this work, we focus on learning a representation that could be used for a clustering task and introduce two novel loss components that substantially improve the quality of produced clusters, are simple to apply to an arbitrary model and…

    Submitted 1 May, 2017; originally announced May 2017.

    Comments: Abstract accepted at ICLR 2017 Workshop: https://openreview.net/pdf?id=SkCmfeSFg

  35. arXiv:1703.05851  [pdf, other]

    cs.IR cs.CL

    Temporal Information Extraction for Question Answering Using Syntactic Dependencies in an LSTM-based Architecture

    Authors: Yuanliang Meng, Anna Rumshisky, Alexey Romanov

    Abstract: In this paper, we propose to use a set of simple, uniform in architecture LSTM-based models to recover different kinds of temporal relations from text. Using the shortest dependency path between entities as input, the same architecture is used to extract intra-sentence, cross-sentence, and document creation time relations. A "double-checking" technique reverses entity pairs in classification, boos…

    Submitted 5 October, 2017; v1 submitted 16 March, 2017; originally announced March 2017.

    Comments: EMNLP 2017

  36. arXiv:1612.08994  [pdf, other]

    cs.CL

    Here's My Point: Joint Pointer Architecture for Argument Mining

    Authors: Peter Potash, Alexey Romanov, Anna Rumshisky

    Abstract: One of the major goals in automated argumentation mining is to uncover the argument structure present in argumentative text. In order to determine this structure, one must understand how different individual components of the overall argument are linked. General consensus in this field dictates that the argument components form a hierarchy of persuasion, which manifests itself in a tree structure.…

    Submitted 8 May, 2017; v1 submitted 28 December, 2016; originally announced December 2016.

    Comments: 10 pages; under review for ICLR

  37. arXiv:1612.03216  [pdf, other]

    cs.CL

    #HashtagWars: Learning a Sense of Humor

    Authors: Peter Potash, Alexey Romanov, Anna Rumshisky

    Abstract: In this work, we present a new dataset for computational humor, specifically comparative humor ranking, which attempts to eschew the ubiquitous binary approach to humor detection. The dataset consists of tweets that are humorous responses to a given hashtag. We describe the motivation for this new dataset, as well as the collection process, which includes a description of our semi-automated system…

    Submitted 15 April, 2017; v1 submitted 9 December, 2016; originally announced December 2016.

    Comments: 10 Pages

  38. arXiv:1612.03205  [pdf, other]

    cs.CL

    Evaluating Creative Language Generation: The Case of Rap Lyric Ghostwriting

    Authors: Peter Potash, Alexey Romanov, Anna Rumshisky

    Abstract: Language generation tasks that seek to mimic human ability to use language creatively are difficult to evaluate, since one must consider creativity, style, and other non-trivial aspects of the generated text. The goal of this paper is to develop evaluation methods for one such task, ghostwriting of rap lyrics, and to provide an explicit, quantifiable foundation for the goals and future directions…

    Submitted 9 December, 2016; originally announced December 2016.

    Comments: 10 pages

  39. arXiv:1510.04972  [pdf]

    cs.CL cs.AI cs.IR

    Normalization of Relative and Incomplete Temporal Expressions in Clinical Narratives

    Authors: Weiyi Sun, Anna Rumshisky, Ozlem Uzuner

    Abstract: We analyze the RI-TIMEXes in temporally annotated corpora and propose two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis. We annotate the RI-TIMEXes in three corpora to study the characteristics of RI-TIMEXes in different domains. This informed the design of our RI-TIMEX normalization system for t…

    Submitted 16 October, 2015; originally announced October 2015.

    Comments: Draft version

    Journal ref: Journal of the American Medical Informatics Association (2015)

  40. arXiv:cs/0209003  [pdf]

    cs.CL

    Rerendering Semantic Ontologies: Automatic Extensions to UMLS through Corpus Analytics

    Authors: J. Pustejovsky, A. Rumshisky, J. Castano

    Abstract: In this paper, we discuss the utility and deficiencies of existing ontology resources for a number of language processing applications. We describe a technique for increasing the semantic type coverage of a specific ontology, the National Library of Medicine's UMLS, with the use of robust finite state methods used in conjunction with large-scale corpus analytics of the domain corpus. We call thi…

    Submitted 3 September, 2002; originally announced September 2002.

    Comments: 8 pages

    ACM Class: I.2.7; J.3

    Journal ref: LREC 2002 Workshop on Ontologies and Lexical Knowledge Bases