-
Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations
Authors:
Ritam Dutt,
Zhen Wu,
Kelly Shi,
Divyanshu Sheth,
Prakhar Gupta,
Carolyn Penstein Rose
Abstract:
We present a generalizable classification approach that leverages Large Language Models (LLMs) to facilitate the detection of implicitly encoded social meaning in conversations. We design a multi-faceted prompt to extract a textual explanation of the reasoning that connects visible cues to underlying social meanings. These extracted explanations or rationales serve as augmentations to the conversational text to facilitate dialogue understanding and transfer. Our empirical results over 2,340 experimental settings demonstrate the significant positive impact of adding these rationales. Our findings hold true for in-domain classification, zero-shot, and few-shot domain transfer for two different social meaning detection tasks, each spanning two different corpora.
Submitted 27 June, 2024;
originally announced June 2024.
-
Generating Situated Reflection Triggers about Alternative Solution Paths: A Case Study of Generative AI for Computer-Supported Collaborative Learning
Authors:
Atharva Naik,
Jessica Ruhan Yin,
Anusha Kamath,
Qianou Ma,
Sherry Tongshuang Wu,
Charles Murray,
Christopher Bogart,
Majd Sakr,
Carolyn P. Rose
Abstract:
An advantage of Large Language Models (LLMs) is their contextualization capability - providing different responses based on student inputs like solution strategy or prior discussion, to potentially better engage students than standard feedback. We present a design and evaluation of a proof-of-concept LLM application to offer students dynamic and contextualized feedback. Specifically, we augment an Online Programming Exercise bot for a college-level Cloud Computing course with ChatGPT, which offers students contextualized reflection triggers during a collaborative query optimization task in database design. We demonstrate that LLMs can be used to generate highly situated reflection triggers that incorporate details of the collaborative discussion happening in context. We discuss in depth the exploration of the design space of the triggers and their correspondence with the learning objectives as well as the impact on student learning in a pilot study with 34 students.
Submitted 28 April, 2024;
originally announced April 2024.
-
CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
Authors:
Yiqing Xie,
Alex Xie,
Divyanshu Sheth,
Pengfei Liu,
Daniel Fried,
Carolyn Rose
Abstract:
To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to create scalable execution-based benchmarks that only requires light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples involving 293 libraries revised from code in 367 GitHub repositories taken from the CodeSearchNet dataset. To demonstrate the complexity and solvability of examples in Exec-CSN, we present a human study demonstrating that 81.3% of the examples can be solved by humans and 61% are rated as "requires effort to solve". We conduct code generation experiments on open-source and proprietary models and analyze the performance of both humans and models. We provide the code at https://github.com/Veronicium/CodeBenchGen.
Submitted 7 May, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
Rectilinear Crossing Number of Graphs Excluding Single-Crossing Graphs as Minors
Authors:
Vida Dujmović,
Camille La Rose
Abstract:
The crossing number of a graph $G$ is the minimum number of crossings in a drawing of $G$ in the plane. A rectilinear drawing of a graph $G$ represents vertices of $G$ by a set of points in the plane and represents each edge of $G$ by a straight-line segment connecting its two endpoints. The rectilinear crossing number of $G$ is the minimum number of crossings in a rectilinear drawing of $G$.
By the crossing lemma, the crossing number of an $n$-vertex graph $G$ can be $O(n)$ only if $|E(G)|\in O(n)$. Graphs of bounded genus and bounded degree (Böröczky, Pach and Tóth, 2006) and in fact all bounded degree proper minor-closed families (Wood and Telle, 2007) have been shown to admit linear crossing number, with tight $Θ(Δn)$ bound shown by Dujmović, Kawarabayashi, Mohar and Wood, 2008.
Much less is known about rectilinear crossing number. It is not bounded by any function of the crossing number. We prove that graphs that exclude a single-crossing graph as a minor have the rectilinear crossing number $O(Δn)$. This dependence on $n$ and $Δ$ is best possible. A single-crossing graph is a graph whose crossing number is at most one. Thus the result applies to $K_5$-minor-free graphs, for example. It also applies to bounded treewidth graphs, since each family of bounded treewidth graphs excludes some fixed planar graph as a minor. Prior to our work, the only bounded degree minor-closed families known to have linear rectilinear crossing number were bounded degree graphs of bounded treewidth (Wood and Telle, 2007), as well as, bounded degree $K_{3,3}$-minor-free graphs (Dujmović, Kawarabayashi, Mohar and Wood, 2008). In the case of bounded treewidth graphs, our $O(Δn)$ result is again tight and improves on the previous best known bound of $O(Δ^2 n)$ by Wood and Telle, 2007 (obtained for convex geometric drawings).
Submitted 22 February, 2024;
originally announced February 2024.
-
DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation
Authors:
Yiqing Xie,
Sheng Zhang,
Hao Cheng,
Pengfei Liu,
Zelalem Gero,
Cliff Wong,
Tristan Naumann,
Hoifung Poon,
Carolyn Rose
Abstract:
Medical text generation aims to assist with administrative work and highlight salient information to support decision-making. To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the completeness, conciseness, and attribution of the generated text at a fine-grained level. The metrics can be computed by various types of evaluators including instruction-following (both proprietary and open-source) and supervised entailment models. We demonstrate the effectiveness of the resulting framework, DocLens, with three evaluators on three tasks: clinical note generation, radiology report summarization, and patient question summarization. A comprehensive human study shows that DocLens exhibits substantially higher agreement with the judgments of medical experts than existing metrics. The results also highlight the need to improve open-source evaluators and suggest potential directions.
Submitted 18 February, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Data Augmentation for Code Translation with Comparable Corpora and Multiple References
Authors:
Yiqing Xie,
Atharva Naik,
Daniel Fried,
Carolyn Rose
Abstract:
One major challenge of translating code between programming languages is that parallel training data is often limited. To overcome this challenge, we present two data augmentation techniques, one that builds comparable corpora (i.e., code pairs with similar functionality), and another that augments existing parallel data with multiple reference translations. Specifically, we build and analyze multiple types of comparable corpora, including programs generated from natural language documentation using a code generation model. Furthermore, to reduce overfitting to a single reference translation, we automatically generate additional translation references for available parallel data and filter the translations by unit tests, which increases variation in target translations. Experiments show that our data augmentation techniques significantly improve CodeT5 for translation between Java, Python, and C++ by an average of 7.5% Computational Accuracy (CA@1), which verifies the correctness of translations by execution. The code is available at https://github.com/Veronicium/CMTrans.
Submitted 1 November, 2023;
originally announced November 2023.
-
Linguistic representations for fewer-shot relation extraction across domains
Authors:
Sireesh Gururaja,
Ritam Dutt,
Tinglong Liao,
Carolyn Rose
Abstract:
Recent work has demonstrated the positive impact of incorporating linguistic representations as additional context and scaffolding on the in-domain performance of several NLP tasks. We extend this work by exploring the impact of linguistic representations on cross-domain performance in a few-shot transfer setting. An important question is whether linguistic representations enhance generalizability by providing features that function as cross-domain pivots. We focus on the task of relation extraction on three datasets of procedural text in two domains, cooking and materials science. Our approach augments a popular transformer-based architecture by alternately incorporating syntactic and semantic graphs constructed by freely available off-the-shelf tools. We examine their utility for enhancing generalization, and investigate whether earlier findings, e.g. that semantic representations can be more helpful than syntactic ones, extend to relation extraction in multiple domains. We find that while the inclusion of these graphs results in significantly higher performance in few-shot transfer, both types of graph exhibit roughly equivalent utility.
Submitted 7 July, 2023;
originally announced July 2023.
-
Making Sense of Machine Learning: Integrating Youth's Conceptual, Creative, and Critical Understandings of AI
Authors:
Luis Morales-Navarro,
Yasmin B. Kafai,
Francisco Castro,
William Payne,
Kayla DesPortes,
Daniella DiPaola,
Randi Williams,
Safinah Ali,
Cynthia Breazeal,
Clifford Lee,
Elisabeth Soep,
Duri Long,
Brian Magerko,
Jaemarie Solyst,
Amy Ogan,
Cansu Tatar,
Shiyan Jiang,
Jie Chao,
Carolyn P. Rosé,
Sepehr Vakil
Abstract:
Understanding how youth make sense of machine learning and how learning about machine learning can be supported in and out of school is more relevant than ever before as young people interact with machine learning powered applications every day: while connecting with friends, listening to music, playing games, or attending school. In this symposium, we present different perspectives on understanding how learners make sense of machine learning in their everyday lives, how sensemaking of machine learning can be supported in and out of school through the construction of applications, and how youth critically evaluate machine learning powered systems. We discuss how sensemaking of machine learning applications involves the development and integration of conceptual, creative, and critical understandings that are increasingly important to prepare youth to participate in the world.
Submitted 4 May, 2023;
originally announced May 2023.
-
Combining Reinforcement Learning and Tensor Networks, with an Application to Dynamical Large Deviations
Authors:
Edward Gillman,
Dominic C. Rose,
Juan P. Garrahan
Abstract:
We present a framework to integrate tensor network (TN) methods with reinforcement learning (RL) for solving dynamical optimisation tasks. We consider the RL actor-critic method, a model-free approach for solving RL problems, and introduce TNs as the approximators for its policy and value functions. Our "actor-critic with tensor networks" (ACTeN) method is especially well suited to problems with large and factorisable state and action spaces. As an illustration of the applicability of ACTeN we solve the exponentially hard task of sampling rare trajectories in two paradigmatic stochastic models, the East model of glasses and the asymmetric simple exclusion process (ASEP), the latter being particularly challenging to other methods due to the absence of detailed balance. With substantial potential for further integration with the vast array of existing RL methods, the approach introduced here is promising both for applications in physics and to multi-agent RL problems more generally.
Submitted 5 April, 2024; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Training neural network ensembles via trajectory sampling
Authors:
Jamie F. Mair,
Dominic C. Rose,
Juan P. Garrahan
Abstract:
In machine learning, there is renewed interest in neural network ensembles (NNEs), whereby predictions are obtained as an aggregate from a diverse set of smaller models, rather than from a single larger model. Here, we show how to define and train an NNE using techniques from the study of rare trajectories in stochastic systems. We define an NNE in terms of the trajectory of the model parameters under a simple, and discrete in time, diffusive dynamics, and train the NNE by biasing these trajectories towards a small time-integrated loss, as controlled by appropriate counting fields which act as hyperparameters. We demonstrate the viability of this technique on a range of simple supervised learning tasks. We discuss potential advantages of our trajectory sampling approach compared with more conventional gradient based methods.
Submitted 10 May, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Multi-Scale Contrastive Knowledge Co-Distillation for Event Temporal Relation Extraction
Authors:
Hao-Ren Yao,
Luke Breitfeller,
Aakanksha Naik,
Chunxiao Zhou,
Carolyn Rose
Abstract:
Event Temporal Relation Extraction (ETRE) is a crucial yet challenging problem. Event pairs are situated within a discourse at different distances, which we refer to as proximity bands. The temporal ordering communicated about event pairs situated at more remote (i.e., ``long'') or less remote (i.e., ``short'') proximity bands is encoded differently. SOTA ETRE models have tended to perform well on events situated at either short or long proximity bands, but not both. Yet, real-world, natural texts contain all types of temporal event-pairs. In this paper, we present MulCo: Multi-Scale Contrastive Knowledge Co-Distillation, a fusion approach that shares knowledge across multiple event pair proximity bands in order to improve performance on all types of temporal datasets. Our experimental results show that MulCo successfully integrates linguistic cues pertaining to temporal reasoning across both short and long proximity bands and achieves new state-of-the-art results on several ETRE benchmark datasets.
Submitted 20 March, 2024; v1 submitted 1 September, 2022;
originally announced September 2022.
-
YOLO2U-Net: Detection-Guided 3D Instance Segmentation for Microscopy
Authors:
Amirkoushyar Ziabari,
Derek C. Rose,
Abbas Shirinifard,
David Solecki
Abstract:
Microscopy imaging techniques are instrumental for characterization and analysis of biological structures. As these techniques typically render 3D visualization of cells by stacking 2D projections, issues such as out-of-plane excitation and low resolution in the $z$-axis may pose challenges (even for human experts) to detect individual cells in 3D volumes as these non-overlapping cells may appear as overlapping. In this work, we introduce a comprehensive method for accurate 3D instance segmentation of cells in the brain tissue. The proposed method combines the 2D YOLO detection method with a multi-view fusion algorithm to construct a 3D localization of the cells. Next, the 3D bounding boxes along with the data volume are input to a 3D U-Net network that is designed to segment the primary cell in each 3D bounding box, and in turn, to carry out instance segmentation of cells in the entire volume. The promising performance of the proposed method is shown in comparison with some current deep learning-based 3D instance segmentation methods.
Submitted 13 July, 2022;
originally announced July 2022.
-
HCIL: Hierarchical Class Incremental Learning for Longline Fishing Visual Monitoring
Authors:
Jie Mei,
Suzanne Romain,
Craig Rose,
Kelsey Magrane,
Jenq-Neng Hwang
Abstract:
The goal of electronic monitoring of longline fishing is to visually monitor the fish catching activities on fishing vessels based on cameras, either for regulatory compliance or catch counting. The previous hierarchical classification method demonstrates efficient fish species identification of catches from longline fishing, where fishes are under severe deformation and self-occlusion during the catching process. Although the hierarchical classification mitigates the laborious efforts of human reviews by providing confidence scores in different hierarchical levels, its performance drops dramatically under the class incremental learning (CIL) scenario. A CIL system should be able to learn about more and more classes over time from a stream of data, i.e., only the training data for a small number of classes have to be present at the beginning and new classes can be added progressively. In this work, we introduce a Hierarchical Class Incremental Learning (HCIL) model, which significantly improves the state-of-the-art hierarchical classification methods under the CIL scenario.
Submitted 25 February, 2022;
originally announced February 2022.
-
Unsupervised Severely Deformed Mesh Reconstruction (DMR) from a Single-View Image
Authors:
Jie Mei,
Jingxi Yu,
Suzanne Romain,
Craig Rose,
Kelsey Magrane,
Graeme LeeSon,
Jenq-Neng Hwang
Abstract:
Much progress has been made in the supervised learning of 3D reconstruction of rigid objects from multi-view images or a video. However, it is more challenging to reconstruct severely deformed objects from a single-view RGB image in an unsupervised manner. Although training-based methods, such as specific category-level training, have been shown to successfully reconstruct rigid objects and slightly deformed objects like birds from a single-view image, they cannot effectively handle severely deformed objects and neither can be applied to some downstream tasks in the real world due to the inconsistent semantic meaning of vertices, which are crucial in defining the adopted 3D templates of objects to be reconstructed. In this work, we introduce a template-based method to infer 3D shapes from a single-view image and apply the reconstructed mesh to a downstream task, i.e., absolute length measurement. Without using 3D ground truth, our method faithfully reconstructs 3D meshes and achieves state-of-the-art accuracy in a length measurement task on a severely deformed fish dataset.
Submitted 23 January, 2022;
originally announced January 2022.
-
Adapting to the Long Tail: A Meta-Analysis of Transfer Learning Research for Language Understanding Tasks
Authors:
Aakanksha Naik,
Jill Lehman,
Carolyn Rose
Abstract:
Natural language understanding (NLU) has made massive progress driven by large benchmarks, but benchmarks often leave a long tail of infrequent phenomena underrepresented. We reflect on the question: have transfer learning methods sufficiently addressed the poor performance of benchmark-trained models on the long tail? We conceptualize the long tail using macro-level dimensions (e.g., underrepresented genres, topics, etc.), and perform a qualitative meta-analysis of 100 representative papers on transfer learning research for NLU. Our analysis asks three questions: (i) Which long tail dimensions do transfer learning studies target? (ii) Which properties of adaptation methods help improve performance on the long tail? (iii) Which methodological gaps have greatest negative impact on long tail performance? Our answers highlight major avenues for future research in transfer learning for the long tail. Lastly, using our meta-analysis framework, we perform a case study comparing the performance of various adaptation methods on clinical narratives, which provides interesting insights that may enable us to make progress along these future avenues.
Submitted 3 June, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Authors:
Xiaopeng Lu,
Zhen Fan,
Yansen Wang,
Jean Oh,
Carolyn P. Rose
Abstract:
As an important task in multimodal context understanding, Text-VQA (Visual Question Answering) aims at question answering through reading text information in images. It differentiates from the original VQA task as Text-VQA requires large amounts of scene-text relationship understanding, in addition to the cross-modal grounding capability. In this paper, we propose Localize, Group, and Select (LOGOS), a novel model which attempts to tackle this problem from multiple aspects. LOGOS leverages two grounding tasks to better localize the key information of the image, utilizes scene text clustering to group individual OCR tokens, and learns to select the best answer from different sources of OCR (Optical Character Recognition) texts. Experiments show that LOGOS outperforms previous state-of-the-art methods on two Text-VQA benchmarks without using additional OCR annotation data. Ablation studies and analysis demonstrate the capability of LOGOS to bridge different modalities and better understand scene text.
Submitted 19 August, 2021;
originally announced August 2021.
-
Comparing Example-Based Collaborative Reflection to Problem Solving Practice for Learning during Team-Based Software Engineering Projects
Authors:
Sreecharan Sankaranarayanan,
Siddharth Reddy Kandimalla,
Christopher Bogart,
R. Charles Murray,
Haokang An,
Michael Hilton,
Majd Sakr,
Carolyn Rosé
Abstract:
Contributing to the literature on aptitude-treatment interactions between worked examples and problem-solving, this paper addresses differential learning from the two approaches when students are positioned as domain experts learning new concepts. Our evaluation is situated in a team project that is part of an advanced software engineering course. In this course, students who possess foundational domain knowledge but are learning new concepts engage alternatively in programming followed by worked example-based reflection. They are either allowed to finish programming or are curtailed after a pre-specified time to participate in a longer worked example-based reflection. We find significant pre- to post-test learning gains in both conditions. Then, we not only find significantly more learning when students participated in longer worked example-based reflections but also a significant performance improvement on a problem-solving transfer task. These findings suggest that domain experts learning new concepts benefit more from worked example-based reflections than from problem-solving.
Submitted 1 July, 2021;
originally announced July 2021.
-
Robust Knowledge Graph Completion with Stacked Convolutions and a Student Re-Ranking Network
Authors:
Justin Lovelace,
Denis Newman-Griffis,
Shikhar Vashishth,
Jill Fain Lehman,
Carolyn Penstein Rosé
Abstract:
Knowledge Graph (KG) completion research usually focuses on densely connected benchmark datasets that are not representative of real KGs. We curate two KG datasets that include biomedical and encyclopedic knowledge and use an existing commonsense KG dataset to explore KG completion in the more realistic setting where dense connectivity is not guaranteed. We develop a deep convolutional network that utilizes textual entity representations and demonstrate that our model outperforms recent KG completion methods in this challenging setting. We find that our model's performance improvements stem primarily from its robustness to sparsity. We then distill the knowledge from the convolutional network into a student network that re-ranks promising candidate entities. This re-ranking stage leads to further improvements in performance and demonstrates the effectiveness of entity re-ranking for KG completion.
Submitted 11 June, 2021;
originally announced June 2021.
-
STAGE: Tool for Automated Extraction of Semantic Time Cues to Enrich Neural Temporal Ordering Models
Authors:
Luke Breitfeller,
Aakanksha Naik,
Carolyn Rose
Abstract:
Despite achieving state-of-the-art accuracy on temporal ordering of events, neural models showcase significant gaps in performance. Our work seeks to fill one of these gaps by leveraging an under-explored dimension of textual semantics: rich semantic information provided by explicit textual time cues. We develop STAGE, a system that consists of a novel temporal framework and a parser that can automatically extract time cues and convert them into representations suitable for integration with neural models. We demonstrate the utility of extracted cues by integrating them with an event ordering model using a joint BiLSTM and ILP constraint architecture. We outline the functionality of the 3-part STAGE processing approach, and show two methods of integrating its representations with the BiLSTM-ILP model: (i) incorporating semantic cues as additional features, and (ii) generating new constraints from semantic cues to be enforced in the ILP. We demonstrate promising results on two event ordering datasets, and highlight important issues in semantic cue representation and integration for future research.
Submitted 15 May, 2021;
originally announced May 2021.
-
Reinforcement learning of rare diffusive dynamics
Authors:
Avishek Das,
Dominic C. Rose,
Juan P. Garrahan,
David T. Limmer
Abstract:
We present a method to probe rare molecular dynamics trajectories directly using reinforcement learning. We consider trajectories that are conditioned to transition between regions of configuration space in finite time, like those relevant in the study of reactive events, as well as trajectories exhibiting rare fluctuations of time-integrated quantities in the long time limit, like those relevant in the calculation of large deviation functions. In both cases, reinforcement learning techniques are used to optimize an added force that minimizes the Kullback-Leibler divergence between the conditioned trajectory ensemble and a driven one. Under the optimized added force, the system evolves the rare fluctuation as a typical one, affording a variational estimate of its likelihood in the original trajectory ensemble. Low variance gradients employing value functions are proposed to increase the convergence of the optimal force. The method we develop employing these gradients leads to efficient and accurate estimates of both the optimal force and the likelihood of the rare event for a variety of model systems.
Submitted 11 August, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Evaluating the Impact of a Hierarchical Discourse Representation on Entity Coreference Resolution Performance
Authors:
Sopan Khosla,
James Fiacco,
Carolyn Rose
Abstract:
Recent work on entity coreference resolution (CR) follows current trends in Deep Learning applied to embeddings and relatively simple task-related features. SOTA models do not make use of hierarchical representations of discourse structure. In this work, we leverage automatically constructed discourse parse trees within a neural approach and demonstrate a significant improvement on two benchmark entity coreference-resolution datasets. We explore how the impact varies depending upon the type of mention.
Submitted 20 April, 2021;
originally announced April 2021.
-
Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research
Authors:
Denis Newman-Griffis,
Jill Fain Lehman,
Carolyn Rosé,
Harry Hochheiser
Abstract:
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.
Submitted 15 April, 2021;
originally announced April 2021.
-
Hugo: A Cluster Scheduler that Efficiently Learns to Select Complementary Data-Parallel Jobs
Authors:
Lauritz Thamsen,
Ilya Verbitskiy,
Sasho Nedelkoski,
Vinh Thuy Tran,
Vinicius Meyer,
Miguel G. Xavier,
Odej Kao,
Cesar A. F. De Rose
Abstract:
Distributed data processing systems like MapReduce, Spark, and Flink are popular tools for analysis of large datasets with cluster resources. Yet, users often overprovision resources for their data processing jobs, while the resource usage of these jobs also typically fluctuates considerably. Therefore, multiple jobs usually get scheduled onto the same shared resources to increase the resource utilization and throughput of clusters. However, job runtimes and the utilization of shared resources can vary significantly depending on the specific combinations of co-located jobs.
This paper presents Hugo, a cluster scheduler that continuously learns how efficiently jobs share resources, considering metrics for the resource utilization and interference among co-located jobs. The scheduler combines offline grouping of jobs with online reinforcement learning to provide a scheduling mechanism that efficiently generalizes from specific monitored job combinations yet also adapts to changes in workloads. Our evaluation of a prototype shows that the approach can reduce the runtimes of exemplary Spark jobs on a YARN cluster by up to 12.5%, while resource utilization is increased and waiting times can be bounded.
Submitted 14 February, 2021;
originally announced February 2021.
-
Absolute 3D Pose Estimation and Length Measurement of Severely Deformed Fish from Monocular Videos in Longline Fishing
Authors:
Jie Mei,
Jenq-Neng Hwang,
Suzanne Romain,
Craig Rose,
Braden Moore,
Kelsey Magrane
Abstract:
Monocular absolute 3D fish pose estimation allows for efficient fish length measurement in the longline fisheries, where fishes are under severe deformation during the catching process. This task is challenging since it requires locating absolute 3D fish keypoints based on a short monocular video clip. Unlike related works, which either require expensive 3D ground-truth data and/or multiple-view images to provide depth information, or are limited to rigid objects, we propose a novel frame-based method to estimate the absolute 3D fish pose and fish length from a single-view 2D segmentation mask. We first introduce a relative 3D fish template. By minimizing an objective function, our method systematically estimates the relative 3D pose of the target fish and fish 2D keypoints in the image. Finally, with a closed-form solution, the relative 3D fish pose can help locate absolute 3D keypoints, resulting in the frame-based absolute fish length measurement, which is further refined based on the statistical temporal inference for the optimal fish length measurement from the video clip. Our experiments show that this method can accurately estimate the absolute 3D fish pose and further measure the absolute length, even outperforming the state-of-the-art multi-view method.
Submitted 8 February, 2021;
originally announced February 2021.
-
Video-based Hierarchical Species Classification for Longline Fishing Monitoring
Authors:
Jie Mei,
Jenq-Neng Hwang,
Suzanne Romain,
Craig Rose,
Braden Moore,
Kelsey Magrane
Abstract:
The goal of electronic monitoring (EM) of longline fishing is to monitor the fish catching activities on fishing vessels, either for the regulatory compliance or catch counting. Hierarchical classification based on videos allows for inexpensive and efficient fish species identification of catches from longline fishing, where fishes are under severe deformation and self-occlusion during the catching process. More importantly, the flexibility of hierarchical classification mitigates the laborious efforts of human reviews by providing confidence scores in different hierarchical levels. Some related works either use cascaded models for hierarchical classification or make predictions per image or predict one overlapping hierarchical data structure of the dataset in advance. However, with a known non-overlapping hierarchical data structure provided by fisheries scientists, our method enforces the hierarchical data structure and introduces an efficient training and inference strategy for video-based fisheries data. Our experiments show that the proposed method outperforms the classic flat classification system significantly and our ablation study justifies our contributions in CNN model design, training strategy, and the video-based inference schemes for the hierarchical fish species classification task.
Submitted 6 February, 2021;
originally announced February 2021.
-
RESPER: Computationally Modelling Resisting Strategies in Persuasive Conversations
Authors:
Ritam Dutt,
Sayan Sinha,
Rishabh Joshi,
Surya Shekhar Chakraborty,
Meredith Riggs,
Xinru Yan,
Haogang Bao,
Carolyn Penstein Rosé
Abstract:
Modelling persuasion strategies as predictors of task outcome has several real-world applications and has received considerable attention from the computational linguistics community. However, previous research has failed to account for the resisting strategies employed by an individual to foil such persuasion attempts. Grounded in prior literature in cognitive and social psychology, we propose a generalised framework for identifying resisting strategies in persuasive conversations. We instantiate our framework on two distinct datasets comprising persuasion and negotiation conversations. We also leverage a hierarchical sequence-labelling neural architecture to infer the aforementioned resisting strategies automatically. Our experiments reveal the asymmetry of power roles in non-collaborative goal-directed conversations and the benefits accrued from incorporating resisting strategies on the final conversation outcome. We also investigate the role of different resisting strategies on the conversation outcome and glean insights that corroborate past findings. We also make the code and the dataset of this work publicly available at https://github.com/americast/resper.
Submitted 25 January, 2021;
originally announced January 2021.
-
RECEIPT: REfine CoarsE-grained IndePendent Tasks for Parallel Tip decomposition of Bipartite Graphs
Authors:
Kartik Lakhotia,
Rajgopal Kannan,
Viktor Prasanna,
Cesar A. F. De Rose
Abstract:
Tip decomposition is a crucial kernel for mining dense subgraphs in bipartite networks, with applications in spam detection, analysis of affiliation networks, etc. It creates a hierarchy of vertex-induced subgraphs with varying densities determined by the participation of vertices in butterflies (2,2-bicliques). To build the hierarchy, existing algorithms iteratively follow a delete-update (peeling) process: deleting vertices with the minimum number of butterflies and correspondingly updating the butterfly count of their 2-hop neighbors. The need to explore the 2-hop neighborhood renders tip decomposition computationally very expensive. Furthermore, the inherent sequentiality in peeling only minimum-butterfly vertices makes derived parallel algorithms prone to heavy synchronization.
In this paper, we propose a novel parallel tip-decomposition algorithm -- REfine CoarsE-grained Independent Tasks (RECEIPT) that relaxes the peeling order restrictions by partitioning the vertices into multiple independent subsets that can be concurrently peeled. This enables RECEIPT to simultaneously achieve a high degree of parallelism and dramatic reduction in synchronizations. Further, RECEIPT employs a hybrid peeling strategy along with other optimizations that drastically reduce the amount of wedge exploration and execution time.
We perform detailed experimental evaluation of RECEIPT on a shared-memory multicore server. It can process some of the largest publicly available bipartite datasets orders of magnitude faster than the state-of-the-art algorithms -- achieving up to 1100x and 64x reduction in the number of thread synchronizations and traversed wedges, respectively. Using 36 threads, RECEIPT can provide up to 17.1x self-relative speedup. Our implementation of RECEIPT is available at https://github.com/kartiklakhotia/RECEIPT.
Submitted 16 October, 2020;
originally announced October 2020.
-
Using Type Information to Improve Entity Coreference Resolution
Authors:
Sopan Khosla,
Carolyn Rose
Abstract:
Coreference resolution (CR) is an essential part of discourse analysis. Most recently, neural approaches have been proposed to improve over SOTA models from earlier paradigms. So far none of the published neural models leverage external semantic knowledge such as type information. This paper offers the first such model and evaluation, demonstrating modest gains in accuracy by introducing either gold standard or predicted types. In the proposed approach, type information serves both to (1) improve mention representation and (2) create a soft type consistency check between coreference candidate mentions. Our evaluation covers two different grain sizes of types over four different benchmark corpora.
Submitted 12 October, 2020;
originally announced October 2020.
-
MedFilter: Improving Extraction of Task-relevant Utterances from Doctor-Patient Conversations through Integration of Discourse Structure and Ontological Knowledge
Authors:
Sopan Khosla,
Shikhar Vashishth,
Jill Fain Lehman,
Carolyn Rose
Abstract:
Information extraction from conversational data is particularly challenging because the task-centric nature of conversation allows for effective communication of implicit information by humans, but is challenging for machines. The challenges may differ between utterances depending on the role of the speaker within the conversation, especially when relevant expertise is distributed asymmetrically across roles. Further, the challenges may also increase over the conversation as more shared context is built up through information communicated implicitly earlier in the dialogue. In this paper, we propose the novel modeling approach MedFilter, which addresses these insights in order to increase performance at identifying and categorizing task-relevant utterances, and in so doing, positively impacts performance at a downstream information extraction task. We evaluate this approach on a corpus of nearly 7,000 doctor-patient conversations where MedFilter is used to identify medically relevant contributions to the discussion (achieving a 10% improvement over SOTA baselines in terms of area under the PR curve). Identifying task-relevant utterances benefits downstream medical processing, achieving improvements of 15%, 105%, and 23% respectively for the extraction of symptoms, medications, and complaints.
Submitted 21 June, 2022; v1 submitted 5 October, 2020;
originally announced October 2020.
-
Keeping Up Appearances: Computational Modeling of Face Acts in Persuasion Oriented Discussions
Authors:
Ritam Dutt,
Rishabh Joshi,
Carolyn Penstein Rose
Abstract:
The notion of face refers to the public self-image of an individual that emerges both from the individual's own actions as well as from the interaction with others. Modeling face and understanding its state changes throughout a conversation is critical to the study of maintenance of basic human needs in and through interaction. Grounded in the politeness theory of Brown and Levinson (1978), we propose a generalized framework for modeling face acts in persuasion conversations, resulting in a reliable coding manual, an annotated corpus, and computational models. The framework reveals insights about differences in face act utilization between asymmetric roles in persuasion conversations. Using computational models, we are able to successfully identify face acts as well as predict a key conversational outcome (e.g. donation success). Finally, we model a latent representation of the conversational state to analyze the impact of predicted face acts on the probability of a positive conversational outcome and observe several correlations that corroborate previous findings.
Submitted 23 September, 2020; v1 submitted 22 September, 2020;
originally announced September 2020.
-
Population Susceptibility Variation and Its Effect on Contagion Dynamics
Authors:
Christopher Rose,
Andrew J. Medford,
C. Franklin Goldsmith,
Tejs Vegge,
Joshua Weitz,
Andrew A. Peterson
Abstract:
Susceptibility governs the dynamics of contagion. The classical SIR model is one of the simplest compartmental models of contagion spread, assuming a single shared susceptibility level. However, variation in susceptibility over a population can fundamentally affect the dynamics of contagion and thus the ultimate outcome of a pandemic. We develop mathematical machinery which explicitly considers susceptibility variation, illuminates how the susceptibility distribution is sculpted by contagion, and thence how such variation affects the SIR differential equations that govern contagion. Our methods allow us to derive closed-form expressions for herd immunity thresholds as a function of initial susceptibility distributions and suggest an intuitively satisfying approach to inoculation when only a fraction of the population is accessible to such intervention. Of particular interest, if we assume static susceptibility of individuals in the susceptible pool, ignoring susceptibility diversity always results in overestimation of the herd immunity threshold, and that difference can be dramatic. Therefore, we should develop robust measures of susceptibility variation as part of public health strategies for handling pandemics.
Submitted 18 September, 2020;
originally announced September 2020.
-
Adapting Event Extractors to Medical Data: Bridging the Covariate Shift
Authors:
Aakanksha Naik,
Jill Lehman,
Carolyn Rose
Abstract:
We tackle the task of adapting event extractors to new domains without labeled data, by aligning the marginal distributions of source and target domains. As a testbed, we create two new event extraction datasets using English texts from two medical domains: (i) clinical notes, and (ii) doctor-patient conversations. We test the efficacy of three marginal alignment techniques: (i) adversarial domain adaptation (ADA), (ii) domain adaptive fine-tuning (DAFT), and (iii) a novel instance weighting technique based on language model likelihood scores (LIW). LIW and DAFT improve over a no-transfer BERT baseline on both domains, but ADA only improves on clinical notes. Deeper analysis of performance under different types of shifts (e.g., lexical shift, semantic shift) reveals interesting variations among models. Our best-performing models reach F1 scores of 70.0 and 72.9 on notes and conversations respectively, using no labeled data from target domains.
Submitted 20 August, 2020;
originally announced August 2020.
-
A reinforcement learning approach to rare trajectory sampling
Authors:
Dominic C. Rose,
Jamie F. Mair,
Juan P. Garrahan
Abstract:
Very often when studying non-equilibrium systems one is interested in analysing dynamical behaviour that occurs with very low probability, so-called rare events. In practice, since rare events are by definition atypical, they are often difficult to access in a statistically significant way. What are required are strategies to "make rare events typical" so that they can be generated on demand. Here we present such a general approach to adaptively construct a dynamics that efficiently samples atypical events. We do so by exploiting the methods of reinforcement learning (RL), which refers to the set of machine learning techniques aimed at finding the optimal behaviour to maximise a reward associated with the dynamics. We consider the general perspective of dynamical trajectory ensembles, whereby rare events are described in terms of ensemble reweighting. By minimising the distance between a reweighted ensemble and that of a suitably parametrised controlled dynamics we arrive at a set of methods similar to those of RL to numerically approximate the optimal dynamics that realises the rare behaviour of interest. As simple illustrations we consider in detail the problem of excursions of a random walker, for the case of rare events with a finite time horizon; and the problem of studying current statistics of a particle hopping in a ring geometry, for the case of an infinite time horizon. We discuss natural extensions of the ideas presented here, including to continuous-time Markov systems, first passage time problems and non-Markovian dynamics.
Submitted 25 November, 2020; v1 submitted 26 May, 2020;
originally announced May 2020.
-
Towards Open Domain Event Trigger Identification using Adversarial Domain Adaptation
Authors:
Aakanksha Naik,
Carolyn Rosé
Abstract:
We tackle the task of building supervised event trigger identification models which can generalize better across domains. Our work leverages the adversarial domain adaptation (ADA) framework to introduce domain-invariance. ADA uses adversarial training to construct representations that are predictive for trigger identification, but not predictive of the example's domain. It requires no labeled data from the target domain, making it completely unsupervised. Experiments with two domains (English literature and news) show that ADA leads to an average F1 score improvement of 3.9 on out-of-domain data. Our best performing model (BERT-A) reaches 44-49 F1 across both domains, using no labeled target data. Preliminary experiments reveal that finetuning on 1% labeled data, followed by self-training leads to substantial improvement, reaching 51.5 and 67.2 F1 on literature and news respectively.
Submitted 22 May, 2020;
originally announced May 2020.
-
Capacities and Optimal Input Distributions for Particle-Intensity Channels
Authors:
Nariman Farsad,
Will Chuang,
Andrea Goldsmith,
Christos Komninakis,
Muriel Médard,
Christopher Rose,
Lieven Vandenberghe,
Emily E. Wesel,
Richard D. Wesel
Abstract:
This work introduces the particle-intensity channel (PIC) as a model for molecular communication systems and characterizes the capacity limits as well as properties of the optimal (capacity-achieving) input distributions for such channels. In the PIC, the transmitter encodes information, in symbols of a given duration, based on the probability of particle release, and the receiver detects and decodes the message based on the number of particles detected during the symbol interval. In this channel, the transmitter may be unable to control precisely the probability of particle release, and the receiver may not detect all the particles that arrive. We model this channel using a generalization of the binomial channel and show that the capacity-achieving input distribution for this channel always has mass points at probabilities of particle release of zero and one. To find the capacity-achieving input distributions, we develop an efficient algorithm we call dynamic assignment Blahut-Arimoto (DAB). For diffusive particle transport, we also derive the conditions under which the input with two mass points is capacity-achieving.
Submitted 20 May, 2020;
originally announced May 2020.
-
Hyperparameter Optimization in Binary Communication Networks for Neuromorphic Deployment
Authors:
Maryam Parsa,
Catherine D. Schuman,
Prasanna Date,
Derek C. Rose,
Bill Kay,
J. Parker Mitchell,
Steven R. Young,
Ryan Dellana,
William Severa,
Thomas E. Potok,
Kaushik Roy
Abstract:
Training neural networks for neuromorphic deployment is non-trivial. There have been a variety of approaches proposed to adapt back-propagation or back-propagation-like algorithms appropriate for training. Considering that these networks often have very different performance characteristics than traditional neural networks, it is often unclear how to set either the network topology or the hyperparameters to achieve optimal performance. In this work, we introduce a Bayesian approach for optimizing the hyperparameters of an algorithm for training binary communication networks that can be deployed to neuromorphic hardware. We show that by optimizing the hyperparameters on this algorithm for each dataset, we can achieve improvements in accuracy over the previous state-of-the-art for this algorithm on each dataset (by up to 15 percent). This jump in performance continues to emphasize the potential when converting traditional neural networks to binary communication applicable to neuromorphic hardware.
Submitted 20 April, 2020;
originally announced May 2020.
-
Improving Broad-Coverage Medical Entity Linking with Semantic Type Prediction and Large-Scale Datasets
Authors:
Shikhar Vashishth,
Denis Newman-Griffis,
Rishabh Joshi,
Ritam Dutt,
Carolyn Rose
Abstract:
Medical entity linking is the task of identifying and standardizing medical concepts referred to in an unstructured text. Most of the existing methods adopt a three-step approach of (1) detecting mentions, (2) generating a list of candidate concepts, and finally (3) picking the best concept among them. In this paper, we probe into alleviating the problem of overgeneration of candidate concepts in the candidate generation module, the most under-studied component of medical entity linking. For this, we present MedType, a fully modular system that prunes out irrelevant candidate concepts based on the predicted semantic type of an entity mention. We incorporate MedType into five off-the-shelf toolkits for medical entity linking and demonstrate that it consistently improves entity linking performance across several benchmark datasets. To address the dearth of annotated training data for medical entity linking, we present WikiMed and PubMedDS, two large-scale medical entity linking datasets, and demonstrate that pre-training MedType on these datasets further improves entity linking performance. We make our source code and datasets publicly available for medical entity linking research.
Submitted 22 August, 2021; v1 submitted 1 May, 2020;
originally announced May 2020.
-
A Tensor Network Approach to Finite Markov Decision Processes
Authors:
Edward Gillman,
Dominic C. Rose,
Juan P. Garrahan
Abstract:
Tensor network (TN) techniques - often used in the context of quantum many-body physics - have shown promise as a tool for tackling machine learning (ML) problems. The application of TNs to ML, however, has mostly focused on supervised and unsupervised learning. Yet, with their direct connection to hidden Markov chains, TNs are also naturally suited to Markov decision processes (MDPs) which provide the foundation for reinforcement learning (RL). Here we introduce a general TN formulation of finite, episodic and discrete MDPs. We show how this formulation allows us to exploit algorithms developed for TNs for policy optimisation, the key aim of RL. As an application we consider the issue - formulated as an RL problem - of finding a stochastic evolution that satisfies specific dynamical conditions, using the simple example of random walk excursions as an illustration.
Submitted 12 February, 2020;
originally announced February 2020.
-
A Machine Learning Framework for Authorship Identification From Texts
Authors:
Rahul Radhakrishnan Iyer,
Carolyn Penstein Rose
Abstract:
Authorship identification is a process in which the author of a text is identified. Most known literary texts can easily be attributed to a certain author because they are, for example, signed. Yet sometimes we find unfinished pieces of work or a whole bunch of manuscripts with a wide variety of possible authors. In order to assess the importance of such a manuscript, it is vital to know who wrote it. In this work, we aim to develop a machine learning framework to effectively determine authorship. We formulate the task as a single-label multi-class text categorization problem and propose a supervised machine learning framework incorporating stylometric features. This task is highly interdisciplinary in that it takes advantage of machine learning, information retrieval, and natural language processing. We present an approach and a model which learns the differences in writing style between 50 different authors and is able to predict the author of a new text with high accuracy. The accuracy is seen to increase significantly after introducing certain linguistic stylometric features along with text features.
Submitted 21 December, 2019;
originally announced December 2019.
-
Exascale Deep Learning to Accelerate Cancer Research
Authors:
Robert M. Patton,
J. Travis Johnston,
Steven R. Young,
Catherine D. Schuman,
Thomas E. Potok,
Derek C. Rose,
Seung-Hwan Lim,
Junghoon Chae,
Le Hou,
Shahira Abousamra,
Dimitris Samaras,
Joel Saltz
Abstract:
Deep learning, through the use of neural networks, has demonstrated remarkable ability to automate many routine tasks when presented with sufficient data for training. The neural network architecture (e.g. number of layers, types of layers, connections between layers, etc.) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever increasing accuracy on benchmark datasets with the cost of increased computational demands. In this paper we demonstrate that neural network architectures can be automatically generated, tailored for a specific application, with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL--an HPC-enabled software stack for neural architecture search--we generate a neural network with comparable accuracy to state-of-the-art networks on a cancer pathology dataset that is also $16\times$ faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.
Submitted 26 September, 2019;
originally announced September 2019.
-
Principles of Information Storage in Small-Molecule Mixtures
Authors:
Jacob K. Rosenstein,
Christopher Rose,
Sherief Reda,
Peter M. Weber,
Eunsuk Kim,
Jason Sello,
Joseph Geiser,
Eamonn Kennedy,
Christopher Arcadia,
Amanda Dombroski,
Kady Oakley,
Shui Ling Chen,
Hokchhay Tann,
Brenda M. Rubenstein
Abstract:
Molecular data systems have the potential to store information at dramatically higher density than existing electronic media. Some of the first experimental demonstrations of this idea have used DNA, but nature also uses a wide diversity of smaller non-polymeric molecules to preserve, process, and transmit information. In this paper, we present a general framework for quantifying chemical memory, which is not limited to polymers and extends to mixtures of molecules of all types. We show that the theoretical limit for molecular information is two orders of magnitude denser by mass than DNA, although this comes with different practical constraints on total capacity. We experimentally demonstrate kilobyte-scale information storage in mixtures of small synthetic molecules, and we consider some of the new perspectives that will be necessary to harness the information capacity available from the vast non-genomic chemical space.
Submitted 6 May, 2019;
originally announced May 2019.
-
Time-series Insights into the Process of Passing or Failing Online University Courses using Neural-Induced Interpretable Student States
Authors:
Byungsoo Jeon,
Eyal Shafran,
Luke Breitfeller,
Jason Levin,
Carolyn P. Rose
Abstract:
This paper addresses a key challenge in Educational Data Mining, namely to model student behavioral trajectories in order to provide a means for identifying students most at-risk, with the goal of providing supportive interventions. While many forms of data including clickstream data or data from sensors have been used extensively in time series models for such purposes, in this paper we explore the use of textual data, which is sometimes available in the records of students at large, online universities. We propose a time series model that constructs an evolving student state representation using both clickstream data and a signal extracted from the textual notes recorded by human mentors assigned to each student. We explore how the addition of this textual data improves both the predictive power of student states for the purpose of identifying students at risk for course failure as well as for providing interpretable insights about student course engagement processes.
Submitted 1 May, 2019;
originally announced May 2019.
-
EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference
Authors:
Abhilasha Ravichander,
Aakanksha Naik,
Carolyn Rose,
Eduard Hovy
Abstract:
Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE, and find that on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline Q-REAS that manipulates quantities symbolically. In comparison to the best performing NLI model, it achieves success on numerical reasoning tests (+24.2%), but has limited verbal reasoning capabilities (-8.1%). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding.
Submitted 26 October, 2019; v1 submitted 11 January, 2019;
originally announced January 2019.
-
Parallelized Linear Classification with Volumetric Chemical Perceptrons
Authors:
Christopher E. Arcadia,
Hokchhay Tann,
Amanda Dombroski,
Kady Ferguson,
Shui Ling Chen,
Eunsuk Kim,
Christopher Rose,
Brenda M. Rubenstein,
Sherief Reda,
Jacob K. Rosenstein
Abstract:
In this work, we introduce a new type of linear classifier that is implemented in a chemical form. We propose a novel encoding technique which simultaneously represents multiple datasets in an array of microliter-scale chemical mixtures. Parallel computations on these datasets are performed as robotic liquid handling sequences, whose outputs are analyzed by high-performance liquid chromatography. As a proof of concept, we chemically encode several MNIST images of handwritten digits and demonstrate successful chemical-domain classification of the digits using volumetric perceptrons. We additionally quantify the performance of our method with a larger dataset of binary vectors and compare the experimental measurements against predicted results. Paired with appropriate chemical analysis tools, our approach can work on increasingly parallel datasets. We anticipate that related approaches will be scalable to multilayer neural networks and other more complex algorithms. Much like recent demonstrations of archival data storage in DNA, this work blurs the line between chemical and electrical information systems, and offers early insight into the computational efficiency and massive parallelism which may come with computing in chemical domains.
Submitted 11 October, 2018;
originally announced October 2018.
-
Combining Model-Free Q-Ensembles and Model-Based Approaches for Informed Exploration
Authors:
Sreecharan Sankaranarayanan,
Raghuram Mandyam Annasamy,
Katia Sycara,
Carolyn Penstein Rosé
Abstract:
Q-Ensembles are a model-free approach where input images are fed into different Q-networks and exploration is driven by the assumption that uncertainty is proportional to the variance of the output Q-values obtained. They have been shown to perform relatively well compared to other exploration strategies. Further, model-based approaches, such as encoder-decoder models have been used successfully for next frame prediction given previous frames. This paper proposes to integrate the model-free Q-ensembles and model-based approaches with the hope of compounding the benefits of both and achieving superior exploration as a result. Results show that a model-based trajectory memory approach when combined with Q-ensembles produces superior performance when compared to only using Q-ensembles.
Submitted 12 June, 2018;
originally announced June 2018.
-
Stress Test Evaluation for Natural Language Inference
Authors:
Aakanksha Naik,
Abhilasha Ravichander,
Norman Sadeh,
Carolyn Rose,
Graham Neubig
Abstract:
Natural language inference (NLI) is the task of determining if a natural language hypothesis can be inferred from a given premise in a justifiable manner. NLI was proposed as a benchmark task for natural language understanding. Existing models perform well at standard datasets for NLI, achieving impressive results across different genres of text. However, the extent to which these models understand the semantic content of sentences is unclear. In this work, we propose an evaluation methodology consisting of automatically constructed "stress tests" that allow us to examine whether systems have the ability to make real inferential decisions. Our evaluation of six sentence-encoder models on these stress tests reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena, and suggests important directions for future work in this area.
Submitted 13 June, 2018; v1 submitted 2 June, 2018;
originally announced June 2018.
-
Attentive Interaction Model: Modeling Changes in View in Argumentation
Authors:
Yohan Jo,
Shivani Poddar,
Byungsoo Jeon,
Qinlan Shen,
Carolyn P. Rose,
Graham Neubig
Abstract:
We present a neural architecture for modeling argumentative dialogue that explicitly models the interplay between an Opinion Holder's (OH's) reasoning and a challenger's argument, with the goal of predicting if the argument successfully changes the OH's view. The model has two components: (1) vulnerable region detection, an attention model that identifies parts of the OH's reasoning that are amenable to change, and (2) interaction encoding, which identifies the relationship between the content of the OH's reasoning and that of the challenger's argument. Based on evaluation on discussions from the Change My View forum on Reddit, the two components work together to predict an OH's change in view, outperforming several baselines. A posthoc analysis suggests that sentences picked out by the attention model are addressed more frequently by successful arguments than by unsuccessful ones.
Submitted 18 April, 2018; v1 submitted 30 March, 2018;
originally announced April 2018.
-
Linguistic Markers of Influence in Informal Interactions
Authors:
Shrimai Prabhumoye,
Samridhi Choudhary,
Evangelia Spiliopoulou,
Christopher Bogart,
Carolyn Penstein Rose,
Alan W Black
Abstract:
There has been a long-standing interest in understanding 'Social Influence' both in the Social Sciences and in Computational Linguistics. In this paper, we present a novel approach to study and measure interpersonal influence in daily interactions. Motivated by the basic principles of influence, we attempt to identify indicative linguistic features of the posts in an online knitting community. We present the scheme used to operationalize and label the posts with indicator features. Experiments with the identified features show an improvement in the classification accuracy of influence by 3.15%. Our results illustrate the important correlation between the characteristics of the language and its potential to influence others.
Submitted 14 July, 2017;
originally announced July 2017.
-
Capacity of Molecular Channels with Imperfect Particle-Intensity Modulation and Detection
Authors:
Nariman Farsad,
Christopher Rose,
Muriel Médard,
Andrea Goldsmith
Abstract:
This work introduces the particle-intensity channel (PIC) as a model for molecular communication systems and characterizes the properties of the optimal input distribution and the capacity limits for this system. In the PIC, the transmitter encodes information, in symbols of a given duration, based on the number of particles released, and the receiver detects and decodes the message based on the number of particles detected during the symbol interval. In this channel, the transmitter may be unable to control precisely the number of particles released, and the receiver may not detect all the particles that arrive. We demonstrate that the optimal input distribution for this channel always has mass points at zero and the maximum number of particles that can be released. We then consider diffusive particle transport, derive the capacity expression when the input distribution is binary, and show conditions under which the binary input is capacity-achieving. In particular, we demonstrate that when the transmitter cannot generate particles at a high rate, the optimal input distribution is binary.
Submitted 22 May, 2017;
originally announced May 2017.
-
Coordinating Collaborative Chat in Massive Open Online Courses
Authors:
Gaurav Singh Tomar,
Sreecharan Sankaranarayanan,
Xu Wang,
Carolyn Penstein Rosé
Abstract:
An earlier study of a collaborative chat intervention in a Massive Open Online Course (MOOC) identified negative effects on attrition stemming from a requirement for students to be matched with exactly one partner prior to beginning the activity. That study raised questions about how to orchestrate a collaborative chat intervention in a MOOC context in order to provide the benefit of synchronous social engagement without the coordination difficulties. In this paper we present a careful analysis of an intervention designed to overcome coordination difficulties by welcoming students into the chat on a rolling basis as they arrive rather than requiring them to be matched with a partner before beginning. The results suggest the most positive impact when experiencing a chat with exactly one partner rather than more or less. A qualitative analysis of the chat data reveals differential experiences between these configurations that suggests a potential explanation for the effect and raises questions for future research.
Submitted 18 April, 2017;
originally announced April 2017.