Search | arXiv e-print repository

arXiv:2406.19561 [pdf, other]

Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning

Authors: Bradley Burega, John D. Martin, Luke Kapeluck, Michael Bowling

Abstract: We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment. This is particularly challenging when the learning system is resource-constrained and in continual settings, where the environment dynamics change. To address these challenges, our paper introduces an online, meta-gradient algorithm that tunes a probability with w… ▽ More We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment. This is particularly challenging when the learning system is resource-constrained and in continual settings, where the environment dynamics change. To address these challenges, our paper introduces an online, meta-gradient algorithm that tunes a probability with which states are queried during Dyna-style planning. Our study compares the aggregate, empirical performance of this meta-gradient method to baselines that employ conventional sampling strategies. Results indicate that our method improves efficiency of the planning process, which, as a consequence, improves the sample-efficiency of the overall learning process. On the whole, we observe that our meta-learned solutions avoid several pathologies of conventional planning approaches, such as sampling inaccurate transitions and those that stall credit assignment. We believe these findings could prove useful, in future work, for designing model-based RL systems at scale. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.11880 [pdf, other]

Knowledge Return Oriented Prompting (KROP)

Authors: Jason Martin, Kenneth Yeung

Abstract: Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures. Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.11036 [pdf, other]

garak: A Framework for Security Probing Large Language Models

Authors: Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie

Abstract: As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output, are constantly updated, and the potential adversary is highly diverse: anyone with access to the internet and a decent command of natura… ▽ More As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output, are constantly updated, and the potential adversary is highly diverse: anyone with access to the internet and a decent command of natural language. Further, what constitutes a security weak in one context may not be an issue in a different context; one-fits-all guardrails remain theoretical. In this paper, we argue that it is time to rethink what constitutes ``LLM security'', and pursue a holistic approach to LLM security evaluation, where exploration and discovery of issues are central. To this end, this paper introduces garak (Generative AI Red-teaming and Assessment Kit), a framework which can be used to discover and identify vulnerabilities in a target LLM or dialog system. garak probes an LLM in a structured fashion to discover potential vulnerabilities. The outputs of the framework describe a target model's weaknesses, contribute to an informed discussion of what composes vulnerabilities in unique contexts, and can inform alignment and policy discussions for LLM deployment. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: https://garak.ai

arXiv:2406.00225 [pdf, other]

Kinematic Model of Magnetic Domain Wall Motion for Fast, High-Accuracy Simulations

Authors: Kristi Doleh, Leonard Humphrey, Chandler M. Linseisen, Michael D. Kitcher, Joanna M. Martin, Can Cui, Jean Anne C. Incorvia, Felipe Garcia-Sanchez, Naimul Hassan, Alexander J. Edwards, Joseph S. Friedman

Abstract: Domain wall (DW) devices have garnered recent interest for diverse applications including memory, logic, and neuromorphic primitives; fast, accurate device models are therefore imperative for large-scale system design and verification. Extant DW motion models are sub-optimal for large-scale system design either over-consuming compute resources with physics-heavy equations or oversimplifying the ph… ▽ More Domain wall (DW) devices have garnered recent interest for diverse applications including memory, logic, and neuromorphic primitives; fast, accurate device models are therefore imperative for large-scale system design and verification. Extant DW motion models are sub-optimal for large-scale system design either over-consuming compute resources with physics-heavy equations or oversimplifying the physics, drastically reducing model accuracy. We propose a DW model inspired by the phenomenological similarities between motions of a DW and a classical object being acted on by forces like air resistance or static friction. Our proposed phenomenological model predicts DW motion within 1.2% on average compared with micromagnetic simulations that are 400 times slower. Additionally our model is seven times faster than extant collective coordinate models and 14 times more accurate than extant hyper-reduced models making it an essential tool for large-scale DW circuit design and simulation. The model is publicly posted along with scripts that automatically extract model parameters from user-provided simulation or experimental data to extend the model to alternative micromagnetic parameters. △ Less

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.09153 [pdf, other]

Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser

Authors: Jon Z. Cai, Kristin Wright-Bettner, Martha Palmer, Guergana K. Savova, James H. Martin

Abstract: This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME)… ▽ More This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME) corpus, we adapted a state-of-the-art AMR parser utilizing continuous training. Our approach incorporates data augmentation techniques to enhance the accuracy of AMR structure predictions. Notably, through this learning strategy, our parser achieved an impressive F1 score of 88% on the THYME corpus's colon cancer dataset. Moreover, our research delved into the efficacy of data required for domain adaptation within the realm of clinical notes, presenting domain adaptation data requirements for AMR parsing. This exploration not only underscores the parser's robust performance but also highlights its potential in facilitating a deeper understanding of clinical narratives through structured semantic representations. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to the 6th Clinical NLP Workshop at NAACL, 2024

arXiv:2404.17730 [pdf, other]

Bridging the Social & Technical Divide in Augmentative and Alternative Communication (AAC) Applications for Autistic Adults

Authors: Lara J. Martin, Malathy Nagalakshmi

Abstract: Natural Language Processing (NLP) techniques are being used more frequently to improve high-tech Augmentative and Alternative Communication (AAC), but many of these techniques are integrated without the inclusion of the users' perspectives. As many of these tools are created with children in mind, autistic adults are often neglected in the design of AAC tools to begin with. We conducted in-depth i… ▽ More Natural Language Processing (NLP) techniques are being used more frequently to improve high-tech Augmentative and Alternative Communication (AAC), but many of these techniques are integrated without the inclusion of the users' perspectives. As many of these tools are created with children in mind, autistic adults are often neglected in the design of AAC tools to begin with. We conducted in-depth interviews with 12 autistic adults to find the pain points of current AAC and determine what general technological advances they would find helpful. We found that in addition to technological issues, there are many societal issues as well. We found 9 different categories of themes from our interviews: input options, output options, selecting or adapting AAC for a good fit, when to start or swap AAC, benefits (of use), access (to AAC), stumbling blocks for continued use, social concerns, and lack of control. In this paper, we go through these nine categories in depth and then suggest possible guidelines for the NLP community, AAC application makers, and policy makers to improve AAC use for autistic adults. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.08949 [pdf, other]

Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Authors: Abhijnan Nath, Huma Jamil, Shafiuddin Rehan Ahmed, George Baker, Rahul Ghosh, James H. Martin, Nathaniel Blanchard, Nikhil Krishnaswamy

Abstract: Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple l… ▽ More Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models. As existing ECR benchmark datasets rarely provide images for all event mentions, we augment the popular ECB+ dataset with event-centric images scraped from the internet and generated using image diffusion models. We establish three methods that incorporate images and text for coreference: 1) a standard fused model with finetuning, 2) a novel linear map** method without finetuning and 3) an ensembling approach based on splitting mention pairs by semantic and discourse-level difficulty. We evaluate on 2 datasets: the augmented ECB+, and AIDA Phase 1. Our ensemble systems using cross-modal linear map** establish an upper limit (91.9 CoNLL F1) on ECB+ ECR performance given the preprocessing assumptions used, and establish a novel baseline on AIDA Phase 1. Our results demonstrate the utility of multimodal information in ECR for certain challenging coreference problems, and highlight a need for more multimodal resources in the coreference resolution space. △ Less

Submitted 13 April, 2024; originally announced April 2024.

Comments: To appear at LREC-COLING 2024

arXiv:2404.08656 [pdf, other]

Linear Cross-document Event Coreference Resolution with X-AMR

Authors: Shafiuddin Rehan Ahmed, George Arthur Baker, Evi Judge, Michael Regan, Kristin Wright-Bettner, Martha Palmer, James H. Martin

Abstract: Event Coreference Resolution (ECR) as a pairwise mention classification task is expensive both for automated systems and manual annotations. The task's quadratic difficulty is exacerbated when using Large Language Models (LLMs), making prompt engineering for ECR prohibitively costly. In this work, we propose a graphical representation of events, X-AMR, anchored around individual mentions using a \… ▽ More Event Coreference Resolution (ECR) as a pairwise mention classification task is expensive both for automated systems and manual annotations. The task's quadratic difficulty is exacerbated when using Large Language Models (LLMs), making prompt engineering for ECR prohibitively costly. In this work, we propose a graphical representation of events, X-AMR, anchored around individual mentions using a \textbf{cross}-document version of \textbf{A}bstract \textbf{M}eaning \textbf{R}epresentation. We then linearize the ECR with a novel multi-hop coreference algorithm over the event graphs. The event graphs simplify ECR, making it a) LLM cost-effective, b) compositional and interpretable, and c) easily annotated. For a fair assessment, we first enrich an existing ECR benchmark dataset with these event graphs using an annotator-friendly tool we introduce. Then, we employ GPT-4, the newest LLM by OpenAI, for these annotations. Finally, using the ECR algorithm, we assess GPT-4 against humans and analyze its limitations. Through this research, we aim to advance the state-of-the-art for efficient ECR and shed light on the potential shortcomings of current LLMs at this task. Code and annotations: \url{https://github.com/ahmeshaf/gpt_coref} △ Less

Submitted 24 March, 2024; originally announced April 2024.

Comments: LREC-COLING 2024 main conference

arXiv:2403.15407 [pdf, other]

X-AMR Annotation Tool

Authors: Shafiuddin Rehan Ahmed, Jon Z. Cai, Martha Palmer, James H. Martin

Abstract: This paper presents a novel Cross-document Abstract Meaning Representation (X-AMR) annotation tool designed for annotating key corpus-level event semantics. Leveraging machine assistance through the Prodigy Annotation Tool, we enhance the user experience, ensuring ease and efficiency in the annotation process. Through empirical analyses, we demonstrate the effectiveness of our tool in augmenting a… ▽ More This paper presents a novel Cross-document Abstract Meaning Representation (X-AMR) annotation tool designed for annotating key corpus-level event semantics. Leveraging machine assistance through the Prodigy Annotation Tool, we enhance the user experience, ensuring ease and efficiency in the annotation process. Through empirical analyses, we demonstrate the effectiveness of our tool in augmenting an existing event corpus, highlighting its advantages when integrated with GPT-4. Code and annotations: https://github.com/ahmeshaf/gpt_coref △ Less

Submitted 29 February, 2024; originally announced March 2024.

Comments: EACL 2024 System Demonstration

arXiv:2403.04761 [pdf, other]

doi 10.1145/3613904.3642001

DeepSee: Multidimensional Visualizations of Seabed Ecosystems

Authors: Adam Coscia, Haley M. Sapers, Noah Deutsch, Malika Khurana, John S. Magyar, Sergio A. Parra, Daniel R. Utter, Rebecca L. Wipfler, David W. Caress, Eric J. Martin, Jennifer B. Paduan, Maggie Hendrie, Santiago Lombeyda, Hillary Mushkin, Alex Endert, Scott Davidoff, Victoria J. Orphan

Abstract: Scientists studying deep ocean microbial ecosystems use limited numbers of sediment samples collected from the seafloor to characterize important life-sustaining biogeochemical cycles in the environment. Yet conducting fieldwork to sample these extreme remote environments is both expensive and time consuming, requiring tools that enable scientists to explore the sampling history of field sites and… ▽ More Scientists studying deep ocean microbial ecosystems use limited numbers of sediment samples collected from the seafloor to characterize important life-sustaining biogeochemical cycles in the environment. Yet conducting fieldwork to sample these extreme remote environments is both expensive and time consuming, requiring tools that enable scientists to explore the sampling history of field sites and predict where taking new samples is likely to maximize scientific return. We conducted a collaborative, user-centered design study with a team of scientific researchers to develop DeepSee, an interactive data workspace that visualizes 2D and 3D interpolations of biogeochemical and microbial processes in context together with sediment sampling history overlaid on 2D seafloor maps. Based on a field deployment and qualitative interviews, we found that DeepSee increased the scientific return from limited sample sizes, catalyzed new research workflows, reduced long-term costs of sharing data, and supported teamwork and communication between team members with diverse research goals. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted to CHI 2024. 16 pages, 7 figures, 2 tables. For a demo video, see https://youtu.be/HJ4zbueJ9cs . For a live demo, visit https://www.its.caltech.edu/~datavis/deepsee/ . The source code is available at https://github.com/orphanlab/DeepSee

arXiv:2402.07462 [pdf]

A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse?

Authors: Nathan I. N. Henry, Mangor Pedersen, Matt Williams, Jamin L. B. Martin, Liesje Donkin

Abstract: The value-loading problem is a significant challenge for researchers aiming to create artificial intelligence (AI) systems that align with human values and preferences. This problem requires a method to define and regulate safe and optimal limits of AI behaviors. In this work, we propose HALO (Hormetic ALignment via Opponent processes), a regulatory paradigm that uses hormetic analysis to regulate… ▽ More The value-loading problem is a significant challenge for researchers aiming to create artificial intelligence (AI) systems that align with human values and preferences. This problem requires a method to define and regulate safe and optimal limits of AI behaviors. In this work, we propose HALO (Hormetic ALignment via Opponent processes), a regulatory paradigm that uses hormetic analysis to regulate the behavioral patterns of AI. Behavioral hormesis is a phenomenon where low frequencies of a behavior have beneficial effects, while high frequencies are harmful. By modeling behaviors as allostatic opponent processes, we can use either Behavioral Frequency Response Analysis (BFRA) or Behavioral Count Response Analysis (BCRA) to quantify the hormetic limits of repeatable behaviors. We demonstrate how HALO can solve the 'paperclip maximizer' scenario, a thought experiment where an unregulated AI tasked with making paperclips could end up converting all matter in the universe into paperclips. Our approach may be used to help create an evolving database of 'values' based on the hedonic calculus of repeatable behaviors with decreasing marginal utility. This positions HALO as a promising solution for the value-loading problem, which involves embedding human-aligned values into an AI system, and the weak-to-strong generalization problem, which explores whether weak models can supervise stronger models as they become more intelligent. Hence, HALO opens several research avenues that may lead to the development of a computational value system that allows an AI algorithm to learn whether the decisions it makes are right or wrong. △ Less

Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 24 pages, 7 figures

MSC Class: 68T01; 68T37; 68T42 ACM Class: I.2.0; I.2.8; I.2.11

arXiv:2401.03306 [pdf, other]

MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning

Authors: Rafael Rafailov, Kyle Hatch, Victor Kolev, John D. Martin, Mariano Phielipp, Chelsea Finn

Abstract: We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have ach… ▽ More We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have achieved significant progress in sample efficiency and the complexity of the tasks they can solve, yet remain under-utilized in the fine-tuning setting. In this work, we argue that existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains due to issues with distribution shifts, off-dynamics data, and non-stationary rewards. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization, while preventing model exploitation by controlling epistemic uncertainty. We find that our approach successfully solves tasks from the MetaWorld benchmark, as well as the Franka Kitchen robot manipulation environment completely from images. To the best of our knowledge, MOTO is the first method to solve this environment from pixels. △ Less

Submitted 6 January, 2024; originally announced January 2024.

Comments: This is an updated version of a manuscript that originally appeared at CoRL 2023. The project website is here https://sites.google.com/view/mo2o

Journal ref: Proceedings of The 7th Conference on Robot Learning, PMLR 229:3654-3671, 2023

arXiv:2312.10257 [pdf, other]

The Physics-Informed Neural Network Gravity Model: Generation III

Authors: John Martin, Hanspeter Schaub

Abstract: Scientific machine learning and the advent of the Physics-Informed Neural Network (PINN) show considerable potential in their capacity to identify solutions to complex differential equations. Over the past two years, much work has gone into the development of PINNs capable of solving the gravity field modeling problem -- i.e.\ learning a differentiable form of the gravitational potential from posi… ▽ More Scientific machine learning and the advent of the Physics-Informed Neural Network (PINN) show considerable potential in their capacity to identify solutions to complex differential equations. Over the past two years, much work has gone into the development of PINNs capable of solving the gravity field modeling problem -- i.e.\ learning a differentiable form of the gravitational potential from position and acceleration estimates. While the past PINN gravity models (PINN-GMs) have demonstrated advantages in model compactness, robustness to noise, and sample efficiency; there remain key modeling challenges which this paper aims to address. Specifically, this paper introduces the third generation of the Physics-Informed Neural Network Gravity Model (PINN-GM-III) which solves the problems of extrapolation error, bias towards low-altitude samples, numerical instability at high-altitudes, and compliant boundary conditions through numerous modifications to the model's design. The PINN-GM-III is tested by modeling a known heterogeneous density asteroid, and its performance is evaluated using seven core metrics which showcases its strengths against its predecessors and other analytic and numerical gravity models. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: 42 pages, 14 figures, submitted to The Journal of Astronautical Sciences

arXiv:2312.09394 [pdf, other]

HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents

Authors: Dániel Horváth, Jesús Bujalance Martín, Ferenc Gábor Erdős, Zoltán Istenes, Fabien Moutarde

Abstract: Even though reinforcement-learning-based algorithms achieved superhuman performance in many domains, the field of robotics poses significant challenges as the state and action spaces are continuous, and the reward function is predominantly sparse. Furthermore, on many occasions, the agent is devoid of access to any form of demonstration. Inspired by human learning, in this work, we propose a metho… ▽ More Even though reinforcement-learning-based algorithms achieved superhuman performance in many domains, the field of robotics poses significant challenges as the state and action spaces are continuous, and the reward function is predominantly sparse. Furthermore, on many occasions, the agent is devoid of access to any form of demonstration. Inspired by human learning, in this work, we propose a method named highlight experience replay (HiER) that creates a secondary highlight replay buffer for the most relevant experiences. For the weights update, the transitions are sampled from both the standard and the highlight experience replay buffer. It can be applied with or without the techniques of hindsight experience replay (HER) and prioritized experience replay (PER). Our method significantly improves the performance of the state-of-the-art, validated on 8 tasks of three robotic benchmarks. Furthermore, to exploit the full potential of HiER, we propose HiER+ in which HiER is enhanced with an arbitrary data collection curriculum learning method. Our implementation, the qualitative results, and a video presentation are available on the project site: http://www.danielhorvath.eu/hier/. △ Less

Submitted 9 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accpeted for publication in IEEE Access

arXiv:2311.10928 [pdf, other]

CAMRA: Copilot for AMR Annotation

Authors: Jon Z. Cai, Shafiuddin Rehan Ahmed, Julia Bonn, Kristin Wright-Bettner, Martha Palmer, James H. Martin

Abstract: In this paper, we introduce CAMRA (Copilot for AMR Annotatations), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompa… ▽ More In this paper, we introduce CAMRA (Copilot for AMR Annotatations), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompasses all essential features of existing AMR editors, including example lookup, while going a step further by integrating Propbank roleset lookup as an autocomplete feature within the tool. Notably, CAMRA incorporates AMR parser models as coding co-pilots, greatly enhancing the efficiency and accuracy of AMR annotators. To demonstrate the tool's capabilities, we provide a live demo accessible at: https://camra.colorado.edu △ Less

Submitted 20 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: EMNLP 2023 System Demonstration

arXiv:2311.01468 [pdf, other]

Remember what you did so you know what to do next

Authors: Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, Ralph Weischedel

Abstract: We explore using a moderately sized large language model (GPT-J 6B parameters) to create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments. Previously published empirical work claimed that large language models (LLMs) are a poor fit (Wang et al., 2022) compared to reinforcement learning. Using the Markov assumption… ▽ More We explore using a moderately sized large language model (GPT-J 6B parameters) to create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments. Previously published empirical work claimed that large language models (LLMs) are a poor fit (Wang et al., 2022) compared to reinforcement learning. Using the Markov assumption (a single previous step), the LLM outperforms the reinforcement learning-based approach by a factor of 1.4. When we fill the LLM's input buffer with as many prior steps as possible, improvement rises to 3.5x. Even when training on only 6.5% of the training data, we observe a 2.2x improvement over the reinforcement-learning-based approach. Our experiments show that performance varies widely across the 30 classes of actions, indicating that averaging over tasks can hide significant performance issues. In work contemporaneous with ours, Lin et al. (2023) demonstrated a two-part approach (SwiftSage) that uses a small LLM (T5-large) complemented by OpenAI's massive LLMs to achieve outstanding results in ScienceWorld. Our 6-B parameter, single-stage GPT-J matches the performance of SwiftSage's two-stage architecture when it incorporates GPT-3.5 turbo which has 29-times more parameters than GPT-J. △ Less

Submitted 30 October, 2023; originally announced November 2023.

Comments: Identical to EMNLP 2023 Findings

arXiv:2310.07084 [pdf, other]

Investigating the Adversarial Robustness of Density Estimation Using the Probability Flow ODE

Authors: Marius Arvinte, Cory Cornelius, Jason Martin, Nageen Himayat

Abstract: Beyond their impressive sampling capabilities, score-based diffusion models offer a powerful analysis tool in the form of unbiased density estimation of a query sample under the training data distribution. In this work, we investigate the robustness of density estimation using the probability flow (PF) neural ordinary differential equation (ODE) model against gradient-based likelihood maximization… ▽ More Beyond their impressive sampling capabilities, score-based diffusion models offer a powerful analysis tool in the form of unbiased density estimation of a query sample under the training data distribution. In this work, we investigate the robustness of density estimation using the probability flow (PF) neural ordinary differential equation (ODE) model against gradient-based likelihood maximization attacks and the relation to sample complexity, where the compressed size of a sample is used as a measure of its complexity. We introduce and evaluate six gradient-based log-likelihood maximization attacks, including a novel reverse integration attack. Our experimental evaluations on CIFAR-10 show that density estimation using the PF ODE is robust against high-complexity, high-likelihood attacks, and that in some cases adversarial samples are semantically meaningful, as expected from a robust estimator. △ Less

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.16744 [pdf]

Predicting Long-term Renal Impairment in Post-COVID-19 Patients with Machine Learning Algorithms

Authors: Maitham G. Yousif, Hector J. Castro, John Martin, Hayder A. Albaqer, Fadhil G. Al-Amran, Habeeb W. Shubber, Salman Rawaf

Abstract: The COVID-19 pandemic has had far-reaching implications for global public health. As we continue to grapple with its consequences, it becomes increasingly clear that post-COVID-19 complications are a significant concern. Among these complications, renal impairment has garnered particular attention due to its potential long-term health impacts. This study, conducted with a cohort of 821 post-COVID-… ▽ More The COVID-19 pandemic has had far-reaching implications for global public health. As we continue to grapple with its consequences, it becomes increasingly clear that post-COVID-19 complications are a significant concern. Among these complications, renal impairment has garnered particular attention due to its potential long-term health impacts. This study, conducted with a cohort of 821 post-COVID-19 patients from diverse regions of Iraq across the years 2021, 2022, and 2023, endeavors to predict the risk of long-term renal impairment using advanced machine learning algorithms. Our findings have the potential to revolutionize post-COVID-19 patient care by enabling early identification and intervention for those at risk of renal impairment, ultimately improving clinical outcomes. This research encompasses comprehensive data collection and preprocessing, feature selection, and the development of predictive models using various machine learning algorithms. The study's objectives are to assess the incidence of long-term renal impairment in post-COVID-19 patients, identify associated risk factors, create predictive models, and evaluate their accuracy. We anticipate that our machine learning models, drawing from a rich dataset, will provide valuable insights into the risk of renal impairment, ultimately enhancing patient care and quality of life. In conclusion, the research presented herein offers a critical contribution to the field of post-COVID-19 care. By harnessing the power of machine learning, we aim to predict long-term renal impairment risk accurately. These predictions have the potential to inform healthcare professionals, enabling them to take proactive measures and provide targeted interventions for post-COVID-19 patients at risk of renal complications, thus minimizing the impact of this serious health concern. △ Less

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.16066 [pdf, other]

Label Augmentation Method for Medical Landmark Detection in Hip Radiograph Images

Authors: Yehyun Suh, Peter Chan, J. Ryan Martin, Daniel Moyer

Abstract: This work reports the empirical performance of an automated medical landmark detection method for predict clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample efficient estimators. We train a generic U-Net-b… ▽ More This work reports the empirical performance of an automated medical landmark detection method for predict clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample efficient estimators. We train a generic U-Net-based architecture under a curriculum consisting of two phases: initially relaxing the landmarking task by enlarging the label points to regions, then gradually eroding these label regions back to the base task. We measure the benefits of this approach on six datasets of radiographs with gold-standard expert annotations. △ Less

Submitted 8 December, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.09993 [pdf]

Long-term Neurological Sequelae in Post-COVID-19 Patients: A Machine Learning Approach to Predict Outcomes

Authors: Hayder A. Albaqer, Kadhum J. Al-Jibouri, John Martin, Fadhil G. Al-Amran, Salman Rawaf, Maitham G. Yousif

Abstract: The COVID-19 pandemic has brought to light a concerning aspect of long-term neurological complications in post-recovery patients. This study delved into the investigation of such neurological sequelae in a cohort of 500 post-COVID-19 patients, encompassing individuals with varying illness severity. The primary aim was to predict outcomes using a machine learning approach based on diverse clinical… ▽ More The COVID-19 pandemic has brought to light a concerning aspect of long-term neurological complications in post-recovery patients. This study delved into the investigation of such neurological sequelae in a cohort of 500 post-COVID-19 patients, encompassing individuals with varying illness severity. The primary aim was to predict outcomes using a machine learning approach based on diverse clinical data and neuroimaging parameters. The results revealed that 68% of the post-COVID-19 patients reported experiencing neurological symptoms, with fatigue, headache, and anosmia being the most common manifestations. Moreover, 22% of the patients exhibited more severe neurological complications, including encephalopathy and stroke. The application of machine learning models showed promising results in predicting long-term neurological outcomes. Notably, the Random Forest model achieved an accuracy of 85%, sensitivity of 80%, and specificity of 90% in identifying patients at risk of develo** neurological sequelae. These findings underscore the importance of continuous monitoring and follow-up care for post-COVID-19 patients, particularly in relation to potential neurological complications. The integration of machine learning-based outcome prediction offers a valuable tool for early intervention and personalized treatment strategies, aiming to improve patient care and clinical decision-making. In conclusion, this study sheds light on the prevalence of long-term neurological complications in post-COVID-19 patients and demonstrates the potential of machine learning in predicting outcomes, thereby contributing to enhanced patient management and better health outcomes. Further research and larger studies are warranted to validate and refine these predictive models and to gain deeper insights into the underlying mechanisms of post-COVID-19 neurological sequelae. △ Less

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2308.16258 [pdf, other]

Robust Principles: Architectural Design Principles for Adversarially Robust CNNs

Authors: ShengYun Peng, Weilin Xu, Cory Cornelius, Matthew Hull, Kevin Li, Rahul Duggal, Mansi Phute, Jason Martin, Duen Horng Chau

Abstract: Our research aims to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs. To accomplish our goal, we synthesize a suite of three generalizable robust architectural design principles: (a) optimal range for depth and width configurations, (b) preferring convolutional over patchify stem stage, and (c) robust residual block design through… ▽ More Our research aims to unify existing works' diverging opinions on how architectural components affect the adversarial robustness of CNNs. To accomplish our goal, we synthesize a suite of three generalizable robust architectural design principles: (a) optimal range for depth and width configurations, (b) preferring convolutional over patchify stem stage, and (c) robust residual block design through adopting squeeze and excitation blocks and non-parametric smooth activation functions. Through extensive experiments across a wide spectrum of dataset scales, adversarial training methods, model parameters, and network design spaces, our principles consistently and markedly improve AutoAttack accuracy: 1-3 percentage points (pp) on CIFAR-10 and CIFAR-100, and 4-9 pp on ImageNet. The code is publicly available at https://github.com/poloclub/robust-principles. △ Less

Submitted 31 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Published at BMVC'23

arXiv:2308.07540 [pdf, other]

doi 10.1609/aiide.v19i1.27534

CALYPSO: LLMs as Dungeon Masters' Assistants

Authors: Andrew Zhu, Lara J. Martin, Andrew Head, Chris Callison-Burch

Abstract: The role of a Dungeon Master, or DM, in the game Dungeons & Dragons is to perform multiple tasks simultaneously. The DM must digest information about the game setting and monsters, synthesize scenes to present to other players, and respond to the players' interactions with the scene. Doing all of these tasks while maintaining consistency within the narrative and story world is no small feat of hum… ▽ More The role of a Dungeon Master, or DM, in the game Dungeons & Dragons is to perform multiple tasks simultaneously. The DM must digest information about the game setting and monsters, synthesize scenes to present to other players, and respond to the players' interactions with the scene. Doing all of these tasks while maintaining consistency within the narrative and story world is no small feat of human cognition, making the task tiring and unapproachable to new players. Large language models (LLMs) like GPT-3 and ChatGPT have shown remarkable abilities to generate coherent natural language text. In this paper, we conduct a formative evaluation with DMs to establish the use cases of LLMs in D&D and tabletop gaming generally. We introduce CALYPSO, a system of LLM-powered interfaces that support DMs with information and inspiration specific to their own scenario. CALYPSO distills game context into bite-sized prose and helps brainstorm ideas without distracting the DM from the game. When given access to CALYPSO, DMs reported that it generated high-fidelity text suitable for direct presentation to players, and low-fidelity ideas that the DM could develop further while maintaining their creative agency. We see CALYPSO as exemplifying a paradigm of AI-augmented tools that provide synchronous creative assistance within established game worlds, and tabletop gaming more broadly. △ Less

Submitted 14 August, 2023; originally announced August 2023.

Comments: 11 pages, 4 figures. AIIDE 2023

Journal ref: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE) 2023

arXiv:2307.14628 [pdf, other]

Rapid and Scalable Bayesian AB Testing

Authors: Srivas Chennu, Andrew Maher, Christian Pangerl, Subash Prabanantham, Jae Hyeon Bae, Jamie Martin, Bud Goswami

Abstract: AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners, and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical p… ▽ More AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners, and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need of sequential testing for early stop**, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address the above limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stop**, without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to enable the extraction of composite global learnings from past AB tests, to accelerate future tests. We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation. We demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry. △ Less

Submitted 27 July, 2023; originally announced July 2023.

Comments: The 10th IEEE International Conference On Data Science And Advanced Analytics

arXiv:2306.05434 [pdf, other]

How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation?

Authors: Shafiuddin Rehan Ahmed, Abhijnan Nath, Michael Regan, Adam Pollins, Nikhil Krishnaswamy, James H. Martin

Abstract: Annotating cross-document event coreference links is a time-consuming and cognitively demanding task that can compromise annotation quality and efficiency. To address this, we propose a model-in-the-loop annotation approach for event coreference resolution, where a machine learning model suggests likely corefering event pairs only. We evaluate the effectiveness of this approach by first simulating… ▽ More Annotating cross-document event coreference links is a time-consuming and cognitively demanding task that can compromise annotation quality and efficiency. To address this, we propose a model-in-the-loop annotation approach for event coreference resolution, where a machine learning model suggests likely corefering event pairs only. We evaluate the effectiveness of this approach by first simulating the annotation process and then, using a novel annotator-centric Recall-Annotation effort trade-off metric, we compare the results of various underlying models and datasets. We finally present a method for obtaining 97\% recall while substantially reducing the workload required by a fully manual annotation process. Code and data can be found at https://github.com/ahmeshaf/model_in_coref △ Less

Submitted 6 June, 2023; originally announced June 2023.

Comments: The 17th Liguistics Annotation Workshop, 2023 (LAW-XVII) short paper. 10 pages, 6 figures, 1 table

arXiv:2305.05672 [pdf, other]

$2 * n$ is better than $n^2$: Decomposing Event Coreference Resolution into Two Tractable Problems

Authors: Shafiuddin Rehan Ahmed, Abhijnan Nath, James H. Martin, Nikhil Krishnaswamy

Abstract: Event Coreference Resolution (ECR) is the task of linking mentions of the same event either within or across documents. Most mention pairs are not coreferent, yet many that are coreferent can be identified through simple techniques such as lemma matching of the event triggers or the sentences in which they appear. Existing methods for training coreference systems sample from a largely skewed distr… ▽ More Event Coreference Resolution (ECR) is the task of linking mentions of the same event either within or across documents. Most mention pairs are not coreferent, yet many that are coreferent can be identified through simple techniques such as lemma matching of the event triggers or the sentences in which they appear. Existing methods for training coreference systems sample from a largely skewed distribution, making it difficult for the algorithm to learn coreference beyond surface matching. Additionally, these methods are intractable because of the quadratic operations needed. To address these challenges, we break the problem of ECR into two parts: a) a heuristic to efficiently filter out a large number of non-coreferent pairs, and b) a training approach on a balanced set of coreferent and non-coreferent mention pairs. By following this approach, we show that we get comparable results to the state of the art on two popular ECR datasets while significantly reducing compute requirements. We also analyze the mention pairs that are "hard" to accurately classify as coreferent or non-coreferent. Code at https://github.com/ahmeshaf/lemma_ce_coref △ Less

Submitted 9 May, 2023; originally announced May 2023.

Comments: Findings of the Association of Computational Linguistics, ACL 2023. 13 pages, 7 figures, 6 tables

arXiv:2305.01528 [pdf, other]

doi 10.18653/v1/2023.acl-long.229

FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information

Authors: Andrew Zhu, Karmanya Aggarwal, Alexander Feng, Lara J. Martin, Chris Callison-Burch

Abstract: Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) that have access to state information can generate higher quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created… ▽ More Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) that have access to state information can generate higher quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created and was not a true gold standard game state. We present FIREBALL, a large dataset containing nearly 25,000 unique sessions from real D&D gameplay on Discord with true game state info. We recorded game play sessions of players who used the Avrae bot, which was developed to aid people in playing D&D online, capturing language, game commands and underlying game state information. We demonstrate that FIREBALL can improve natural language generation (NLG) by using Avrae state information, improving both automated metrics and human judgments of quality. Additionally, we show that LLMs can generate executable Avrae commands, particularly after finetuning. △ Less

Submitted 25 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 21 pages, 2 figures. Accepted at ACL 2023

Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 4171-4193

arXiv:2304.09996 [pdf, other]

Robust Route Planning with Distributional Reinforcement Learning in a Stochastic Road Network Environment

Authors: Xi Lin, Paul Szenher, John D. Martin, Brendan Englot

Abstract: Route planning is essential to mobile robot navigation problems. In recent years, deep reinforcement learning (DRL) has been applied to learning optimal planning policies in stochastic environments without prior knowledge. However, existing works focus on learning policies that maximize the expected return, the performance of which can vary greatly when the level of stochasticity in the environmen… ▽ More Route planning is essential to mobile robot navigation problems. In recent years, deep reinforcement learning (DRL) has been applied to learning optimal planning policies in stochastic environments without prior knowledge. However, existing works focus on learning policies that maximize the expected return, the performance of which can vary greatly when the level of stochasticity in the environment is high. In this work, we propose a distributional reinforcement learning based framework that learns return distributions which explicitly reflect environmental stochasticity. Policies based on the second-order stochastic dominance (SSD) relation can be used to make adjustable route decisions according to user preference on performance robustness. Our proposed method is evaluated in a simulated road network environment, and experimental results show that our method is able to plan the shortest routes that minimize stochasticity in travel time when robustness is preferred, while other state-of-the-art DRL methods are agnostic to environmental stochasticity. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Comments: The 20th International Conference on Ubiquitous Robots (UR 2023)

arXiv:2302.13048 [pdf, other]

doi 10.18653/v1/2023.acl-demo.1

Human-in-the-Loop Schema Induction

Authors: Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Liyang Zhou, Hainiu Xu, Li Zhang, Lara J. Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Brown, Reece Suchocki, Chris Callison-Burch

Abstract: Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction(IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic el… ▽ More Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction(IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic elements, manual edit of those elements, and conversion of those into a schema graph. By qualitatively comparing our system to previous ones, we show that our system not only transfers to new domains more easily than previous approaches, but also reduces efforts of human curation thanks to our interactive interface. △ Less

Submitted 25 February, 2023; originally announced February 2023.

Comments: 10 pages, ACL2023 demo track

arXiv:2302.12944 [pdf, other]

Dependency Dialogue Acts -- Annotation Scheme and Case Study

Authors: Jon Z. Cai, Brendan King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James H. Martin, Martha Palmer, Marilyn Walker, Jeffrey Flanigan

Abstract: In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker-intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to the dialogue acts and rhetorical relations. It represents the functional, discourse,… ▽ More In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker-intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to the dialogue acts and rhetorical relations. It represents the functional, discourse, and response structure in multi-party multi-threaded conversations. A few key features distinguish DDA from existing dialogue annotation frameworks such as SWBD-DAMSL and the ISO 24617-2 standard. First, DDA prioritizes the relational structure of the dialogue units and the dialog context, annotating both dialog acts and rhetorical relations as response relations to particular utterances. Second, DDA embraces overloading in dialogues, encouraging annotators to specify multiple response relations and dialog acts for each dialog unit. Lastly, DDA places an emphasis on adequately capturing how a speaker is using the full dialog context to plan and organize their speech. With these features, DDA is highly expressive and recall-oriented with regard to conversation dynamics between multiple speakers. In what follows, we present the DDA annotation framework and case studies annotating DDA structures in multi-party, multi-threaded conversations. △ Less

Submitted 24 February, 2023; originally announced February 2023.

Comments: The 13th International Workshop on Spoken Dialogue Systems Technology

Journal ref: The 13th International Workshop on Spoken Dialogue Systems Technology 2023

arXiv:2302.01337 [pdf]

doi 10.1109/CSCI58124.2022.00289

Comprehensive and user-analytics-friendly cancer patient database for physicians and researchers

Authors: Ali Firooz, Avery T. Funkhouser, Julie C. Martin, W. Jeffery Edenfield, Homayoun Valafar, Anna V. Blenda

Abstract: Nuanced cancer patient care is needed, as the development and clinical course of cancer is multifactorial with influences from the general health status of the patient, germline and neoplastic mutations, co-morbidities, and environment. To effectively tailor an individualized treatment to each patient, such multifactorial data must be presented to providers in an easy-to-access and easy-to-analyze… ▽ More Nuanced cancer patient care is needed, as the development and clinical course of cancer is multifactorial with influences from the general health status of the patient, germline and neoplastic mutations, co-morbidities, and environment. To effectively tailor an individualized treatment to each patient, such multifactorial data must be presented to providers in an easy-to-access and easy-to-analyze fashion. To address the need, a relational database has been developed integrating status of cancer-critical gene mutations, serum galectin profiles, serum and tumor glycomic profiles, with clinical, demographic, and lifestyle data points of individual cancer patients. The database, as a backend, provides physicians and researchers with a single, easily accessible repository of cancer profiling data to aid-in and enhance individualized treatment. Our interactive database allows care providers to amalgamate cohorts from these groups to find correlations between different data types with the possibility of finding "molecular signatures" based upon a combination of genetic mutations, galectin serum levels, glycan compositions, and patient clinical data and lifestyle choices. Our project provides a framework for an integrated, interactive, and growing database to analyze molecular and clinical patterns across cancer stages and subtypes and provides opportunities for increased diagnostic and prognostic power. △ Less

Submitted 1 February, 2023; originally announced February 2023.

Comments: 7 pages, 12 figures, peer reviewed and accepted in "International Conference on Computational Science and Computational Intelligence (CSCI 22)"

Journal ref: Proceedings of the 2022 International Conference on Computational Science and Computational Intelligence (CSCI)

arXiv:2301.08104 [pdf, other]

doi 10.1609/icwsm.v17i1.22141

Author as Character and Narrator: Deconstructing Personal Narratives from the r/AmITheAsshole Reddit Community

Authors: Salvatore Giorgi, Ke Zhao, Alexander H. Feng, Lara J. Martin

Abstract: In the r/AmITheAsshole subreddit, people anonymously share first person narratives that contain some moral dilemma or conflict and ask the community to judge who is at fault (i.e., who is "the asshole"). In general, first person narratives are a unique storytelling domain where the author is the narrator (the person telling the story) but can also be a character (the person living the story) and,… ▽ More In the r/AmITheAsshole subreddit, people anonymously share first person narratives that contain some moral dilemma or conflict and ask the community to judge who is at fault (i.e., who is "the asshole"). In general, first person narratives are a unique storytelling domain where the author is the narrator (the person telling the story) but can also be a character (the person living the story) and, thus, the author has two distinct voices presented in the story. In this study, we identify linguistic and narrative features associated with the author as the character or as a narrator. We use these features to answer the following questions: (1) what makes an asshole character and (2) what makes an asshole narrator? We extract both Author-as-Character features (e.g., demographics, narrative event chain, and emotional arc) and Author-as-Narrator features (i.e., the style and emotion of the story as a whole) in order to identify which aspects of the narrative are correlated with the final moral judgment. Our work shows that "assholes" as Characters frame themselves as lacking agency with a more positive personal arc, while "assholes" as Narrators will tell emotional and opinionated stories. △ Less

Submitted 15 March, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: Accepted to the 17th International AAAI Conference on Web and Social Media (ICWSM), 2023

Journal ref: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM) 2023, 17(1), 233-244

arXiv:2301.03110 [pdf, other]

RobArch: Designing Robust Architectures against Adversarial Attacks

Authors: ShengYun Peng, Weilin Xu, Cory Cornelius, Kevin Li, Rahul Duggal, Duen Horng Chau, Jason Martin

Abstract: Adversarial Training is the most effective approach for improving the robustness of Deep Neural Networks (DNNs). However, compared to the large body of research in optimizing the adversarial training process, there are few investigations into how architecture components affect robustness, and they rarely constrain model capacity. Thus, it is unclear where robustness precisely comes from. In this w… ▽ More Adversarial Training is the most effective approach for improving the robustness of Deep Neural Networks (DNNs). However, compared to the large body of research in optimizing the adversarial training process, there are few investigations into how architecture components affect robustness, and they rarely constrain model capacity. Thus, it is unclear where robustness precisely comes from. In this work, we present the first large-scale systematic study on the robustness of DNN architecture components under fixed parameter budgets. Through our investigation, we distill 18 actionable robust network design guidelines that empower model developers to gain deep insights. We demonstrate these guidelines' effectiveness by introducing the novel Robust Architecture (RobArch) model that instantiates the guidelines to build a family of top-performing models across parameter capacities against strong adversarial attacks. RobArch achieves the new state-of-the-art AutoAttack accuracy on the RobustBench ImageNet leaderboard. The code is available at $\href{https://github.com/ShengYun-Peng/RobArch}{\text{this url}}$. △ Less

Submitted 8 January, 2023; originally announced January 2023.

arXiv:2301.00280 [pdf]

RECOMED: A Comprehensive Pharmaceutical Recommendation System

Authors: Mariam Zomorodi, Ismail Ghodsollahee, Jennifer H. Martin, Nicholas J. Talley, Vahid Salari, Pawel Plawiak, Kazem Rahimi, U. Rajendra Acharya

Abstract: A comprehensive pharmaceutical recommendation system was designed based on the patients and drugs features extracted from Drugs.com and Druglib.com. First, data from these databases were combined, and a dataset of patients and drug information was built. Secondly, the patients and drugs were clustered, and then the recommendation was performed using different ratings provided by patients, and impo… ▽ More A comprehensive pharmaceutical recommendation system was designed based on the patients and drugs features extracted from Drugs.com and Druglib.com. First, data from these databases were combined, and a dataset of patients and drug information was built. Secondly, the patients and drugs were clustered, and then the recommendation was performed using different ratings provided by patients, and importantly by the knowledge obtained from patients and drug specifications, and considering drug interactions. To the best of our knowledge, we are the first group to consider patients conditions and history in the proposed approach for selecting a specific medicine appropriate for that particular user. Our approach applies artificial intelligence (AI) models for the implementation. Sentiment analysis using natural language processing approaches is employed in pre-processing along with neural network-based methods and recommender system algorithms for modeling the system. In our work, patients conditions and drugs features are used for making two models based on matrix factorization. Then we used drug interaction to filter drugs with severe or mild interactions with other drugs. We developed a deep learning model for recommending drugs by using data from 2304 patients as a training set, and then we used data from 660 patients as our validation set. After that, we used knowledge from critical information about drugs and combined the outcome of the model into a knowledge-based system with the rules obtained from constraints on taking medicine. △ Less

Submitted 21 August, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: 39 pages, 14 figures, 13 tables

arXiv:2212.10754 [pdf, other]

doi 10.18653/v1/2023.findings-acl.832

CoRRPUS: Code-based Structured Prompting for Neurosymbolic Story Understanding

Authors: Yijiang River Dong, Lara J. Martin, Chris Callison-Burch

Abstract: Story generation and understanding -- as with all NLG/NLU tasks -- has seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic means to be even better and to make up for any flaws that the neural networks might have. However, symbolic methods are extremely costly in terms of the amount of… ▽ More Story generation and understanding -- as with all NLG/NLU tasks -- has seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic means to be even better and to make up for any flaws that the neural networks might have. However, symbolic methods are extremely costly in terms of the amount of time and expertise needed to create them. In this work, we capitalize on state-of-the-art Code-LLMs, such as Codex, to bootstrap the use of symbolic methods for tracking the state of stories and aiding in story understanding. We show that our CoRRPUS system and abstracted prompting procedures can beat current state-of-the-art structured LLM techniques on pre-existing story understanding tasks (bAbI Task 2 and Re^3) with minimal hand engineering. We hope that this work can help highlight the importance of symbolic representations and specialized prompting for LLMs as these models require some guidance for performing reasoning tasks properly. △ Less

Submitted 8 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted to Findings of ACL 2023

Journal ref: Findings of ACL 2023, pp. 13152-13168

arXiv:2212.10420 [pdf, other]

Settling the Reward Hypothesis

Authors: Michael Bowling, John D. Martin, David Abel, Will Dabney

Abstract: The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hy… ▽ More The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hypothesis holds. △ Less

Submitted 16 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.10283 [pdf, other]

Interpretable models for extrapolation in scientific machine learning

Authors: Eric S. Muckley, James E. Saal, Bryce Meredig, Christopher S. Roper, John H. Martin

Abstract: Data-driven models are central to scientific discovery. In efforts to achieve state-of-the-art model accuracy, researchers are employing increasingly complex machine learning algorithms that often outperform simple regressions in interpolative settings (e.g. random k-fold cross-validation) but suffer from poor extrapolation performance, portability, and human interpretability, which limits their p… ▽ More Data-driven models are central to scientific discovery. In efforts to achieve state-of-the-art model accuracy, researchers are employing increasingly complex machine learning algorithms that often outperform simple regressions in interpolative settings (e.g. random k-fold cross-validation) but suffer from poor extrapolation performance, portability, and human interpretability, which limits their potential for facilitating novel scientific insight. Here we examine the trade-off between model performance and interpretability across a broad range of science and engineering problems with an emphasis on materials science datasets. We compare the performance of black box random forest and neural network machine learning algorithms to that of single-feature linear regressions which are fitted using interpretable input features discovered by a simple random search algorithm. For interpolation problems, the average prediction errors of linear regressions were twice as high as those of black box models. Remarkably, when prediction tasks required extrapolation, linear models yielded average error only 5% higher than that of black box models, and outperformed black box models in roughly 40% of the tested prediction tasks, which suggests that they may be desirable over complex algorithms in many extrapolation problems because of their superior interpretability, computational overhead, and ease of use. The results challenge the common assumption that extrapolative models for scientific machine learning are constrained by an inherent trade-off between performance and interpretability. △ Less

Submitted 16 December, 2022; originally announced December 2022.

Comments: DISTRIBUTION STATEMENT A (Approved for Public Release, Distribution Unlimited)

arXiv:2212.09821 [pdf, other]

Reduced Order Model of a Generic Submarine for Maneuvering Near the Surface

Authors: J. Ezequiel Martin, Maxwell Hammond, Nicholas Rober, Yakin Kim, Venanzio Cichella, Pablo Carrica

Abstract: A reduced order model of a generic submarine is presented. Computational fluid dynamics (CFD) results are used to create and validate a model that includes depth dependence and the effect of waves on the craft. The model and the procedure to obtain its coefficients are discussed, and examples of the data used to obtain the model coefficients are presented. An example of operation following a compl… ▽ More A reduced order model of a generic submarine is presented. Computational fluid dynamics (CFD) results are used to create and validate a model that includes depth dependence and the effect of waves on the craft. The model and the procedure to obtain its coefficients are discussed, and examples of the data used to obtain the model coefficients are presented. An example of operation following a complex path is presented and results from the reduced order model are compared to those from an equivalent CFD calculation. The controller implemented to complete these maneuvers is also presented. △ Less

Submitted 19 December, 2022; originally announced December 2022.

Comments: Presented at the 34th Symposium on Naval Hydrodynamics, Washington DC, USA, 26 June - 1 July 2022

arXiv:2210.07109 [pdf, other]

doi 10.18653/v1/2022.emnlp-main.637

Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence

Authors: Chris Callison-Burch, Gaurav Singh Tomar, Lara J. Martin, Daphne Ippolito, Suma Bailis, David Reitter

Abstract: AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 9… ▽ More AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 900 games, with a total of 7,000 players, 800,000 dialogue turns, 500,000 dice rolls, and 58 million words. We automatically annotate the data with partial state information about the game play. We train a large language model (LM) to generate the next game turn, conditioning it on different information. The LM can respond as a particular character or as the player who runs the game--i.e., the Dungeon Master (DM). It is trained to produce dialogue that is either in-character (roleplaying in the fictional world) or out-of-character (discussing rules or strategy). We perform a human evaluation to determine what factors make the generated output plausible and interesting. We further perform an automatic evaluation to determine how well the model can predict the game state given the history and examine how well tracking the game state improves its ability to produce plausible conversational output. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP 2022

Journal ref: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 9379-9393, Dec. 2022

arXiv:2209.14030 [pdf, other]

doi 10.4204/EPTCS.371.15

Monitoring ROS2: from Requirements to Autonomous Robots

Authors: Ivan Perez, Anastasia Mavridou, Tom Pressburger, Alexander Will, Patrick J. Martin

Abstract: Runtime verification (RV) has the potential to enable the safe operation of safety-critical systems that are too complex to formally verify, such as Robot Operating System 2 (ROS2) applications. Writing correct monitors can itself be complex, and errors in the monitoring subsystem threaten the mission as a whole. This paper provides an overview of a formal approach to generating runtime monitors f… ▽ More Runtime verification (RV) has the potential to enable the safe operation of safety-critical systems that are too complex to formally verify, such as Robot Operating System 2 (ROS2) applications. Writing correct monitors can itself be complex, and errors in the monitoring subsystem threaten the mission as a whole. This paper provides an overview of a formal approach to generating runtime monitors for autonomous robots from requirements written in a structured natural language. Our approach integrates the Formal Requirement Elicitation Tool (FRET) with Copilot, a runtime verification framework, through the Ogma integration tool. FRET is used to specify requirements with unambiguous semantics, which are then automatically translated into temporal logic formulae. Ogma generates monitor specifications from the FRET output, which are compiled into hard-real time C99. To facilitate integration of the monitors in ROS2, we have extended Ogma to generate ROS2 packages defining monitoring nodes, which run the monitors when new data becomes available, and publish the results of any violations. The goal of our approach is to treat the generated ROS2 packages as black boxes and integrate them into larger ROS2 systems with minimal effort. △ Less

Submitted 28 September, 2022; originally announced September 2022.

Comments: In Proceedings FMAS2022 ASYDE2022, arXiv:2209.13181

ACM Class: D.2.1; D.2.4; I.2.9;

Journal ref: EPTCS 371, 2022, pp. 208-216

arXiv:2208.03609 [pdf, other]

Continual Learning for Tumor Classification in Histopathology Images

Authors: Veena Kaustaban, Qinle Ba, Ipshita Bhattacharya, Nahil Sobh, Satarupa Mukherjee, Jim Martin, Mohammad Saleh Miri, Christoph Guetter, Amal Chaturvedi

Abstract: Recent years have seen great advancements in the development of deep learning models for histopathology image analysis in digital pathology applications, evidenced by the increasingly common deployment of these models in both research and clinical settings. Although such models have shown unprecedented performance in solving fundamental computational tasks in DP applications, they suffer from cata… ▽ More Recent years have seen great advancements in the development of deep learning models for histopathology image analysis in digital pathology applications, evidenced by the increasingly common deployment of these models in both research and clinical settings. Although such models have shown unprecedented performance in solving fundamental computational tasks in DP applications, they suffer from catastrophic forgetting when adapted to unseen data with transfer learning. With an increasing need for deep learning models to handle ever changing data distributions, including evolving patient population and new diagnosis assays, continual learning models that alleviate model forgetting need to be introduced in DP based analysis. However, to our best knowledge, there is no systematic study of such models for DP-specific applications. Here, we propose CL scenarios in DP settings, where histopathology image data from different sources/distributions arrive sequentially, the knowledge of which is integrated into a single model without training all the data from scratch. We then established an augmented dataset for colorectal cancer H&E classification to simulate shifts of image appearance and evaluated CL model performance in the proposed CL scenarios. We leveraged a breast tumor H&E dataset along with the colorectal cancer to evaluate CL from different tumor types. In addition, we evaluated CL methods in an online few-shot setting under the constraints of annotation and computational resources. We revealed promising results of CL in DP applications, potentially paving the way for application of these methods in clinical practice. △ Less

Submitted 6 August, 2022; originally announced August 2022.

Comments: Accepted by MOVI, a MICCAI2022 workshop: https://sites.google.com/view/movi2022

arXiv:2207.10719 [pdf, other]

Synthetic Dataset Generation for Adversarial Machine Learning Research

Authors: Xiruo Liu, Shibani Singh, Cory Cornelius, Colin Busho, Mike Tan, Anindya Paul, Jason Martin

Abstract: Existing adversarial example research focuses on digitally inserted perturbations on top of existing natural image datasets. This construction of adversarial examples is not realistic because it may be difficult, or even impossible, for an attacker to deploy such an attack in the real-world due to sensing and environmental effects. To better understand adversarial examples against cyber-physical s… ▽ More Existing adversarial example research focuses on digitally inserted perturbations on top of existing natural image datasets. This construction of adversarial examples is not realistic because it may be difficult, or even impossible, for an attacker to deploy such an attack in the real-world due to sensing and environmental effects. To better understand adversarial examples against cyber-physical systems, we propose approximating the real-world through simulation. In this paper we describe our synthetic dataset generation tool that enables scalable collection of such a synthetic dataset with realistic adversarial examples. We use the CARLA simulator to collect such a dataset and demonstrate simulated attacks that undergo the same environmental transforms and processing as real-world images. Our tools have been used to collect datasets to help evaluate the efficacy of adversarial examples, and can be found at https://github.com/carla-simulator/carla/pull/4992. △ Less

Submitted 21 July, 2022; originally announced July 2022.

Journal ref: AdvML Frontiers 2022

arXiv:2206.13960 [pdf, ps, other]

Dynamic Memory for Interpretable Sequential Optimisation

Authors: Srivas Chennu, Andrew Maher, Jamie Martin, Subash Prabanantham

Abstract: Real-world applications of reinforcement learning for recommendation and experimentation faces a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these non-stationary cases, the agent must forget some historical knowledge, as it may no longer be relevant to minimise regret. We present a solution to handling non-stati… ▽ More Real-world applications of reinforcement learning for recommendation and experimentation faces a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these non-stationary cases, the agent must forget some historical knowledge, as it may no longer be relevant to minimise regret. We present a solution to handling non-stationarity that is suitable for deployment at scale, to provide business operators with automated adaptive optimisation. Our solution aims to provide interpretable learning that can be trusted by humans, whilst responding to non-stationarity to minimise regret. To this end, we develop an adaptive Bayesian learning agent that employs a novel form of dynamic memory. It enables interpretability through statistical hypothesis testing, by targeting a set point of statistical power when comparing rewards and adjusting its memory dynamically to achieve this power. By design, the agent is agnostic to different kinds of non-stationarity. Using numerical simulations, we compare its performance against an existing proposal and show that, under multiple non-stationary scenarios, our agent correctly adapts to real changes in the true rewards. In all bandit solutions, there is an explicit trade-off between learning and achieving maximal performance. Our solution sits on a different point on this trade-off when compared to another similarly robust approach: we prioritise interpretability, which relies on more learning, at the cost of some regret. We describe the architecture of a large-scale deployment of automatic optimisation-as-a-service where our agent achieves interpretability whilst adapting to changing circumstances. △ Less

Submitted 28 June, 2022; originally announced June 2022.

Comments: 2nd International Workshop on Online and Adaptive Recommender Systems, 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, Washington DC

arXiv:2206.02848 [pdf]

Plagiarism deterrence for introductory programming

Authors: Simon J. Cohen, Michael J. Martin, Chance A. Shipley, Abhishek Kumar, Andrew R. Cohen

Abstract: Plagiarism in introductory programming courses is an enormous challenge for both students and institutions. For students, relying on the work of others too early in their academic development can make it impossible to acquire necessary skills for independent success in the future. For institutions, widespread student cheating can dilute the quality of the educational experience being offered. Curr… ▽ More Plagiarism in introductory programming courses is an enormous challenge for both students and institutions. For students, relying on the work of others too early in their academic development can make it impossible to acquire necessary skills for independent success in the future. For institutions, widespread student cheating can dilute the quality of the educational experience being offered. Currently available solutions consider only pairwise comparisons between student submissions and focus on punitive deterrence. Our approach instead relies on a class-wide statistical characterization that can be clearly and securely shared with students via an intuitive new p-value representing independence of student effort. A pairwise, compression-based similarity detection algorithm captures relationships between assignments more accurately. An automated deterrence system is used to warn students that their behavior is being closely monitored. High-confidence instances are made directly available for instructor review using our open-source toolkit. An unbiased scoring system aids students and the instructor in understanding true independence of effort. Preliminary results indicate that the system can provide meaningful measurements of independence from week one, improving the efficacy of technical education. △ Less

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.10736 [pdf, other]

Should Models Be Accurate?

Authors: Esra'a Saleh, John D. Martin, Anna Koop, Arash Pourzarabi, Michael Bowling

Abstract: Model-based Reinforcement Learning (MBRL) holds promise for data-efficiency by planning with model-generated experience in addition to learning with experience from the environment. However, in complex or changing environments, models in MBRL will inevitably be imperfect, and their detrimental effects on learning can be difficult to mitigate. In this work, we question whether the objective of thes… ▽ More Model-based Reinforcement Learning (MBRL) holds promise for data-efficiency by planning with model-generated experience in addition to learning with experience from the environment. However, in complex or changing environments, models in MBRL will inevitably be imperfect, and their detrimental effects on learning can be difficult to mitigate. In this work, we question whether the objective of these models should be the accurate simulation of environment dynamics at all. We focus our investigations on Dyna-style planning in a prediction setting. First, we highlight and support three motivating points: a perfectly accurate model of environment dynamics is not practically achievable, is not necessary, and is not always the most useful anyways. Second, we introduce a meta-learning algorithm for training models with a focus on their usefulness to the learner instead of their accuracy in modelling the environment. Our experiments show that in a simple non-stationary environment, our algorithm enables faster learning than even using an accurate model built with domain-specific knowledge of the non-stationarity. △ Less

Submitted 22 May, 2022; originally announced May 2022.

Comments: The 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making ( RLDM 2022 )

arXiv:2205.01254 [pdf, other]

Deep API Learning Revisited

Authors: James Martin, ** L. C. Guo

Abstract: Understanding the correct API usage sequences is one of the most important tasks for programmers when they work with unfamiliar libraries. However, programmers often encounter obstacles to finding the appropriate information due to either poor quality of API documentation or ineffective query-based searching strategy. To help solve this issue, researchers have proposed various methods to suggest t… ▽ More Understanding the correct API usage sequences is one of the most important tasks for programmers when they work with unfamiliar libraries. However, programmers often encounter obstacles to finding the appropriate information due to either poor quality of API documentation or ineffective query-based searching strategy. To help solve this issue, researchers have proposed various methods to suggest the sequence of APIs given natural language queries representing the information needs from programmers. Among such efforts, Gu et al. adopted a deep learning method, in particular an RNN Encoder-Decoder architecture, to perform this task and obtained promising results on common APIs in Java. In this work, we aim to reproduce their results and apply the same methods for APIs in Python. Additionally, we compare the performance with a more recent Transformer-based method, i.e., CodeBERT, for the same task. Our experiment reveals a clear drop in performance measures when careful data cleaning is performed. Owing to the pretraining from a large number of source code files and effective encoding technique, CodeBERT outperforms the method by Gu et al., to a large extent. △ Less

Submitted 2 May, 2022; originally announced May 2022.

Comments: 10 pages, 6 figures. This paper is accepted at ICPC 2022 (the 30th IEEE/ACM International Conference on Program Comprehension)

ACM Class: D.2.13

arXiv:2204.12421 [pdf, other]

Disambiguation of morpho-syntactic features of African American English -- the case of habitual be

Authors: Harrison Santiago, Joshua Martin, Sarah Moeller, Kevin Tang

Abstract: Recent research has highlighted that natural language processing (NLP) systems exhibit a bias against African American speakers. The bias errors are often caused by poor representation of linguistic features unique to African American English (AAE), due to the relatively low probability of occurrence of many such features in training data. We present a workflow to overcome such bias in the case of… ▽ More Recent research has highlighted that natural language processing (NLP) systems exhibit a bias against African American speakers. The bias errors are often caused by poor representation of linguistic features unique to African American English (AAE), due to the relatively low probability of occurrence of many such features in training data. We present a workflow to overcome such bias in the case of habitual "be". Habitual "be" is isomorphic, and therefore ambiguous, with other forms of "be" found in both AAE and other varieties of English. This creates a clear challenge for bias in NLP technologies. To overcome the scarcity, we employ a combination of rule-based filters and data augmentation that generate a corpus balanced between habitual and non-habitual instances. With this balanced corpus, we train unbiased machine learning classifiers, as demonstrated on a corpus of AAE transcribed texts, achieving .65 F$_1$ score disambiguating habitual "be". △ Less

Submitted 26 April, 2022; originally announced April 2022.

arXiv:2204.10836 [pdf, other]

doi 10.1038/s41467-022-33407-5

Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer , et al. (254 additional authors not shown)

Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train acc… ▽ More Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing. △ Less

Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS

arXiv:2204.09652 [pdf, other]

The TalkMoves Dataset: K-12 Mathematics Lesson Transcripts Annotated for Teacher and Student Discursive Moves

Authors: Abhijit Suresh, Jennifer Jacobs, Charis Harty, Margaret Perkoff, James H. Martin, Tamara Sumner

Abstract: Transcripts of teaching episodes can be effective tools to understand discourse patterns in classroom instruction. According to most educational experts, sustained classroom discourse is a critical component of equitable, engaging, and rich learning environments for students. This paper describes the TalkMoves dataset, composed of 567 human-annotated K-12 mathematics lesson transcripts (including… ▽ More Transcripts of teaching episodes can be effective tools to understand discourse patterns in classroom instruction. According to most educational experts, sustained classroom discourse is a critical component of equitable, engaging, and rich learning environments for students. This paper describes the TalkMoves dataset, composed of 567 human-annotated K-12 mathematics lesson transcripts (including entire lessons or portions of lessons) derived from video recordings. The set of transcripts primarily includes in-person lessons with whole-class discussions and/or small group work, as well as some online lessons. All of the transcripts are human-transcribed, segmented by the speaker (teacher or student), and annotated at the sentence level for ten discursive moves based on accountable talk theory. In addition, the transcripts include utterance-level information in the form of dialogue act labels based on the Switchboard Dialog Act Corpus. The dataset can be used by educators, policymakers, and researchers to understand the nature of teacher and student discourse in K-12 math classrooms. Portions of this dataset have been used to develop the TalkMoves application, which provides teachers with automated, immediate, and actionable feedback about their mathematics instruction. △ Less

Submitted 6 April, 2022; originally announced April 2022.

Comments: 9 pages, 2 figures, Accepted for a Poster + Demo presentation at the 13th International Conference on Language Resources and Evaluation 2022

arXiv:2202.07880 [pdf, other]

$\rm{C {\small IS}}^2$: A Simplified Commonsense Inference Evaluation for Story Prose

Authors: Bryan Li, Lara J. Martin, Chris Callison-Burch

Abstract: Transformers have been showing near-human performance on a variety of tasks, but they are not without their limitations. We discuss the issue of conflating results of transformers that are instructed to do multiple tasks simultaneously. In particular, we focus on the domain of commonsense reasoning within story prose, which we call contextual commonsense inference (CCI). We look at the GLUCOSE (Mo… ▽ More Transformers have been showing near-human performance on a variety of tasks, but they are not without their limitations. We discuss the issue of conflating results of transformers that are instructed to do multiple tasks simultaneously. In particular, we focus on the domain of commonsense reasoning within story prose, which we call contextual commonsense inference (CCI). We look at the GLUCOSE (Mostafazadeh et al. 2020) dataset and task for predicting implicit commonsense inferences between story sentences. Since the GLUCOSE task simultaneously generates sentences and predicts the CCI relation, there is a conflation in the results. Is the model really measuring CCI or is its ability to generate grammatical text carrying the results? In this paper, we introduce the task contextual commonsense inference in sentence selection ($\rm{C {\small IS}}^2$), a simplified task that avoids conflation by eliminating language generation altogether. Our findings emphasize the necessity of future work to disentangle language generation from the desired NLP tasks at hand. △ Less

Submitted 19 October, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Published at the Workshop on Commonsense Representation and Reasoning (CSRR) @ ACL 2022

arXiv:2202.03246 [pdf]

AI-based artistic representation of emotions from EEG signals: a discussion on fairness, inclusion, and aesthetics

Authors: Piera Riccio, Kristin Bergaust, Boel Christensen-Scheel, Juan-Carlos De Martin, Maria A. Zuluaga, Stefano Nichele

Abstract: While Artificial Intelligence (AI) technologies are being progressively developed, artists and researchers are investigating their role in artistic practices. In this work, we present an AI-based Brain-Computer Interface (BCI) in which humans and machines interact to express feelings artistically. This system and its production of images give opportunities to reflect on the complexities and range… ▽ More While Artificial Intelligence (AI) technologies are being progressively developed, artists and researchers are investigating their role in artistic practices. In this work, we present an AI-based Brain-Computer Interface (BCI) in which humans and machines interact to express feelings artistically. This system and its production of images give opportunities to reflect on the complexities and range of human emotions and their expressions. In this discussion, we seek to understand the dynamics of this interaction to reach better co-existence in fairness, inclusion, and aesthetics. △ Less

Submitted 7 February, 2022; originally announced February 2022.

Comments: Accepted to the Politics of the Machines conference 2021 (POM Berlin 2021)

Showing 1–50 of 129 results for author: Martín, J