Search | arXiv e-print repository

arXiv:2406.19538 [pdf, other]

Context Matters: An Empirical Study of the Impact of Contextual Information in Temporal Question Answering Systems

Authors: Dan Schumacher, Fatemeh Haji, Tara Grey, Niharika Bandlamudi, Nupoor Karnik, Gagana Uday Kumar, Jason Cho-Yu Chiang, Paul Rad, Nishant Vishwamitra, Anthony Rios

Abstract: Large language models (LLMs) often struggle with temporal reasoning, crucial for tasks like historical event analysis and time-sensitive information retrieval. Despite advancements, state-of-the-art models falter in handling temporal information, especially when faced with irrelevant or noisy contexts. This paper addresses this gap by empirically examining the robustness of temporal question-answering (TQA) systems trained on various context types, including relevant, irrelevant, slightly altered, and no context. Our findings indicate that training with a mix of these contexts enhances model robustness and accuracy. Additionally, we show that the position of context relative to the question significantly impacts performance, with question-first positioning yielding better results. We introduce two new context-rich TQA datasets, ContextAQA and ContextTQE, and provide comprehensive evaluations and guidelines for training robust TQA models. Our work lays the foundation for developing reliable and context-aware temporal QA systems, with broader implications for enhancing LLM robustness against diverse and potentially adversarial information.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.17574 [pdf, other]

Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats

Authors: Ryan Pavlich, Nima Ebadi, Richard Tarbell, Billy Linares, Adrian Tan, Rachael Humphreys, Jayanta Kumar Das, Rambod Ghandiparsi, Hannah Haley, Jerris George, Rocky Slavin, Kim-Kwang Raymond Choo, Glenn Dietrich, Anthony Rios

Abstract: Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. While substantial progress has been made in this field, existing research has concentrated on generating SQL statements from text queries. The broader challenge, however, lies in inferring new information about the returned data. Our research makes two major contributions to address this gap. First, we introduce a novel Internet-of-Things (IoT) text-to-SQL dataset comprising 10,985 text-SQL pairs and 239,398 rows of network traffic activity. The dataset contains additional query types limited in prior text-to-SQL datasets, notably temporal-related queries. Our dataset is sourced from a smart building's IoT ecosystem exploring sensor read and network traffic data. Second, our dataset allows two-stage processing, where the returned data (network traffic) from a generated SQL can be categorized as malicious or not. Our results show that joint training to query and infer information about the data can improve overall text-to-SQL performance, nearly matching substantially larger models. We also show that current large language models (e.g., GPT3.5) struggle to infer new information about returned data, thus our dataset provides a novel test bed for integrating complex domain-specific reasoning into LLMs.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.14545 [pdf, other]

Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems

Authors: Đorđe Klisura, Anthony Rios

Abstract: Relational databases are integral to modern information systems, serving as the foundation for storing, querying, and managing data efficiently and effectively. Advancements in large language modeling have led to the emergence of text-to-SQL technologies, significantly enhancing the querying and extracting of information from these databases and raising concerns about privacy and security. Our research extracts the database schema elements underlying a text-to-SQL model. Knowledge of the schema can make attacks such as SQL injection easier. By asking specially crafted questions, we have developed a zero-knowledge framework designed to probe various database schema elements without knowledge of the database itself. The text-to-SQL models then process these questions to produce an output that we use to uncover the structure of the database schema. We apply it to specialized text-to-SQL models fine-tuned on text-SQL pairs and generative language models used for SQL generation. Overall, we can reconstruct the table names with an F1 of nearly .75 for fine-tuned models and .96 for generative.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14500 [pdf, other]

Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary

Authors: Xingmeng Zhao, Tongnian Wang, Anthony Rios

Abstract: Radiology report summarization (RRS) is crucial for patient care, requiring concise "Impressions" from detailed "Findings." This paper introduces a novel prompting strategy to enhance RRS by first generating a layperson summary. This approach normalizes key observations and simplifies complex information using non-expert communication techniques inspired by doctor-patient interactions. Combined with few-shot in-context learning, this method improves the model's ability to link general terms to specific findings. We evaluate this approach on the MIMIC-CXR, CheXpert, and MIMIC-III datasets, benchmarking it against 7B/8B parameter state-of-the-art open-source large language models (LLMs) like Meta-Llama-3-8B-Instruct. Our results demonstrate improvements in summarization accuracy and accessibility, particularly in out-of-domain tests, with improvements as high as 5% for some metrics.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2404.01961 [pdf, other]

Team UTSA-NLP at SemEval 2024 Task 5: Prompt Ensembling for Argument Reasoning in Civil Procedures with GPT4

Authors: Dan Schumacher, Anthony Rios

Abstract: In this paper, we present our system for the SemEval Task 5, The Legal Argument Reasoning Task in Civil Procedure Challenge. Legal argument reasoning is an essential skill that all law students must master. Moreover, it is important to develop natural language processing solutions that can reason about a question given terse domain-specific contextual information. Our system explores a prompt-based solution using GPT4 to reason over legal arguments. We also evaluate an ensemble of prompting strategies, including chain-of-thought reasoning and in-context learning. Overall, our system results in a Macro F1 of .8095 on the validation dataset and .7315 (5th out of 21 teams) on the final test set. Code for this project is available at https://github.com/danschumac1/CivilPromptReasoningGPT4.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted to SemEval@NAACL 2024

arXiv:2403.17363 [pdf, other]

Extracting Biomedical Entities from Noisy Audio Transcripts

Authors: Nima Ebadi, Kellen Morgan, Adrian Tan, Billy Linares, Sheri Osborn, Emma Majors, Jeremy Davis, Anthony Rios

Abstract: Automatic Speech Recognition (ASR) technology is fundamental in transcribing spoken language into text, with considerable applications in the clinical realm, including streamlining medical transcription and integrating with Electronic Health Record (EHR) systems. Nevertheless, challenges persist, especially when transcriptions contain noise, leading to significant drops in performance when Natural Language Processing (NLP) models are applied. Named Entity Recognition (NER), an essential clinical task, is particularly affected by such noise, often termed the ASR-NLP gap. Prior works have primarily studied ASR's efficiency in clean recordings, leaving a research gap concerning the performance in noisy environments. This paper introduces a novel dataset, BioASR-NER, designed to bridge the ASR-NLP gap in the biomedical domain, focusing on extracting adverse drug reactions and mentions of entities from the Brief Test of Adult Cognition by Telephone (BTACT) exam. Our dataset offers a comprehensive collection of almost 2,000 clean and noisy recordings. In addressing the noise challenge, we present an innovative transcript-cleaning method using GPT4, investigating both zero-shot and few-shot methodologies. Our study further delves into an error analysis, shedding light on the types of errors in transcription software, corrections by GPT4, and the challenges GPT4 faces. This paper aims to foster improved understanding and potential solutions for the ASR-NLP gap, ultimately supporting enhanced healthcare documentation practices.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Accepted to LREC-COLING 2024

arXiv:2403.08221 [pdf, other]

doi 10.1145/3613904.3642816

Help Supporters: Exploring the Design Space of Assistive Technologies to Support Face-to-Face Help Between Blind and Sighted Strangers

Authors: Yuanyang Teng, Connor Courtien, David Angel Rios, Yves M. Tseng, Jacqueline Gibson, Maryam Aziz, Avery Reyna, Rajan Vaish, Brian A. Smith

Abstract: Blind and low-vision (BLV) people face many challenges when venturing into public environments, often wishing it were easier to get help from people nearby. Ironically, while many sighted individuals are willing to help, such interactions are infrequent. Asking for help is socially awkward for BLV people, and sighted people lack experience in helping BLV people. Through a mixed-ability research-through-design process, we explore four diverse approaches toward how assistive technology can serve as help supporters that collaborate with both BLV and sighted parties throughout the help process. These approaches span two phases: the connection phase (finding someone to help) and the collaboration phase (facilitating help after finding someone). Our findings from a 20-participant mixed-ability study reveal how help supporters can best facilitate connection, which types of information they should present during both phases, and more. We discuss design implications for future approaches to support face-to-face help.

Submitted 12 March, 2024; originally announced March 2024.

Comments: To Appear In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) Association for Computing Machinery, New York, NY, USA. 24 pages

arXiv:2403.03750 [pdf, other]

doi 10.3929/ethz-b-000661775

German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset

Authors: Laura Mascarell, Ribin Chalumattu, Annette Rios

Abstract: The advent of Large Language Models (LLMs) has led to remarkable progress on a wide range of natural language processing tasks. Despite the advances, these large-sized models still suffer from hallucinating information in their output, which poses a major issue in automatic text summarization, as we must guarantee that the generated summary is consistent with the content of the source document. Previous research addresses the challenging task of detecting hallucinations in the output (i.e. inconsistency detection) in order to evaluate the faithfulness of the generated summaries. However, these works primarily focus on English and recent multilingual approaches lack German data. This work presents absinth, a manually annotated dataset for hallucination detection in German news summarization and explores the capabilities of novel open-source LLMs on this task in both fine-tuning and in-context learning settings. We open-source and release the absinth dataset to foster further research on hallucination detection in German.

Submitted 14 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: 11 pages, 2 figures, 7 tables, conference: Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, Italy, May 20-25, 2024

ACM Class: I.2.7

arXiv:2401.09407 [pdf, other]

Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text

Authors: Mazal Bethany, Brandon Wherry, Emet Bethany, Nishant Vishwamitra, Anthony Rios, Peyman Najafirad

Abstract: With the recent proliferation of Large Language Models (LLMs), there has been an increasing demand for tools to detect machine-generated text. The effective detection of machine-generated text faces two pertinent problems: First, they are severely limited in generalizing against real-world scenarios, where machine-generated text is produced by a variety of generators, including but not limited to GPT-4 and Dolly, and spans diverse domains, ranging from academic manuscripts to social media posts. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs. In this work, we undertake a systematic study on the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that they cannot reliably distinguish between human and machine-generated text. Based on our findings, we introduce a novel system, T5LLMCipher, for detecting machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering to address the text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that our approach provides state-of-the-art generalization ability, with an average increase in F1 score on machine-generated text of 19.6% on unseen generators and domains compared to the top performing existing approaches and correctly attributes the generator of text with an accuracy of 93.6%.

Submitted 2 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2310.16681 [pdf, other]

BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?

Authors: Xingmeng Zhao, Tongnian Wang, Sheri Osborn, Anthony Rios

Abstract: Language models have seen significant growth in the size of their corpus, leading to notable performance improvements. Yet, there has been limited progress in developing models that handle smaller, more human-like datasets. As part of the BabyLM shared task, this study explores the impact of reinforcement learning from human feedback (RLHF) on language models pretrained from scratch with a limited training corpus. Comparing two GPT-2 variants, the larger model performs better in storytelling tasks after RLHF fine-tuning. These findings suggest that RLHF techniques may be more advantageous for larger models due to their higher learning and adaptation capacity, though more experiments are needed to confirm this finding. These insights highlight the potential benefits of RLHF fine-tuning for language models within limited data, enhancing their ability to maintain narrative focus and coherence while adhering better to initial instructions in storytelling tasks. The code for this work is publicly available at https://github.com/Zephyr1022/BabyStories-UTSA.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to BabyLM workshop at CoNLL

arXiv:2309.03918 [pdf, other]

A recommender for the management of chronic pain in patients undergoing spinal cord stimulation

Authors: Tigran Tchrakian, Mykhaylo Zayats, Alessandra Pascale, Dat Huynh, Pritish Parida, Carla Agurto Rios, Sergiy Zhuk, Jeffrey L. Rogers, ENVISION Studies Physician Author Group, Boston Scientific Research Scientists Consortium

Abstract: Spinal cord stimulation (SCS) is a therapeutic approach used for the management of chronic pain. It involves the delivery of electrical impulses to the spinal cord via an implanted device, which when given suitable stimulus parameters can mask or block pain signals. Selection of optimal stimulation parameters usually happens in the clinic under the care of a provider whereas at-home SCS optimization is managed by the patient. In this paper, we propose a recommender system for the management of pain in chronic pain patients undergoing SCS. In particular, we use a contextual multi-armed bandit (CMAB) approach to develop a system that recommends SCS settings to patients with the aim of improving their condition. These recommendations, sent directly to patients through a digital health ecosystem and combined with a patient monitoring system, close the therapeutic loop around a chronic pain patient over their entire patient journey. We evaluated the system in a cohort of SCS-implanted ENVISION study subjects (Clinicaltrials.gov ID: NCT03240588) using a combination of quality of life metrics and Patient States (PS), a novel measure of holistic outcomes. SCS recommendations provided statistically significant improvement in clinical outcomes (pain and/or QoL) in 85% of all subjects (N=21). Among subjects in moderate PS (N=7) prior to receiving recommendations, 100% showed statistically significant improvements and 5/7 had improved PS dwell time. This analysis suggests SCS patients may benefit from SCS recommendations, resulting in additional clinical improvement on top of benefits already received from SCS therapy.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2307.09558 [pdf]

doi 10.1007/s10055-023-00821-z

Fitted avatars: automatic skeleton adjustment for self-avatars in virtual reality

Authors: Jose Luis Ponton, Víctor Ceballos, Lesly Acosta, Alejandro Ríos, Eva Monclús, Nuria Pelechano

Abstract: In the era of the metaverse, self-avatars are gaining popularity, as they can enhance presence and provide embodiment when a user is immersed in Virtual Reality. They are also very important in collaborative Virtual Reality to improve communication through gestures. Whether we are using a complex motion capture solution or a few trackers with inverse kinematics (IK), it is essential to have a good match in size between the avatar and the user, as otherwise mismatches in self-avatar posture could be noticeable for the user. To achieve such a correct match in dimensions, a manual process is often required, with the need for a second person to take measurements of body limbs and introduce them into the system. This process can be time-consuming, and prone to errors. In this paper, we propose an automatic measuring method that simply requires the user to do a small set of exercises while wearing a Head-Mounted Display (HMD), two hand controllers, and three trackers. Our work provides an affordable and quick method to automatically extract user measurements and adjust the virtual humanoid skeleton to the exact dimensions. Our results show that our method can reduce the misalignment produced by the IK system when compared to other solutions that simply apply a uniform scaling to an avatar based on the height of the HMD, and make assumptions about the locations of joints with respect to the trackers.

Submitted 14 July, 2023; originally announced July 2023.

Comments: Published in Virtual Reality Springer

Journal ref: Springer Virtual Reality (2023) 1-20

arXiv:2305.18618 [pdf]

Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard

Authors: Vagelis Plevris, George Papazafeiropoulos, Alejandro Jiménez Rios

Abstract: A comparison between three chatbots which are based on large language models, namely ChatGPT-3.5, ChatGPT-4 and Google Bard is presented, focusing on their ability to give correct answers to mathematics and logic problems. In particular, we check their ability to Understand the problem at hand; Apply appropriate algorithms or methods for its solution; and Generate a coherent response and a correct answer. We use 30 questions that are clear, without any ambiguities, fully described with plain text only, and have a unique, well defined correct answer. The questions are divided into two sets of 15 each. The questions of Set A are 15 "Original" problems that cannot be found online, while Set B contains 15 "Published" problems that one can find online, usually with their solution. Each question is posed three times to each chatbot. The answers are recorded and discussed, highlighting their strengths and weaknesses. It has been found that for straightforward arithmetic, algebraic expressions, or basic logic puzzles, chatbots may provide accurate solutions, although not in every attempt. However, for more complex mathematical problems or advanced logic tasks, their answers, although written in a usually "convincing" way, may not be reliable. Consistency is also an issue, as many times a chatbot will provide conflicting answers when given the same question more than once. A comparative quantitative evaluation of the three chatbots is made through scoring their final answers based on correctness. It was found that ChatGPT-4 outperforms ChatGPT-3.5 in both sets of questions. Bard comes third in the original questions of Set A, behind the other two chatbots, while it has the best performance (first place) in the published questions of Set B. This is probably because Bard has direct access to the internet, in contrast to ChatGPT chatbots which do not have any communication with the outside world.

Submitted 30 May, 2023; originally announced May 2023.

MSC Class: 68T50 ACM Class: I.2.0; I.2.7

arXiv:2305.15591 [pdf, other]

Lightweight Learner for Shared Knowledge Lifelong Learning

Authors: Yunhao Ge, Yuecheng Li, Di Wu, Ao Xu, Adam M. Jones, Amanda Sofie Rios, Iordanis Fostiropoulos, Shixian Wen, Po-Hsuan Huang, Zachary William Murdock, Gozde Sahin, Shuo Ni, Kiran Lekkala, Sumedh Anand Sontakke, Laurent Itti

Abstract: In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially. Dedicated LL machinery is then deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentralized population of LL agents that each sequentially learn different tasks, with all agents operating independently and in parallel. After learning their respective tasks, agents share and consolidate their knowledge over a decentralized communication network, so that, in the end, all agents can master all tasks. We present one solution to SKILL which uses Lightweight Lifelong Learning (LLL) agents, where the goal is to facilitate efficient sharing by minimizing the fraction of the agent that is specialized for any given task. Each LLL agent thus consists of a common task-agnostic immutable part, where most parameters are, and individual task-specific modules that contain fewer parameters but are adapted to each task. Agents share their task-specific modules, plus summary information ("task anchors") representing their tasks in the common task-agnostic latent space of all agents. Receiving agents register each received task-specific module using the corresponding anchor. Thus, every agent improves its ability to solve new tasks each time new task-specific modules and anchors are received. On a new, very challenging SKILL-102 dataset with 102 image classification tasks (5,033 classes in total, 2,041,225 training, 243,464 validation, and 243,464 test images), we achieve much higher (and SOTA) accuracy over 8 LL baselines, while also achieving near perfect parallelization. Code and data can be found at https://github.com/gyhandy/Shared-Knowledge-Lifelong-Learning

Submitted 24 May, 2023; originally announced May 2023.

Comments: Transactions on Machine Learning Research (TMLR) paper

arXiv:2303.12898 [pdf, other]

Towards Understanding the Generalization of Medical Text-to-SQL Models and Datasets

Authors: Richard Tarbell, Kim-Kwang Raymond Choo, Glenn Dietrich, Anthony Rios

Abstract: Electronic medical records (EMRs) are stored in relational databases. It can be challenging to access the required information if the user is unfamiliar with the database schema or general database fundamentals. Hence, researchers have explored text-to-SQL generation methods that provide healthcare professionals direct access to EMR data without needing a database expert. However, currently available datasets have been essentially "solved" with state-of-the-art models achieving accuracy greater than or near 90%. In this paper, we show that there is still a long way to go before solving text-to-SQL generation in the medical domain. To show this, we create new splits of the existing medical text-to-SQL dataset MIMICSQL that better measure the generalizability of the resulting models. We evaluate state-of-the-art language models on our new split showing substantial drops in performance with accuracy dropping from up to 92% to 28%, thus showing substantial room for improvement. Moreover, we introduce a novel data augmentation approach to improve the generalizability of the language models. Overall, this paper is the first step towards developing more robust text-to-SQL models in the medical domain. (The dataset and code will be released upon acceptance.)

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2301.06178 [pdf, other]

Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News

Authors: Xingmeng Zhao, Xavier Walton, Suhana Shrestha, Anthony Rios

Abstract: Increasing the number of cyclists, whether for general transport or recreation, can provide health improvements and reduce the environmental impact of vehicular transportation. However, the public's perception of cycling may be driven by the ideologies and reporting standards of news agencies. For instance, people may identify cyclists on the road as "dangerous" if news agencies overly report cycling accidents, limiting the number of people that cycle for transportation. Moreover, if fewer people cycle, there may be less funding from the government to invest in safe infrastructure. In this paper, we explore the perceived perception of cyclists within news headlines. To accomplish this, we introduce a new dataset, "Bike Frames", that can help provide insight into how headlines portray cyclists and help detect accident-related headlines. Next, we introduce a multi-task (MT) regularization approach that increases the detection accuracy of accident-related posts, demonstrating improvements over traditional MT frameworks. Finally, we compare and contrast the perceptions of cyclists with motorcyclist-related headlines to ground the findings with another related activity for both male- and female-related posts. Our findings show that general news websites are more likely to report accidents about cyclists than other events. Moreover, cyclist-specific websites are more likely to report about accidents than motorcycling-specific websites, even though there is more potential danger for motorcyclists. Finally, we show substantial differences in the reporting about male vs. female-related persons, e.g., more male-related cyclists headlines are related to accidents, but more female-related motorcycling headlines about accidents. WARNING: This paper contains descriptions of accidents and death.

Submitted 15 January, 2023; originally announced January 2023.

arXiv:2301.01212 [pdf, ps, other]

doi 10.1007/978-3-031-15471-3_32

Assessment of creditworthiness models privacy-preserving training with synthetic data

Authors: Ricardo Muñoz-Cancino, Cristián Bravo, Sebastián A. Ríos, Manuel Graña

Abstract: Credit scoring models are the primary instrument used by financial institutions to manage credit risk. The scarcity of research on behavioral scoring is due to the difficult data access. Financial institutions have to maintain the privacy and security of borrowers' information, which keeps them from collaborating in research initiatives. In this work, we present a methodology that allows us to evaluate the performance of models trained with synthetic data when they are applied to real-world data. Our results show that synthetic data quality is increasingly poor when the number of attributes increases. However, creditworthiness assessment models trained with synthetic data show a reduction of 3% of AUC and 6% of KS when compared with models trained with real data. These results have a significant impact since they encourage credit risk investigation from synthetic data, making it possible to maintain borrowers' privacy and to address problems that until now have been hampered by the availability of information.

Submitted 31 December, 2022; originally announced January 2023.

Journal ref: Hybrid Artificial Intelligent Systems. HAIS 2022. Lecture Notes in Computer Science, vol 13469

arXiv:2212.12801 [pdf, other]

Linguistic Elements of Engaging Customer Service Discourse on Social Media

Authors: Sonam Singh, Anthony Rios

Abstract: Customers are rapidly turning to social media for customer support. While brand agents on these platforms are motivated and well-intentioned to help and engage with customers, their efforts are often ignored if their initial response to the customer does not match a specific tone, style, or topic the customer is aiming to receive. The length of a conversation can reflect the effort and quality of the initial response made by a brand toward collaborating and helping consumers, even when the overall sentiment of the conversation might not be very positive. Thus, through this study, we aim to bridge this critical gap in the existing literature by analyzing language's content and stylistic aspects such as expressed empathy, psycho-linguistic features, dialogue tags, and metrics for quantifying personalization of the utterances that can influence the engagement of an interaction. This paper demonstrates that we can predict engagement using initial customer and brand posts.

Submitted 24 December, 2022; originally announced December 2022.

Comments: Accepted to NLP+CSS at EMNLP 2022

arXiv:2212.12800 [pdf, other]

A Marker-based Neural Network System for Extracting Social Determinants of Health

Authors: Xingmeng Zhao, Anthony Rios

Abstract: Objective. The impact of social determinants of health (SDoH) on patients' healthcare quality and the disparity is well-known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured in free-text clinical notes, but there are limited methods for automatically extracting them. We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically. Materials and Methods. The study uses the N2C2 Shared Task data, which was collected from two sources of clinical notes: MIMIC-III and University of Washington Harborview Medical Centers. It contains 4480 social history sections with full annotation for twelve SDoHs. In order to handle the issue of overlapping entities, we developed a novel marker-based NER model. We used it in a multi-stage pipeline to extract SDoH information from clinical notes. Results. Our marker-based system outperformed the state-of-the-art span-based models at handling overlapping entities based on the overall Micro-F1 score performance. It also achieved state-of-the-art performance compared to the shared task methods. Conclusion. The major finding of this study is that the multi-stage pipeline effectively extracts SDoH information from clinical notes. This approach can potentially improve the understanding and tracking of SDoHs in clinical settings. However, error propagation may be an issue, and further research is needed to improve the extraction of entities with complex semantic meanings and low-resource entities using external knowledge.

Submitted 24 December, 2022; originally announced December 2022.

arXiv:2212.12799 [pdf, other]

A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models

Authors: Xingmeng Zhao, Ali Niazi, Anthony Rios

Abstract: Chemical named entity recognition (NER) models are used in many downstream tasks, from adverse drug reaction identification to pharmacoepidemiology. However, it is unknown whether these models work the same for everyone. Performance disparities can potentially cause harm rather than the intended good. This paper assesses gender-related performance disparities in chemical NER systems. We develop a framework for measuring gender bias in chemical NER models using synthetic data and a newly annotated corpus of over 92,405 words with self-identified gender information from Reddit. Our evaluation of multiple biomedical NER models reveals evident biases. For instance, synthetic data suggests female-related names are frequently misclassified as chemicals, especially for brand name mentions. Additionally, we observe performance disparities between female- and male-associated data in both datasets. Many systems fail to detect contraceptives such as birth control. Our findings emphasize the biases in chemical NER models, urging practitioners to account for these biases in downstream applications.

Submitted 13 March, 2024; v1 submitted 24 December, 2022; originally announced December 2022.

arXiv:2212.00089 [pdf, other]

Ferroelectric FET based Context-Switching FPGA Enabling Dynamic Reconfiguration for Adaptive Deep Learning Machines

Authors: Yixin Xu, Zijian Zhao, Yi Xiao, Tongguang Yu, Halid Mulaosmanovic, Dominik Kleimaier, Stefan Duenkel, Sven Beyer, Xiao Gong, Rajiv Joshi, X. Sharon Hu, Shixian Wen, Amanda Sofie Rios, Kiran Lekkala, Laurent Itti, Eric Homan, Sumitha George, Vijaykrishnan Narayanan, Kai Ni

Abstract: Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perform technology-circuit-architecture co-design to break this tradeoff with no additional area cost and lower power consumption compared with conventional designs while providing dynamic reconfiguration, which can hide the reconfiguration time behind the execution time. Leveraging the intrinsic transistor structure and non-volatility of ferroelectric FET (FeFET), compact FPGA primitives are proposed and experimentally verified, including 1FeFET look-up table (LUT) cell, 1FeFET routing cell for connection blocks (CBs) and switch boxes (SBs). To support dynamic reconfiguration, two local copies of primitives are placed in parallel, which enables loading of arbitrary configuration without interrupting the active configuration execution. A comprehensive evaluation shows that compared with the SRAM-based FPGA, our dynamic reconfiguration design shows 63.0%/71.1% reduction in LUT/CB area and 82.7%/53.6% reduction in CB/SB power consumption with minimal penalty in the critical path delay (9.6%). We further implement a Super-Sub network model to show the benefit from the context-switching capability of our design. We also evaluate the timing performance of our design over conventional FPGA in various application scenarios. In one scenario that users switch between two preloaded configurations, our design yields significant time saving by 78.7% on average. In the other scenario of implementing multiple configurations with dynamic reconfiguration, our design offers time saving of 20.3% on average.

Submitted 30 November, 2022; originally announced December 2022.

Comments: 54 pages, 15 figures

arXiv:2211.15464 [pdf, other]

Considerations for meaningful sign language machine translation based on glosses

Authors: Mathias Müller, Zifan Jiang, Amit Moryossef, Annette Rios, Sarah Ebling

Abstract: Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that limitations of glosses in general and limitations of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research on gloss translation. Our suggestions advocate awareness of the inherent limitations of gloss-based approaches, realistic datasets, stronger baselines and convincing evaluation.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2209.07353 [pdf, other]

Measuring Geographic Performance Disparities of Offensive Language Classifiers

Authors: Brandon Lwowski, Paul Rad, Anthony Rios

Abstract: Text classifiers are applied at scale in the form of one-size-fits-all solutions. Nevertheless, many studies show that classifiers are biased regarding different languages and dialects. When measuring and discovering these biases, some gaps present themselves and should be addressed. First, "Does language, dialect, and topical content vary across geographical regions?" and secondly "If there are differences across the regions, do they impact model performance?". We introduce a novel dataset called GeoOLID with more than 14 thousand examples across 15 geographically and demographically diverse cities to address these questions. We perform a comprehensive analysis of geographical-related content and their impact on performance disparities of offensive language detection models. Overall, we find that current models do not generalize across locations. Likewise, we show that while offensive language models produce false positives on African American English, model performance is not correlated with each city's minority population proportions. Warning: This paper contains offensive language.

Submitted 15 September, 2022; originally announced September 2022.

Comments: Accepted by 29th International Conference on Computational Linguistics (COLING 2022)

arXiv:2209.00470 [pdf, other]

Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods

Authors: Bram van Es, Leon C. Reteig, Sander C. Tan, Marijn Schraagen, Myrthe M. Hemker, Sebastiaan R. S. Arends, Miguel A. R. Rios, Saskia Haitjema

Abstract: As structured data are often insufficient, labels need to be extracted from free text in electronic health records when developing models for clinical information retrieval and decision support systems. One of the most important contextual properties in clinical text is negation, which indicates the absence of findings. We aimed to improve large scale extraction of labels by comparing three methods for negation detection in Dutch clinical notes. We used the Erasmus Medical Center Dutch Clinical Corpus to compare a rule-based method based on ContextD, a biLSTM model using MedCAT and (finetuned) RoBERTa-based models. We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision and recall. In addition, we systematically categorized the classification errors for each model, which can be used to further improve model performance in particular applications. Combining the three models naively was not beneficial in terms of performance. We conclude that the biLSTM and RoBERTa-based models in particular are highly accurate in detecting clinical negations, but that ultimately all three approaches can be viable depending on the use case at hand.

Submitted 1 September, 2022; originally announced September 2022.

Comments: 24, 8, journal

MSC Class: 68T50; 68P20 ACM Class: I.2.7; J.3; H.3.3

arXiv:2204.06122 [pdf, other]

On the dynamics of credit history and social interaction features, and their impact on creditworthiness assessment performance

Authors: Ricardo Muñoz-Cancino, Cristián Bravo, Sebastián A. Ríos, Manuel Graña

Abstract: For more than a half-century, credit risk management has used credit scoring models in each of its well-defined stages to manage credit risk. Application scoring is used to decide whether to grant a credit or not, while behavioral scoring is used mainly for portfolio management and to take preventive actions in case of default signals. In both cases, network data has recently been shown to be valuable to increase the predictive power of these models, especially when the borrower's historical data is scarce or not available. This study aims to understand the creditworthiness assessment performance dynamics and how it is influenced by the credit history, repayment behavior, and social network features. To accomplish this, we introduced a machine learning classification framework to analyze 97.000 individuals and companies from the moment they obtained their first loan to 12 months afterward. Our novel and massive dataset allows us to characterize each borrower according to their credit behavior, and social and economic relationships. Our research shows that borrowers' history increases performance at a decreasing rate during the first six months and then stabilizes. The most notable effect on performance of social networks features occurs at loan application; in personal scoring, this effect prevails a few months, while in business scoring adds value throughout the study period. These findings are of great value to improve credit risk management and optimize the use of traditional information and alternative data sources.

Submitted 12 April, 2022; originally announced April 2022.

arXiv:2203.14920 [pdf, other]

UTSA NLP at SemEval-2022 Task 4: An Exploration of Simple Ensembles of Transformers, Convolutional, and Recurrent Neural Networks

Authors: Xingmeng Zhao, Anthony Rios

Abstract: The act of appearing kind or helpful via the use of condescending and patronizing language, but having a feeling of superiority, can have serious mental health implications to those that experience it. Thus, detecting this condescending and patronizing language online can be useful for online moderation systems. Thus, in this manuscript, we describe the system developed by Team UTSA for SemEval-2022 Task 4, Detecting Patronizing and Condescending Language. Our approach explores the use of several deep learning architectures including RoBERTa, convolutional neural networks, and Bidirectional Long Short-Term Memory Networks. Furthermore, we explore simple and effective methods to create ensembles of neural network models. Overall, we experimented with several ensemble models and found that a simple combination of five RoBERTa models achieved an F-score of .6441 on the development dataset and .5745 on the final test dataset. Finally, we also performed a comprehensive error analysis to better understand the limitations of the model and provide ideas for further research.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to SemEval 2022

arXiv:2203.08694 [pdf, other]

Turning Stocks into Memes: A Dataset for Understanding How Social Communities Can Drive Wall Street

Authors: Richard Alvarez, Paras Bhatt, Xingmeng Zhao, Anthony Rios

Abstract: Who actually expresses an intent to buy GameStop shares on Reddit? What convinces people to buy stocks? Are people convinced to support a coordinated plan to adversely impact Wall Street investors? Existing literature on understanding intent has mainly relied on surveys and self reporting; however there are limitations to these methodologies. Hence, in this paper, we develop an annotated dataset of communications centered on the GameStop phenomenon to analyze the subscriber intentions and behaviors within the r/WallStreetBets community to buy (or not buy) stocks. Likewise, we curate a dataset to better understand how intent interacts with a user's general support towards the coordinated actions of the community for GameStop. Overall, our dataset can provide insight to social scientists on the persuasive power to buy into social movements online by adopting common language and narrative. WARNING: This paper contains offensive language that commonly appears on Reddit's r/WallStreetBets subreddit.

Submitted 16 March, 2022; originally announced March 2022.

Comments: Accepted to ICWSM 2022

arXiv:2202.06946 [pdf]

Prototyping a Virtual Agent for Pre-school English Teaching

Authors: Eduardo Benitez Sandoval, Diego Vazquez Rojas, Clarissa A. Parada Cereceres, Alvaro Anzueto Rios, Amit Barde, Mark Billinghurst

Abstract: This paper describes a case study and the insights gained from prototyping an Intelligent Virtual Agent (IVA) for English vocabulary building for Spanish-speaking preschool children. After an initial exploration to evaluate the feasibility of developing an IVA, we followed a Human-Centered Design (HCD) approach to create a prototype. We report on the multidisciplinary process used that incorporated two well-known educative concepts: gamification and story-telling as the main components for engagement. Our results suggest that a multidisciplinary approach to developing an educational IVA is effective. We report on the relevant aspects of the ideation and design processes that informed the vision and mission of the project.

Submitted 8 February, 2022; originally announced February 2022.

Comments: Accepted in the IEEE Virtual Reality Conference 2022, Christchurch, New Zealand

ACM Class: I.3.8; K.3.1

arXiv:2201.08098 [pdf, other]

What can we learn from misclassified ImageNet images?

Authors: Shixian Wen, Amanda Sofie Rios, Kiran Lekkala, Laurent Itti

Abstract: Understanding the patterns of misclassified ImageNet images is particularly important, as it could guide us to design deep neural networks (DNN) that generalize better. However, the richness of ImageNet imposes difficulties for researchers to visually find any useful patterns of misclassification. Here, to help find these patterns, we propose "Superclassing ImageNet dataset". It is a subset of ImageNet which consists of 10 superclasses, each containing 7-116 related subclasses (e.g., 52 bird types, 116 dog types). By training neural networks on this dataset, we found that: (i) Misclassifications are rarely across superclasses, but mainly among subclasses within a superclass. (ii) Ensemble networks trained each only on subclasses of a given superclass perform better than the same network trained on all subclasses of all superclasses. Hence, we propose a two-stage Super-Sub framework, and demonstrate that: (i) The framework improves overall classification performance by 3.3%, by first inferring a superclass using a generalist superclass-level network, and then using a specialized network for final subclass-level classification. (ii) Although the total parameter storage cost increases to a factor N+1 for N superclasses compared to using a single network, with finetuning, delta and quantization aware training techniques this can be reduced to 0.2N+1. Another advantage of this efficient implementation is that the memory cost on the GPU during inference is equivalent to using only one network. The reason is we initiate each subclass-level network through addition of small parameter variations (deltas) to the superclass-level network. (iii) Finally, our framework promises to be more scalable and generalizable than the common alternative of simply scaling up a vanilla network in size, since very large networks often suffer from overfitting and gradient vanishing.

Submitted 20 January, 2022; originally announced January 2022.

arXiv:2111.13666 [pdf, other]

doi 10.1016/j.eswa.2022.118809

On the combination of graph data for assessing thin-file borrowers' creditworthiness

Authors: Ricardo Muñoz-Cancino, Cristián Bravo, Sebastián A. Ríos, Manuel Graña

Abstract: The thin-file borrowers are customers for whom a creditworthiness assessment is uncertain due to their lack of credit history; many researchers have used borrowers' relationships and interactions networks in the form of graphs as an alternative data source to address this. Incorporating network data is traditionally made by hand-crafted feature engineering, and lately, the graph neural network has emerged as an alternative, but it still does not improve over the traditional method's performance. Here we introduce a framework to improve credit scoring models by blending several Graph Representation Learning methods: feature engineering, graph embeddings, and graph neural networks. We stacked their outputs to produce a single score in this approach. We validated this framework using a unique multi-source dataset that characterizes the relationships and credit history for the entire population of a Latin American country, applying it to credit risk models, application, and behavior, targeting both individuals and companies. Our results show that the graph representation learning methods should be used as complements, and these should not be seen as self-sufficient methods as is currently done. In terms of AUC and KS, we enhance the statistical performance, outperforming traditional methods. In Corporate lending, where the gain is much higher, it confirms that evaluating an unbanked company cannot solely consider its features. The business ecosystem where these firms interact with their owners, suppliers, customers, and other companies provides novel knowledge that enables financial institutions to enhance their creditworthiness assessment. Our results let us know when and which group to use graph data and what effects on performance to expect. They also show the enormous value of graph data on the unbanked credit scoring problem, principally to help companies' banking.

Submitted 16 September, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Journal ref: Expert Systems with Applications, 2022, 118809

arXiv:2111.08174 [pdf, other]

ShapeY: Measuring Shape Recognition Capacity Using Nearest Neighbor Matching

Authors: Jong Woo Nam, Amanda S. Rios, Bartlett W. Mel

Abstract: Object recognition in humans depends primarily on shape cues. We have developed a new approach to measuring the shape recognition performance of a vision system based on nearest neighbor view matching within the system's embedding space. Our performance benchmark, ShapeY, allows for precise control of task difficulty, by enforcing that view matching span a specified degree of 3D viewpoint change and/or appearance change. As a first test case we measured the performance of ResNet50 pre-trained on ImageNet. Matching error rates were high. For example, a 27 degree change in object pitch led ResNet50 to match the incorrect object 45% of the time. Appearance changes were also highly disruptive. Examination of false matches indicates that ResNet50's embedding space is severely "tangled". These findings suggest ShapeY can be a useful tool for charting the progress of artificial vision systems towards human-level shape recognition capabilities.

Submitted 15 November, 2021; originally announced November 2021.

Comments: 6 pages, 5 figures, Accepted to NeurIPS: ImageNet Past, Present, and Future

arXiv:2107.08030 [pdf, other]

A New Robust Multivariate Mode Estimator for Eye-tracking Calibration

Authors: Adrien Brilhault, Sergio Neuenschwander, Ricardo Araujo Rios

Abstract: We propose in this work a new method for estimating the main mode of multivariate distributions, with application to eye-tracking calibrations. When performing eye-tracking experiments with poorly cooperative subjects, such as infants or monkeys, the calibration data generally suffer from high contamination. Outliers are typically organized in clusters, corresponding to the time intervals when subjects were not looking at the calibration points. In this type of multimodal distributions, most central tendency measures fail at estimating the principal fixation coordinates (the first mode), resulting in errors and inaccuracies when mapping the gaze to the screen coordinates. Here, we developed a new algorithm to identify the first mode of multivariate distributions, named BRIL, which relies on recursive depth-based filtering. This novel approach was tested on artificial mixtures of Gaussian and Uniform distributions, and compared to existing methods (conventional depth medians, robust estimators of location and scatter, and clustering-based approaches). We obtained outstanding performances, even for distributions containing very high proportions of outliers, both grouped in clusters and randomly distributed. Finally, we demonstrate the strength of our method in a real-world scenario using experimental data from eye-tracking calibrations with Capuchin monkeys, especially for distributions where other algorithms typically lack accuracy.

Submitted 16 July, 2021; originally announced July 2021.

arXiv:2106.06811 [pdf, other]

Case Study on Detecting COVID-19 Health-Related Misinformation in Social Media

Authors: Mir Mehedi A. Pritom, Rosana Montanez Rodriguez, Asad Ali Khan, Sebastian A. Nugroho, Esra'a Alrashydah, Beatrice N. Ruiz, Anthony Rios

Abstract: The COVID-19 pandemic has generated what public health officials called an infodemic of misinformation. As social distancing and stay-at-home orders came into effect, many turned to social media for socializing. This increase in social media usage has made it a prime vehicle for the spreading of misinformation. This paper presents a mechanism to detect COVID-19 health-related misinformation in social media following an interdisciplinary approach. Leveraging social psychology as a foundation and existing misinformation frameworks, we defined misinformation themes and associated keywords incorporated into the misinformation detection mechanism using applied machine learning techniques. Next, using the Twitter dataset, we explored the performance of the proposed methodology using multiple state-of-the-art machine learning classifiers. Our method shows promising results with at most 78% accuracy in classifying health-related misinformation versus true information using uni-gram-based NLP feature generations from tweets and the Decision Tree classifier. We also provide suggestions on alternatives for countering misinformation and ethical consideration for the study.

Submitted 12 June, 2021; originally announced June 2021.

Comments: 10 pages

arXiv:2106.01170 [pdf, other]

Detecting Bot-Generated Text by Characterizing Linguistic Accommodation in Human-Bot Interactions

Authors: Paras Bhatt, Anthony Rios

Abstract: Language generation models' democratization benefits many domains, from answering health-related questions to enhancing education by providing AI-driven tutoring services. However, language generation models' democratization also makes it easier to generate human-like text at-scale for nefarious activities, from spreading misinformation to targeting specific groups with hate speech. Thus, it is essential to understand how people interact with bots and develop methods to detect bot-generated text. This paper shows that bot-generated text detection methods are more robust across datasets and models if we use information about how people respond to it rather than using the bot's text directly. We also analyze linguistic alignment, providing insight into differences between human-human and human-bot conversations.

Submitted 2 June, 2021; originally announced June 2021.

Comments: 13 pages, to be published in Findings of ACL-IJCNLP 2021

arXiv:2104.10166 [pdf, other]

Evaluating the Immediate Applicability of Pose Estimation for Sign Language Recognition

Authors: Amit Moryossef, Ioannis Tsochantaridis, Joe Dinn, Necati Cihan Camgöz, Richard Bowden, Tao Jiang, Annette Rios, Mathias Müller, Sarah Ebling

Abstract: Signed languages are visual languages produced by the movement of the hands, face, and body. In this paper, we evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-preserving, low-dimensional representations. Basically, skeletal representations generalize over an individual's appearance and background, allowing us to focus on the recognition of motion. But how much information is lost by the skeletal representation? We perform two independent studies using two state-of-the-art pose estimation systems. We analyze the applicability of the pose estimation systems to sign language recognition by evaluating the failure cases of the recognition models. Importantly, this allows us to characterize the current limitations of skeletal pose estimation approaches in sign language recognition.

Submitted 20 April, 2021; originally announced April 2021.

arXiv:2104.08726 [pdf, other]

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

Authors: Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, Angela Fan, John Ortega, Ricardo Ramos, Annette Rios, Ivan Meza-Ruiz, Gustavo A. Giménez-Lugo, Elisabeth Mager, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, Ngoc Thang Vu, Katharina Kann

Abstract: Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, an extension of XNLI (Conneau et al., 2018) to 10 indigenous languages of the Americas. We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches. Additionally, we explore model adaptation via continued pretraining and provide an analysis of the dataset by considering hypothesis-only models. We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%. Continued pretraining offers improvements, with an average accuracy of 44.05%. Surprisingly, training on poorly translated data by far outperforms all other methods with an accuracy of 48.72%.

Submitted 16 March, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted to ACL 2022

arXiv:2104.03945 [pdf, other]

On Biasing Transformer Attention Towards Monotonicity

Authors: Annette Rios, Chantal Amrhein, Noëmi Aepli, Rico Sennrich

Abstract: Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.

Submitted 8 April, 2021; originally announced April 2021.

Comments: To be published in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021)

arXiv:2103.12028 [pdf, other]

doi 10.1162/tacl_a_00447

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Authors: Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, et al. (27 additional authors not shown)

Abstract: With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases.

Submitted 21 February, 2022; v1 submitted 22 March, 2021; originally announced March 2021.

Comments: Accepted at TACL; pre-MIT Press publication version

Journal ref: Transactions of the Association for Computational Linguistics (2022) 10: 50-72

arXiv:2101.08674 [pdf, other]

DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition

Authors: Edwin Arkel Rios, Wen-Huang Cheng, Bo-Cheng Lai

Abstract: In this work we tackle the challenging problem of anime character recognition. Anime refers to animation produced within Japan and work derived from or inspired by it. For this purpose we present DAF:re (DanbooruAnimeFaces:revamped), a large-scale, crowd-sourced, long-tailed dataset with almost 500 K images spread across more than 3000 classes. Additionally, we conduct experiments on DAF:re and similar datasets using a variety of classification models, including CNN based ResNets and self-attention based Vision Transformer (ViT). Our results give new insights into the generalization and transfer learning properties of ViT models on substantially different domain datasets from those used for the upstream pre-training, including the influence of batch and image size in their training. Additionally, we share our dataset, source-code, pre-trained checkpoints and results, as Animesion, the first end-to-end framework for large-scale anime character recognition: https://github.com/arkel23/animesion

Submitted 21 January, 2021; originally announced January 2021.

Comments: 5 pages, 3 figures, 4 tables

ACM Class: I.2; I.4

arXiv:2011.13429 [pdf]

Explaining Deep Learning Models for Structured Data using Layer-Wise Relevance Propagation

Authors: Ihsan Ullah, Andre Rios, Vaibhav Gala, Susan Mckeever

Abstract: Trust and credibility in machine learning models is bolstered by the ability of a model to explain its decisions. While explainability of deep learning models is a well-known challenge, a further challenge is clarity of the explanation itself, which must be interpreted by downstream users. Layer-wise Relevance Propagation (LRP), an established explainability technique developed for deep models in computer vision, provides intuitive human-readable heat maps of input images. We present the novel application of LRP for the first time with structured datasets using a deep neural network (1D-CNN), for Credit Card Fraud detection and Telecom Customer Churn prediction datasets. We show how LRP is more effective than traditional explainability concepts of Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) for explainability. This effectiveness is both local to a sample level and holistic over the whole testing set. We also discuss the significant computational time advantage of LRP (1-2s) over LIME (22s) and SHAP (108s), and thus its potential for real time application scenarios. In addition, our validation of LRP has highlighted features for enhancing model performance, thus opening up a new area of research of using XAI as an approach for feature subset selection.

Submitted 26 November, 2020; originally announced November 2020.

Comments: 13 pages, 5 figures, 6 tables

arXiv:2011.04783 [pdf, other]

Lifelong Learning Without a Task Oracle

Authors: Amanda Rios, Laurent Itti

Abstract: Supervised deep neural networks are known to undergo a sharp decline in the accuracy of older tasks when new tasks are learned, termed "catastrophic forgetting". Many state-of-the-art solutions to continual learning rely on biasing and/or partitioning a model to accommodate successive tasks incrementally. However, these methods largely depend on the availability of a task-oracle to confer task identities to each test sample, without which the models are entirely unable to perform. To address this shortcoming, we propose and compare several candidate task-assigning mappers which require very little memory overhead: (1) Incremental unsupervised prototype assignment using either nearest means, Gaussian Mixture Models or fuzzy ART backbones; (2) Supervised incremental prototype assignment with fast fuzzy ARTMAP; (3) Shallow perceptron trained via a dynamic coreset. Our proposed model variants are trained either from pre-trained feature extractors or task-dependent feature embeddings of the main classifier network. We apply these pipeline variants to continual learning benchmarks, comprised of either sequences of several datasets or within one single dataset. Overall, these methods, despite their simplicity and compactness, perform very close to a ground truth oracle, especially in experiments of inter-dataset task assignment. Moreover, best-performing variants only impose an average cost of 1.7% parameter memory increase.

Submitted 9 November, 2020; originally announced November 2020.

Comments: Proceedings of the IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI 2020)

arXiv:2011.01703 [pdf, other]

Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation

Authors: Annette Rios, Mathias Müller, Rico Sennrich

Abstract: Zero-shot neural machine translation is an attractive goal because of the high cost of obtaining data and building translation systems for new translation directions. However, previous papers have reported mixed success in zero-shot translation. It is hard to predict in which settings it will be effective, and what limits performance compared to a fully supervised system. In this paper, we investigate zero-shot performance of a multilingual EN$\leftrightarrow${FR,CS,DE,FI} system trained on WMT data. We find that zero-shot performance is highly unstable and can vary by more than 6 BLEU between training runs, making it difficult to reliably track improvements. We observe a bias towards copying the source in zero-shot translation, and investigate how the choice of subword segmentation affects this bias. We find that language-specific subword segmentation results in less subword copying at training time, and leads to better zero-shot performance compared to jointly trained segmentation. A recent trend in multilingual models is to not train on parallel data between all language pairs, but have a single bridge language, e.g. English. We find that this negatively affects zero-shot translation and leads to a failure mode where the model ignores the language tag and instead produces English output in zero-shot directions. We show that this bias towards English can be effectively reduced with even a small amount of parallel data in some of the non-English pairs.

Submitted 3 November, 2020; originally announced November 2020.

Comments: Accepted at WMT 2020

arXiv:2009.13954 [pdf, other]

doi 10.1109/TNNLS.2021.3054423

Beneficial Perturbation Network for designing general adaptive artificial intelligence systems

Authors: Shixian Wen, Amanda Rios, Yunhao Ge, Laurent Itti

Abstract: The human brain is the gold standard of adaptive learning. It not only can learn and benefit from experience, but also can adapt to new situations. In contrast, deep neural networks only learn one sophisticated but fixed mapping from inputs to outputs. This limits their applicability to more dynamic situations, where input to output mapping may change with different contexts. A salient example is continual learning - learning new independent tasks sequentially without forgetting previous tasks. Continual learning of multiple tasks in artificial neural networks using gradient descent leads to catastrophic forgetting, whereby a previously learned mapping of an old task is erased when learning new mappings for new tasks. Here, we propose a new biologically plausible type of deep neural network with extra, out-of-network, task-dependent biasing units to accommodate these dynamic situations. This allows, for the first time, a single network to learn potentially unlimited parallel input to output mappings, and to switch on the fly between them at runtime. Biasing units are programmed by leveraging beneficial perturbations (opposite to well-known adversarial perturbations) for each task. Beneficial perturbations for a given task bias the network toward that task, essentially switching the network into a different mode to process that task. This largely eliminates catastrophic interference between tasks. Our approach is memory-efficient and parameter-efficient, can accommodate many tasks, and achieves state-of-the-art performance across different tasks and domains.

Submitted 1 February, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

Comments: Accepted at IEEE Transactions on Neural Networks and Learning Systems. Keywords: Adaptive artificial intelligence system, Switch modes, Beneficial perturbations, Continual learning, Adversarial examples

Journal ref: IEEE Transactions on Neural Networks and Learning Systems 2021

arXiv:2009.12724 [pdf, other]

Beneficial Perturbations Network for Defending Adversarial Examples

Authors: Shixian Wen, Amanda Rios, Laurent Itti

Abstract: Deep neural networks can be fooled by adversarial attacks: adding carefully computed small adversarial perturbations to clean inputs can cause misclassification on state-of-the-art machine learning models. The reason is that neural networks fail to accommodate the distribution drift of the input data caused by adversarial perturbations. Here, we present a new solution - Beneficial Perturbation Network (BPN) - to defend against adversarial attacks by fixing the distribution drift. During training, BPN generates and leverages beneficial perturbations (somewhat opposite to well-known adversarial perturbations) by adding new, out-of-network biasing units. Biasing units influence the parameter space of the network, to preempt and neutralize future adversarial perturbations on input data samples. To achieve this, BPN creates reverse adversarial attacks during training, with very little cost, by recycling the training gradients already computed. Reverse attacks are captured by the biasing units, and the biases can in turn effectively defend against future adversarial examples. Reverse attacks are a shortcut, i.e., they affect the network's parameters without requiring instantiation of adversarial examples that could assist training. We provide comprehensive empirical evidence showing that 1) BPN is robust to adversarial examples and is much more efficient in running memory and computation than classical adversarial training; 2) BPN can defend against adversarial examples with negligible additional computation and parameter costs compared to training only on clean examples; 3) BPN hurts the accuracy on clean examples much less than classic adversarial training; 4) BPN can improve the generalization of the network; 5) BPN trained only with Fast Gradient Sign Attack can generalize to defend PGD attacks.

Submitted 13 September, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

Comments: The paper is under consideration at Pattern Recognition Letters

arXiv:2006.07674 [pdf, ps, other]

doi 10.1016/j.entcs.2020.08.006

Pure Pattern Calculus à la de Bruijn

Authors: Alexis Martín, Alejandro Ríos, Andrés Viso

Abstract: It is well-known in the field of programming languages that dealing with variable names and binders may lead to conflicts such as undesired captures when implementing interpreters or compilers. This situation has been overcome by resorting to de Bruijn indices for calculi where binders capture only one variable name, like the $λ$-calculus. The advantage of this approach relies on the fact that so-called $α$-equivalence becomes syntactical equality when working with indices. In recent years pattern calculi have gained considerable attention given their expressiveness. They turn out to be notoriously convenient to study the foundations of modern functional programming languages modeling features like pattern matching, path polymorphism, pattern polymorphism, etc. However, the literature falls short when it comes to dealing with $α$-conversion and binders capturing simultaneously several variable names. Such is the case of the Pure Pattern Calculus (PPC): a natural extension of $λ$-calculus that allows to abstract virtually any term. This paper extends de Bruijn's ideas to properly overcome the multi-binding problem by introducing a novel presentation of PPC with bidimensional indices, in an effort to implement a prototype for a typed functional programming language based on PPC that captures path polymorphism.

Submitted 28 June, 2020; v1 submitted 13 June, 2020; originally announced June 2020.
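
Illustrative note on the entry above (arXiv:2006.07674): the abstract's remark that $α$-equivalence becomes syntactical equality under de Bruijn indices can be seen in a minimal sketch for the plain single-binder $λ$-calculus. This is not the paper's bidimensional-index presentation of PPC; the class names (Var, Lam, App, Idx, Free, DLam, DApp) and the helper to_de_bruijn are hypothetical names chosen for this illustration only.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Named lambda terms (hypothetical illustration types, not from the paper).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


# Nameless (de Bruijn) terms: a bound variable is the number of binders
# between the occurrence and the binder that captures it (0 = innermost).
@dataclass(frozen=True)
class Idx:
    n: int


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class DLam:
    body: "DTerm"


@dataclass(frozen=True)
class DApp:
    fn: "DTerm"
    arg: "DTerm"


DTerm = Union[Idx, Free, DLam, DApp]


def to_de_bruijn(term: Term, bound: Tuple[str, ...] = ()) -> DTerm:
    """Convert a named term to de Bruijn form.

    `bound` lists the enclosing binder names, innermost first, so the
    position of a name in it is exactly its de Bruijn index.
    """
    if isinstance(term, Var):
        if term.name in bound:
            # index() returns the first match, i.e. the innermost binder,
            # which handles shadowing correctly.
            return Idx(bound.index(term.name))
        return Free(term.name)
    if isinstance(term, Lam):
        return DLam(to_de_bruijn(term.body, (term.param,) + bound))
    if isinstance(term, App):
        return DApp(to_de_bruijn(term.fn, bound), to_de_bruijn(term.arg, bound))
    raise TypeError(f"not a lambda term: {term!r}")


# \x. \y. x  and  \a. \b. a  differ only in bound names (alpha-equivalent).
k1 = Lam("x", Lam("y", Var("x")))
k2 = Lam("a", Lam("b", Var("a")))
# \x. \y. y  is a genuinely different term.
k3 = Lam("x", Lam("y", Var("y")))

print(to_de_bruijn(k1) == to_de_bruijn(k2))  # True
print(to_de_bruijn(k1) == to_de_bruijn(k3))  # False
```

In this sketch a single integer suffices because each binder captures exactly one name; the abstract explains that binders in PPC capture several names at once, which is the case this simple encoding does not cover.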

arXiv:2002.04011 [pdf, ps, other]

doi 10.1007/978-3-030-59025-3_2

The Bang Calculus Revisited

Authors: Antonio Bucciarelli, Delia Kesner, Alejandro Ríos, Andrés Viso

Abstract: Call-by-Push-Value (CBPV) is a programming paradigm subsuming both Call-by-Name (CBN) and Call-by-Value (CBV) semantics. The essence of this paradigm is captured by the Bang Calculus, a (concise) term language connecting CBPV and Linear Logic. This paper presents a revisited version of the Bang Calculus, called $λ!$, enjoying some important properties missing in the original formulation. Indeed, the new calculus integrates permutative conversions to unblock value redexes while being confluent at the same time. A second contribution is related to nonidempotent types. We provide a quantitative type system for our $λ!$-calculus, and we show that the length of the (weak) reduction of a typed term to its normal form plus the size of this normal form is bounded by the size of its type derivation. We also explore the properties of this type system with respect to CBN/CBV translations. We keep the original CBN translation from $λ$-calculus to the Bang Calculus, which preserves normal forms and is sound and complete with respect to the (quantitative) type system for CBN. However, in the case of CBV, we reformulate both the translation and the type system to restore two main properties: preservation of normal forms and completeness. Last but not least, the quantitative system is refined to a tight one, which transforms the previous upper bound on the length of reduction to normal form plus its size into two independent exact measures for them.

Submitted 5 May, 2023; v1 submitted 10 February, 2020; originally announced February 2020.

arXiv:1911.03109 [pdf, other]

Domain Robustness in Neural Machine Translation

Authors: Mathias Müller, Annette Rios, Rico Sennrich

Abstract: Translating text that diverges from the training domain is a key challenge for machine translation. Domain robustness---the generalization of models to unseen test domains---is low for both statistical (SMT) and neural machine translation (NMT). In this paper, we study the performance of SMT and NMT models on out-of-domain test sets. We find that in unknown domains, SMT and NMT suffer from very different problems: SMT systems are mostly adequate but not fluent, while NMT systems are mostly fluent, but not adequate. For NMT, we identify such hallucinations (translations that are fluent but unrelated to the source) as a key reason for low domain robustness. To mitigate this problem, we empirically compare methods that are reported to improve adequacy or in-domain robustness in terms of their effectiveness at improving domain robustness. In experiments on German to English OPUS data, and German to Romansh (a low-resource setting) we find that several methods improve domain robustness. While those methods do lead to higher BLEU scores overall, they only slightly increase the adequacy of translations compared to SMT.

Submitted 24 September, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: V2: AMTA camera-ready

arXiv:1811.02668 [pdf]

Automated Diagnosis of Lymphoma with Digital Pathology Images Using Deep Learning

Authors: Hanadi El Achi, Tatiana Belousova, Lei Chen, Amer Wahed, Iris Wang, Zhihong Hu, Zeyad Kanaan, Adan Rios, Andy N. D. Nguyen

Abstract: Recent studies have shown promising results in using Deep Learning to detect malignancy in whole slide imaging. However, they were limited to just predicting positive or negative finding for a specific neoplasm. We attempted to use Deep Learning with a convolutional neural network algorithm to build a lymphoma diagnostic model for four diagnostic categories: benign lymph node, diffuse large B cell lymphoma, Burkitt lymphoma, and small lymphocytic lymphoma. Our software was written in Python language. We obtained digital whole slide images of Hematoxylin and Eosin stained slides of 128 cases including 32 cases for each diagnostic category. Four sets of 5 representative images, 40x40 pixels in dimension, were taken for each case. A total of 2,560 images were obtained from which 1,856 were used for training, 464 for validation and 240 for testing. For each test set of 5 images, the predicted diagnosis was combined from prediction of 5 images. The test results showed excellent diagnostic accuracy at 95% for image-by-image prediction and at 100% for set-by-set prediction. This preliminary study provided a proof of concept for incorporating automated lymphoma diagnostic screen into future pathology workflow to augment the pathologists' productivity.

Submitted 30 October, 2018; originally announced November 2018.

Comments: 13 pages, 2 figures, 2 tables

arXiv:1811.01146 [pdf, other]

Closed-Loop Memory GAN for Continual Learning

Authors: Amanda Rios, Laurent Itti

Abstract: Sequential learning of tasks using gradient descent leads to an unremitting decline in the accuracy of tasks for which training data is no longer available, termed catastrophic forgetting. Generative models have been explored as a means to approximate the distribution of old tasks and bypass storage of real data. Here we propose a cumulative closed-loop memory replay GAN (CloGAN) provided with external regularization by a small memory unit selected for maximum sample diversity. We evaluate incremental class learning using a notoriously hard paradigm, single-headed learning, in which each task is a disjoint subset of classes in the overall dataset, and performance is evaluated on all previous classes. First, we show that when constructing a dynamic memory unit to preserve sample heterogeneity, model performance asymptotically approaches training on the full dataset. We then show that using a stochastic generator to continuously output fresh new images during training increases performance significantly further meanwhile generating quality images. We compare our approach to several baselines including fine-tuning by gradient descent (FGD), Elastic Weight Consolidation (EWC), Deep Generative Replay (DGR) and Memory Replay GAN (MeRGAN). Our method has very low long-term memory cost, the memory unit, as well as negligible intermediate memory storage.

Submitted 28 September, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

Comments: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-2019). https://doi.org/10.24963/ijcai.2019/462

arXiv:1810.13247 [pdf]

Application of Deep Learning on Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations

Authors: Mei Lin, Vanya Jaitly, Iris Wang, Zhihong Hu, Lei Chen, Md. Amer Wahed, Zeyad Kanaan, Adan Rios, Andy N. D. Nguyen

Abstract: We explore how Deep Learning (DL) can be utilized to predict prognosis of acute myeloid leukemia (AML). Out of TCGA (The Cancer Genome Atlas) database, 94 AML cases are used in this study. Input data include age, 10 common cytogenetic and 23 most common mutation results; output is the prognosis (diagnosis to death, DTD). In our DL network, autoencoders are stacked to form a hierarchical DL model from which raw data are compressed and organized and high-level features are extracted. The network is written in R language and is designed to predict prognosis of AML for a given case (DTD of more than or less than 730 days). The DL network achieves an excellent accuracy of 83% in predicting prognosis. As a proof-of-concept study, our preliminary results demonstrate a practical application of DL in future practice of prognostic prediction using next-gen sequencing (NGS) data.

Submitted 30 October, 2018; originally announced October 2018.

Comments: 11 pages, 1 table, 1 figure. arXiv admin note: substantial text overlap with arXiv:1801.01019

Showing 1–50 of 60 results for author: Ríos, A