Search | arXiv e-print repository

doi 10.3847/PSJ/ad5b5e

Geophysical Observations of the 24 September 2023 OSIRIS-REx Sample Return Capsule Re-Entry

Authors: Elizabeth A. Silber, Daniel C. Bowman, Chris G. Carr, David P. Eisenberg, Brian R. Elbing, Benjamin Fernando, Milton A. Garcés, Robert Haaser, Siddharth Krishnamoorthy, Charles A. Langston, Yasuhiro Nishikawa, Jeremy Webster, Jacob F. Anderson, Stephen Arrowsmith, Sonia Bazargan, Luke Beardslee, Brant Beck, Jordan W. Bishop, Philip Blom, Grant Bracht, David L. Chichester, Anthony Christe, Kenneth Cummins, James Cutts, Lisa Danielson, et al. (57 additional authors not shown)

Abstract: Sample Return Capsules (SRCs) entering Earth's atmosphere at hypervelocity from interplanetary space are a valuable resource for studying meteor phenomena. The 24 September 2023 arrival of the OSIRIS-REx (Origins, Spectral Interpretation, Resource Identification, and Security-Regolith Explorer) SRC provided an unprecedented chance for geophysical observations of a well-characterized source with known parameters, including timing and trajectory. A collaborative effort involving researchers from 16 institutions executed a carefully planned geophysical observational campaign at strategically chosen locations, deploying over 400 ground-based sensors encompassing infrasound, seismic, distributed acoustic sensing (DAS), and GPS technologies. Additionally, balloons equipped with infrasound sensors were launched to capture signals at higher altitudes. This campaign (the largest of its kind so far) yielded a wealth of invaluable data anticipated to fuel scientific inquiry for years to come. The success of the observational campaign is evidenced by the near-universal detection of signals across instruments, both proximal and distal. This paper presents a comprehensive overview of the collective scientific effort, field deployment, and preliminary findings. The early findings have the potential to inform future space missions and terrestrial campaigns, contributing to our understanding of meteoroid interactions with planetary atmospheres. Furthermore, the dataset collected during this campaign will improve entry and propagation models as well as augment the study of atmospheric dynamics and shock phenomena generated by meteoroids and similar sources.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 87 pages, 14 figures

arXiv:2407.01757 [pdf, other]

Distributed Instruments for Planetary Surface Science: Scientific Opportunities and Technology Feasibility

Authors: Federico Rossi, Robert C. Anderson, Saptarshi Bandyopadhyay, Erik Brandon, Ashish Goel, Joshua Vander Hook, Michael Mischna, Michaela Villarreal, Mark Wronkiewicz

Abstract: In this paper, we assess the scientific promise and technology feasibility of distributed instruments for planetary science. A distributed instrument is an instrument designed to collect spatially and temporally correlated data from multiple networked, geographically distributed point sensors. Distributed instruments are ubiquitous in Earth science, where they are routinely employed for weather and climate science, seismic studies and resource prospecting, and detection of industrial emissions. However, to date, their adoption in planetary surface science has been minimal. It is natural to ask whether this lack of adoption is driven by low potential to address high-priority questions in planetary science; immature technology; or both. To address this question, we survey high-priority planetary science questions that are uniquely well-suited to distributed instruments. We identify four areas of research where distributed instruments hold promise to unlock answers that are largely inaccessible to monolithic sensors, namely, weather and climate studies of Mars; localization of seismic events on rocky and icy bodies; localization of trace gas emissions, primarily on Mars; and magnetometry studies of internal composition. Next, we survey enabling technologies for distributed sensors and assess their maturity. We identify sensor placement (including descent and landing on planetary surfaces), power, and instrument autonomy as three key areas requiring further investment to enable future distributed instruments. Overall, this work shows that distributed instruments hold great promise for planetary science, and paves the way for follow-on studies of future distributed instruments for Solar System in-situ science.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.08931 [pdf, other]

Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

Authors: Arnav Goel, Medha Hira, Anubha Gupta

Abstract: Advent of modern deep learning techniques has given rise to advancements in the field of Speech Emotion Recognition (SER). However, most systems prevalent in the field fail to generalize to speakers not seen during training. This study focuses on handling challenges of multilingual SER, specifically on unseen speakers. We introduce CAMuLeNet, a novel architecture leveraging co-attention based fusion and multitask learning to address this problem. Additionally, we benchmark pretrained encoders of Whisper, HuBERT, Wav2Vec2.0, and WavLM using 10-fold leave-speaker-out cross-validation on five existing multilingual benchmark datasets: IEMOCAP, RAVDESS, CREMA-D, EmoDB and CaFE, and release a novel dataset for SER on the Hindi language (BhavVani). CAMuLeNet shows an average improvement of approximately 8% over all benchmarks on unseen speakers determined by our cross-validation strategy.

Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 5 pages, Accepted to INTERSPEECH 2024. The first two authors contributed equally

arXiv:2406.00022 [pdf, other]

Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning

Authors: Arnav Goel, Medha Hira, Anubha Gupta

Abstract: The field of prosody transfer in speech synthesis systems is rapidly advancing. This research is focused on evaluating learning methods for adapting pre-trained monolingual text-to-speech (TTS) models to multilingual conditions, i.e., Supervised Fine-Tuning (SFT) and Transfer Learning (TL). This comparison utilizes three distinct metrics: Mean Opinion Score (MOS), Recognition Accuracy (RA), and Mel Cepstral Distortion (MCD). Results demonstrate that, in comparison to SFT, TL leads to significantly enhanced performance, with an average MOS higher by 1.53 points, a 37.5% increase in RA, and approximately a 7.8-point improvement in MCD. These findings are instrumental in helping build TTS models for low-resource languages.

Submitted 18 June, 2024; v1 submitted 23 May, 2024; originally announced June 2024.

Comments: 7 pages, Accepted to ICLR 2024 - Tiny Track

arXiv:2406.00021 [pdf, other]

CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

Authors: Medha Hira, Arnav Goel, Anubha Gupta

Abstract: This paper presents CrossVoice, a novel cascade-based Speech-to-Speech Translation (S2ST) system employing advanced ASR, MT, and TTS technologies with cross-lingual prosody preservation through transfer learning. We conducted comprehensive experiments comparing CrossVoice with direct-S2ST systems, showing improved BLEU scores on tasks such as Fisher Es-En, VoxPopuli Fr-En and prosody preservation on benchmark datasets CVSS-T and IndicTTS. With an average mean opinion score of 3.75 out of 4, speech synthesized by CrossVoice closely rivals human speech on the benchmark, highlighting the efficacy of cascade-based systems and transfer learning in multilingual S2ST with prosody transfer.

Submitted 18 June, 2024; v1 submitted 23 May, 2024; originally announced June 2024.

Comments: 8 pages, Accepted at ICLR 2024 - Tiny Track

arXiv:2405.20917 [pdf, other]

Learning to Estimate System Specifications in Linear Temporal Logic using Transformers and Mamba

Authors: İlker Işık, Ebru Aydin Gol, Ramazan Gokberk Cinbis

Abstract: Temporal logic is a framework for representing and reasoning about propositions that evolve over time. It is commonly used for specifying requirements in various domains, including hardware and software systems, as well as robotics. Specification mining or formula generation involves extracting temporal logic formulae from system traces and has numerous applications, such as detecting bugs and improving interpretability. Although there has been a surge of deep learning-based methods for temporal logic satisfiability checking in recent years, the specification mining literature has been lagging behind in adopting deep learning methods despite their many advantages, such as scalability. In this paper, we introduce autoregressive models that can generate linear temporal logic formulae from traces, towards addressing the specification mining problem. We propose multiple architectures for this task: transformer encoder-decoder, decoder-only transformer, and Mamba, which is an emerging alternative to transformer models. Additionally, we devise a metric for quantifying the distinctiveness of the generated formulae and a straightforward algorithm to enforce the syntax constraints. Our experiments show that the proposed architectures yield promising results, generating correct and distinct formulae at a fraction of the compute cost needed for the combinatorial baseline.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 20 pages, 15 figures

arXiv:2405.19631 [pdf, other]

Leveraging Open-Source Large Language Models for encoding Social Determinants of Health using an Intelligent Router

Authors: Akul Goel, Surya Narayanan Hari, Belinda Waltman, Matt Thomson

Abstract: Social Determinants of Health (SDOH) play a significant role in patient health outcomes. The Center of Disease Control (CDC) introduced a subset of ICD-10 codes called Z-codes in an attempt to officially recognize and measure SDOH in the health care system. However, these codes are rarely annotated in a patient's Electronic Health Record (EHR), and instead, in many cases, need to be inferred from clinical notes. Previous research has shown that large language models (LLMs) show promise on extracting unstructured data from EHRs. However, with thousands of models to choose from with unique architectures and training sets, it's difficult to choose one model that performs the best on coding tasks. Further, clinical notes contain trusted health information making the use of closed-source language models from commercial vendors difficult, so the identification of open source LLMs that can be run within health organizations and exhibits high performance on SDOH tasks is an urgent problem. Here, we introduce an intelligent routing system for SDOH coding that uses a language model router to direct medical record data to open source LLMs that demonstrate optimal performance on specific SDOH codes. The intelligent routing system exhibits state of the art performance of 97.4% accuracy averaged across 5 codes, including homelessness and food insecurity, on par with closed models such as GPT-4o. In order to train the routing system and validate models, we also introduce a synthetic data generation and validation paradigm to increase the scale of training data without needing privacy protected medical records. Together, we demonstrate an architecture for intelligent routing of inputs to task-optimal language models to achieve high performance across a set of medical coding sub-tasks.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.16355 [pdf, other]

Navigating AI Fallibility: Examining People's Reactions and Perceptions of AI after Encountering Personality Misrepresentations

Authors: Qiaosi Wang, Chidimma L. Anyi, Vedant Das Swain, Ashok K. Goel

Abstract: Many hyper-personalized AI systems profile people's characteristics (e.g., personality traits) to provide personalized recommendations. These systems are increasingly used to facilitate interactions among people, such as providing teammate recommendations. Despite improved accuracy, such systems are not immune to errors when making inferences about people's most personal traits. These errors manifested as AI misrepresentations. However, the repercussions of such AI misrepresentations are unclear, especially on people's reactions and perceptions of the AI. We present two studies to examine how people react and perceive the AI after encountering personality misrepresentations in AI-facilitated team matching in a higher education context. Through semi-structured interviews (n=20) and a survey experiment (n=198), we pinpoint how people's existing and newly acquired AI knowledge could shape their perceptions and reactions of the AI after encountering AI misrepresentations. Specifically, we identified three rationales that people adopted through knowledge acquired from AI (mis)representations: AI works like a machine, human, and/or magic. These rationales are highly connected to people's reactions of over-trusting, rationalizing, and forgiving of AI misrepresentations. Finally, we found that people's existing AI knowledge, i.e., AI literacy, could moderate people's changes in their trust in AI after encountering AI misrepresentations, but not changes in people's social perceptions of AI. We discuss the role of people's AI knowledge when facing AI fallibility and implications for designing responsible mitigation and repair strategies.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 37 pages, 11 figures

ACM Class: I.2.0

arXiv:2405.11775 [pdf, other]

Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques

Authors: Siva Rajesh Kasa, Aniket Goel, Karan Gupta, Sumegh Roychowdhury, Anish Bhanushali, Nikhil Pattisapu, Prasanna Srinivasa Murthy

Abstract: Ordinal Classification (OC) is a widely encountered challenge in Natural Language Processing (NLP), with applications in various domains such as sentiment analysis, rating prediction, and more. Previous approaches to tackle OC have primarily focused on modifying existing or creating novel loss functions that explicitly account for the ordinal nature of labels. However, with the advent of Pretrained Language Models (PLMs), it became possible to tackle ordinality through the implicit semantics of the labels as well. This paper provides a comprehensive theoretical and empirical examination of both these approaches. Furthermore, we also offer strategic recommendations regarding the most effective approach to adopt based on specific settings.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Findings of ACL 2024

arXiv:2405.11070 [pdf, other]

Jill Watson: A Virtual Teaching Assistant powered by ChatGPT

Authors: Karan Taneja, Pratyusha Maiti, Sandeep Kakar, Pranav Guruprasad, Sanjeev Rao, Ashok K. Goel

Abstract: Conversational AI agents often require extensive datasets for training that are not publicly released, are limited to social chit-chat or handling a specific domain, and may not be easily extended to accommodate the latest advances in AI technologies. This paper introduces Jill Watson, a conversational Virtual Teaching Assistant (VTA) leveraging the capabilities of ChatGPT. Jill Watson based on ChatGPT requires no prior training and uses a modular design to allow the integration of new APIs using a skill-based architecture inspired by XiaoIce. Jill Watson is also well-suited for intelligent textbooks as it can process and converse using multiple large documents. We exclusively utilize publicly available resources for reproducibility and extensibility. Comparative analysis shows that our system outperforms the legacy knowledge-based Jill Watson as well as the OpenAI Assistants service. We employ many safety measures that reduce instances of hallucinations and toxicity. The paper also includes real-world examples from a classroom setting that demonstrate different features of Jill Watson and its effectiveness.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.05572 [pdf, other]

From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences

Authors: Prashant Kodali, Anmol Goel, Likhith Asapu, Vamshi Krishna Bonagiri, Anirudh Govil, Monojit Choudhury, Manish Shrivastava, Ponnurangam Kumaraguru

Abstract: Current computational approaches for analysing or generating code-mixed sentences do not explicitly model "naturalness" or "acceptability" of code-mixed sentences, but rely on training corpora to reflect distribution of acceptable code-mixed sentences. Modelling human judgement for the acceptability of code-mixed text can help in distinguishing natural code-mixed text and enable quality-controlled generation of code-mixed text. To this end, we construct Cline - a dataset containing human acceptability judgements for English-Hindi (en-hi) code-mixed text. Cline is the largest of its kind with 16,642 sentences, consisting of samples sourced from two sources: synthetically generated code-mixed text and samples collected from online social media. Our analysis establishes that popular code-mixing metrics such as CMI, Number of Switch Points, and Burstiness, which are used to filter/curate/compare code-mixed corpora, have low correlation with human acceptability judgements, underlining the necessity of our dataset. Experiments using Cline demonstrate that simple Multilayer Perceptron (MLP) models trained solely on code-mixing metrics are outperformed by fine-tuned pre-trained Multilingual Large Language Models (MLLMs). Specifically, XLM-Roberta and Bernice outperform IndicBERT across different configurations in challenging data settings. Comparison with ChatGPT's zero- and few-shot capabilities shows that MLLMs fine-tuned on larger data outperform ChatGPT, providing scope for improvement in code-mixed tasks. Zero-shot transfer from English-Hindi to English-Telugu acceptability judgments using our model checkpoints proves superior to random baselines, enabling application to other code-mixed language pairs and providing further avenues of research. We publicly release our human-annotated dataset, trained checkpoints, code-mix corpus, and code for data generation and model training.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03162 [pdf, other]

Advancing Multimodal Medical Capabilities of Gemini

Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, et al. (22 additional authors not shown)

Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.07616 [pdf, other]

Audio Dialogues: Dialogues dataset for audio and music understanding

Authors: Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Abstract: Existing datasets for audio understanding primarily focus on single-turn interactions (i.e. audio captioning, audio question answering) for describing audio in natural language, thus limiting understanding audio via interactive dialogue. To address this gap, we introduce Audio Dialogues: a multi-turn dialogue dataset containing 163.8k samples for general audio sounds and music. In addition to dialogues, Audio Dialogues also has question-answer pairs to understand and compare multiple input audios together. Audio Dialogues leverages a prompting-based approach and caption annotations from existing datasets to generate multi-turn dialogues using a Large Language Model (LLM). We evaluate existing audio-augmented large language models on our proposed dataset to demonstrate the complexity and applicability of Audio Dialogues. Our code for generating the dataset will be made publicly available. Detailed prompts and generated dialogues can be found on the demo website https://audiodialogues.github.io/.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Demo website: https://audiodialogues.github.io/

arXiv:2404.04338 [pdf, ps, other]

doi 10.1016/j.ifacol.2021.08.510

Optimal Policy Synthesis from A Sequence of Goal Sets with An Application to Electric Distribution System Restoration

Authors: İlker Işık, Onur Yigit Arpali, Ebru Aydin Gol

Abstract: Motivated by the post-disaster distribution system restoration problem, in this paper, we study the problem of synthesizing the optimal policy for a Markov Decision Process (MDP) from a sequence of goal sets. For each goal set, our aim is to both maximize the probability to reach and minimize the expected time to reach the goal set. The order of the goal sets represents their priority. In particular, our aim is to generate a policy that is optimal with respect to the first goal set, and it is optimal with respect to the second goal set among the policies that are optimal with respect to the first goal set and so on. To synthesize such a policy, we iteratively filter the applicable actions according to the goal sets. We illustrate the developed method over sample distribution systems and disaster scenarios.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 7th ADHS 2021 Conference Paper

Journal ref: IFAC-PapersOnLine Volume 54, Issue 5, 2021, Pages 271-276

arXiv:2404.04087 [pdf, other]

doi 10.1016/j.ress.2024.110050

Field Teams Coordination for Earthquake-Damaged Distribution System Energization

Authors: İlker Işık, Ebru Aydin Gol

Abstract: The re-energization of electrical distribution systems in a post-disaster scenario is of grave importance as most modern infrastructure systems rely heavily on the presence of electricity. This paper introduces a method to coordinate the field teams for the optimal energization of an electrical distribution system after an earthquake-induced blackout. The proposed method utilizes a Markov Decision Process (MDP) to create an optimal energization strategy, which aims to minimize the expected time to energize each distribution system component. The travel duration of each team and the possible outcomes of the energization attempts are considered in the state transitions. The failure probabilities of the system components are computed using the fragility curves of structures and the Peak Ground Acceleration (PGA) values which are encoded to the MDP model via transition probabilities. Furthermore, the proposed solution offers several methods to determine the non-optimal actions during the construction of the MDP and eliminate them in order to improve the run-time performance without sacrificing the optimality of the solution.

Submitted 5 April, 2024; originally announced April 2024.

Comments: Accepted manuscript, published in Reliability Engineering & System Safety

Journal ref: Reliability Engineering & System Safety Volume 245, May 2024, 110050

arXiv:2403.18333 [pdf, other]

Quantum gravity of the Heisenberg algebra

Authors: Ahmed Almheiri, Akash Goel, Xu-Yao Hu

Abstract: We consider a simplified model of double scaled SYK (DSSYK) in which the Hamiltonian is the position operator of the Harmonic oscillator. This model captures the high temperature limit of DSSYK but could also be defined as a quantum theory in its own right. We study properties of the emergent geometry including its dynamics in response to inserting matter particles. In particular, we find that the model displays de Sitter-like properties such as that infalling matter reduces the rate of growth of geodesic slices between the two boundaries. The simplicity of the model allows us to compute the full generating functional for correlation functions of the length mode or any number of matter operators. We provide evidence that the effective action of the geodesic length between boundary points is non-local. Furthermore, we use the on-shell solution for the geodesic lengths between any two boundary points to reconstruct an effective bulk metric and reverse engineer the dilaton gravity theory that generates this metric as a solution.

Submitted 16 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: 30 pages + appendices; v2: typos corrected, references added

arXiv:2403.03029 [pdf, other]

Socratic Reasoning Improves Positive Text Rewriting

Authors: Anmol Goel, Nico Daheim, Iryna Gurevych

Abstract: Reframing a negative into a positive thought is at the crux of several cognitive approaches to mental health and psychotherapy that could be made more accessible by large language model-based solutions. Such reframing is typically non-trivial and requires multiple rationalization steps to uncover the underlying issue of a negative thought and transform it to be more positive. However, this rationalization process is currently neglected by both datasets and models which reframe thoughts in one step. In this work, we address this gap by augmenting open-source datasets for positive text rewriting with synthetically-generated Socratic rationales using a novel framework called \textsc{SocraticReframe}. \textsc{SocraticReframe} uses a sequence of question-answer pairs to rationalize the thought rewriting process. We show that such Socratic rationales significantly improve positive text rewriting for different open-source LLMs according to both automatic and human evaluations guided by criteria from psychotherapy research.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00826 [pdf, other]

LLMGuard: Guarding Against Unsafe LLM Behavior

Authors: Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan

Abstract: Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.

Submitted 27 February, 2024; originally announced March 2024.

Comments: accepted in demonstration track of AAAI-24

arXiv:2402.10567 [pdf, other]

InSaAF: Incorporating Safety through Accuracy and Fairness | Are LLMs ready for the Indian Legal Domain?

Authors: Yogesh Tripathi, Raghav Donakanti, Sahil Girhepuje, Ishan Kavathekar, Bhaskara Hanuma Vedula, Gokul S Krishnan, Shreya Goyal, Anmol Goel, Balaraman Ravindran, Ponnurangam Kumaraguru

Abstract: Recent advancements in language technology and Artificial Intelligence have resulted in numerous Language Models being proposed to perform various tasks in the legal domain ranging from predicting judgments to generating summaries. Despite their immense potential, these models have been proven to learn and exhibit societal biases and make unfair predictions. In this study, we explore the ability of Large Language Models (LLMs) to perform legal tasks in the Indian landscape when social factors are involved. We present a novel metric, $β$-weighted $\textit{Legal Safety Score ($LSS_β$)}$, which encapsulates both the fairness and accuracy aspects of the LLM. We assess LLMs' safety by considering its performance in the $\textit{Binary Statutory Reasoning}$ task and its fairness exhibition with respect to various axes of disparities in the Indian society. Task performance and fairness scores of LLaMA and LLaMA--2 models indicate that the proposed $LSS_β$ metric can effectively determine the readiness of a model for safe usage in the legal sector. We also propose finetuning pipelines, utilising specialised legal datasets, as a potential method to mitigate bias and improve model safety. The finetuning procedures on LLaMA and LLaMA--2 models increase the $LSS_β$, improving their usability in the Indian legal domain. Our code is publicly released.

Submitted 17 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.03717 [pdf, ps, other]

Retrospective Cost-based Extremum Seeking Control with Vanishing Perturbation for Online Output Minimization

Authors: Juan A. Paredes, Jhon Manuel Portella, Dennis S. Bernstein, Ankit Goel

Abstract: Extremum seeking control (ESC) constitutes a powerful technique for online optimization with theoretical guarantees for convergence to the neighborhood of the optimizer under well-understood conditions. However, ESC requires a nonconstant perturbation signal to provide persistent excitation to the target system to yield convergent results, which usually results in steady state oscillations. While certain techniques have been proposed to eliminate perturbations once the neighborhood of the minimizer is reached, system modifications and environmental perturbations can suddenly change the minimizer and nonconstant perturbations would once more be required to converge to the new minimizer. Hence, this paper develops a retrospective cost-based ESC (RC/ESC) technique for online output minimization with a vanishing perturbation, that is, a perturbation that becomes zero as time increases independently from the state of the controller or the controlled system. The performance of the proposed algorithm is illustrated via numerical examples.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03709 [pdf, ps, other]

Adaptive Backstepping Control of a Bicopter in Pure Feedback Form with Dynamic Extension

Authors: Jhon Manuel Portella Delgado, Mohammad Mirtaba, Ankit Goel

Abstract: This paper presents a model-based, adaptive, nonlinear controller for the bicopter stabilization and trajectory-tracking problem. The nonlinear controller is designed using the backstepping technique. Due to the non-invertibility of the input map, the bicopter system is first dynamically extended. However, the resulting dynamically extended system is in the pure feedback form with the uncertainty appearing in the input map. The adaptive backstepping technique is then extended and applied to design the controller. The proposed controller is validated in simulation for a smooth and nonsmooth trajectory-tracking problem.

Submitted 6 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2305.03554

arXiv:2402.01831 [pdf, other]

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Authors: Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Abstract: Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture design, and data strategies to enhance our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art benchmarks. Our demo website is https://audioflamingo.github.io/ and the code is open-sourced at https://github.com/NVIDIA/audio-flamingo.

Submitted 28 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: ICML 2024

arXiv:2401.16920 [pdf, other]

Sparse Portfolio Selection via Topological Data Analysis based Clustering

Authors: Anubha Goel, Damir Filipović, Puneet Pasricha

Abstract: This paper uses topological data analysis (TDA) tools and introduces a data-driven clustering-based stock selection strategy tailored for sparse portfolio construction. Our asset selection strategy exploits the topological features of stock price movements to select a subset of topologically similar (different) assets for a sparse index tracking (Markowitz) portfolio. We introduce new distance measures, which serve as an input to the clustering algorithm, on the space of persistence diagrams and landscapes that consider the time component of a time series. We conduct an empirical analysis on the S&P index from 2009 to 2020, including a study on the COVID-19 data to validate the robustness of our methodology. Our strategy to integrate TDA with the clustering algorithm significantly enhanced the performance of sparse portfolios across various performance measures in diverse market scenarios.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.13092 [pdf, ps, other]

Retrospective Cost Attitude Filtering with Noisy Measurements and Unknown Gyro Bias

Authors: Parham Oveissi, Ankit Goel

Abstract: Attitude filtering is a critical technology with applications in diverse domains such as aerospace engineering, robotics, computer vision, and augmented reality. Although attitude filtering is a particular case of the state estimation problem, attitude filtering is uniquely challenging due to the special geometric structure of the attitude parameterization. This paper presents a novel data-driven attitude filter, called the retrospective cost attitude filter (RCAF), for the SO(3) attitude representation. Like the multiplicative extended Kalman filter, RCAF uses a multiplicative correction signal, but instead of computing correction gains using Jacobians, RCAF computes the corrective signal using retrospective cost optimization and measured data. The RCAF filter is validated numerically in a scenario with noisy attitude measurements and noisy and biased rate-gyro measurements.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.12423 [pdf, other]

doi 10.1609/icwsm.v18i1.31326

Rank, Pack, or Approve: Voting Methods in Participatory Budgeting

Authors: Lodewijk Gelauff, Ashish Goel

Abstract: Participatory budgeting is a popular method to engage residents in budgeting decisions by local governments. The Stanford Participatory Budgeting platform is an online platform that has been used to engage residents in more than 150 budgeting processes. We present a data set with anonymized budget opinions from these processes with K-approval, K-ranking or knapsack primary ballots. For a subset of the voters, it includes paired votes with a different elicitation method in the same process. This presents a unique data set, as the voters, projects and setting are all related to real-world decisions that the voters have an actual interest in. With data from primary ballots we find that while ballot complexity (number of projects to choose from, number of projects to select and ballot length) is correlated with a higher median time spent by voters, it is not correlated with a higher abandonment rate. We use vote pairs with different voting methods to analyze the effect of voting methods on the cost of selected projects, more comprehensively than was previously possible. In most elections, voters selected significantly more expensive projects using K-approval than using knapsack, although we also find a small number of examples with a significant effect in the opposite direction. This effect happens at the aggregate level as well as for individual voters, and is influenced both by the implicit constraints of the voting method and the explicit constraints of the voting interface. Finally, we validate the use of K-ranking elicitation to offer a paper alternative for knapsack voting.

Submitted 25 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted for publication at ICWSM. Data set is available through: https://doi.org/10.25740/db709zg9088

Journal ref: Proceedings of the International AAAI Conference on Web and Social Media, 18 (2024) 448-461

arXiv:2401.05467 [pdf, other]

Active Label Correction for Building LLM-based Modular AI Systems

Authors: Karan Taneja, Ashok Goel

Abstract: Large Language Models (LLMs) have been used to build modular AI systems such as HuggingGPT, Microsoft Bing Chat, and more. To improve such systems after deployment using the data collected from human interactions, each module can be replaced by a fine-tuned model but the annotations received from LLMs are low quality. We propose that active label correction can be used to improve the data quality by only examining a fraction of the dataset. In this paper, we analyze the noise in datasets annotated by ChatGPT and study denoising it with human feedback. Our results show that active label correction can lead to oracle performance with feedback on fewer examples than the number of noisy examples in the dataset across three different NLP tasks.

Submitted 17 May, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2312.06871 [pdf, other]

Using Analytics on Student Created Data to Content Validate Pedagogical Tools

Authors: John Kos, Kenneth Eaton, Sareen Zhang, Rahul Dass, Stephen Buckley, Sungeun An, Ashok Goel

Abstract: Conceptual and simulation models can function as useful pedagogical tools; however, it is important to categorize different outcomes when evaluating them in order to more meaningfully interpret results. VERA is an ecology-based conceptual modeling software that enables users to simulate interactions between biotics and abiotics in an ecosystem, allowing users to form and then verify hypotheses through observing a time series of the species populations. In this paper, we classify this time series into common patterns found in the domain of ecological modeling through two methods, hierarchical clustering and curve fitting, illustrating a general methodology for showing content validity when combining different pedagogical tools. When applied to a diverse sample of 263 models containing 971 time series collected from three different VERA user categories, Georgia Tech (GATECH), North Georgia Technical College (NGTC), and "Self Directed Learners", results showed agreement between both classification methods on 89.38% of the sample curves in the test set. This serves as a good indication that our methodology for determining content validity was successful.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 16 pages, preprint

arXiv:2312.04994 [pdf, ps, other]

Numerical determination of iron dust laminar flame speeds with the counterflow twin-flame technique

Authors: C. E. A. G. van Gool, T. Hazenberg, J. A. van Oijen, L. P. H. de Goey

Abstract: Iron dust counter-flow flames have been studied with the low-Mach-number combustion approximation. The model considers full coupling between the two phases, including particle/droplet drag. The dispersed phase flow strain relations are derived under the assumption of low Reynolds number conditions. The importance of solving a particle flow strain model is demonstrated by comparing three different models: a free unstrained flame, a counter-flow flame where particle flow strain is assumed equal to gas flow strain and one case in which the particle flow strain is solved. All three cases showed preferential diffusion effects, due to the lack of diffusion of iron in the fuel mixture, e.g. $D_{\mathrm{Fe},m} = 0$. The preferential diffusion effect causes a peak in the fuel equivalence ratio in the preheat zone. At the burned side, the combined effect of strain and preferential diffusion showed a decrease in fuel equivalence ratio. Inertia effects, which are only included in the resolved particle flow strain case, counteract this effect and result in an increase of the fuel equivalence ratio at the burned side. A laminar flame speed analysis is performed and a recommendation is given on how to experimentally determine the flame speed in a counter-flow set-up.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 20 pages, 11 figures

arXiv:2312.02296 [pdf, other]

LLMs Accelerate Annotation for Medical Information Extraction

Authors: Akshay Goel, Almog Gueta, Omry Gilon, Chang Liu, Sofia Erell, Lan Huong Nguyen, Xiaohong Hao, Bolous Jaber, Shashir Reddy, Rupesh Kartha, Jean Steiner, Itay Laish, Amir Feder

Abstract: The unstructured nature of clinical notes within electronic health records often conceals vital patient-related information, making it challenging to access or interpret. To uncover this hidden information, specialized Natural Language Processing (NLP) models are required. However, training these models necessitates large amounts of labeled data, a process that is both time-consuming and costly when relying solely on human experts for annotation. In this paper, we propose an approach that combines Large Language Models (LLMs) with human expertise to create an efficient method for generating ground truth labels for medical text annotation. By utilizing LLMs in conjunction with human annotators, we significantly reduce the human annotation burden, enabling the rapid creation of labeled datasets. We rigorously evaluate our method on a medical information extraction task, demonstrating that our approach not only substantially cuts down on human intervention but also maintains high accuracy. The results highlight the potential of using LLMs to improve the utilization of unstructured clinical data, allowing for the swift deployment of tailored NLP solutions in healthcare.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Published in proceedings of the Machine Learning for Health (ML4H) Symposium 2023

arXiv:2311.17405 [pdf, other]

Learning and Autonomy for Extraterrestrial Terrain Sampling: An Experience Report from OWLAT Deployment

Authors: Pranay Thangeda, Ashish Goel, Erica Tevere, Yifan Zhu, Erik Kramer, Adriana Daca, Hari Nayar, Kris Hauser, Melkior Ornik

Abstract: Extraterrestrial autonomous lander missions increasingly demand adaptive capabilities to handle the unpredictable and diverse nature of the terrain. This paper discusses the deployment of a Deep Meta-Learning with Controlled Deployment Gaps (CoDeGa) trained model for terrain scooping tasks in Ocean Worlds Lander Autonomy Testbed (OWLAT) at NASA Jet Propulsion Laboratory. The CoDeGa-powered scooping strategy is designed to adapt to novel terrains, selecting scooping actions based on the available RGB-D image data and limited experience. The paper presents our experiences with transferring the scooping framework with CoDeGa-trained model from a low-fidelity testbed to the high-fidelity OWLAT testbed. Additionally, it validates the method's performance in novel, realistic environments, and shares the lessons learned from deploying learning-based autonomy algorithms for space exploration. Experimental results from OWLAT substantiate the efficacy of CoDeGa in rapidly adapting to unfamiliar terrains and effectively making autonomous decisions under considerable domain shifts, thereby endorsing its potential utility in future extraterrestrial missions.

Submitted 4 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: Updated references to include recent work on autonomy for ocean worlds

arXiv:2311.07060 [pdf, ps, other]

Arithmetic of semisubtractive semidomains

Authors: Hannah Fox, Agastya Goel, Sophia Liao

Abstract: A subset $S$ of an integral domain is called a semidomain if the pairs $(S,+)$ and $(S\setminus\{0\}, \cdot)$ are commutative and cancellative semigroups with identities. The multiplication of $S$ extends to the group of differences $\mathscr{G}(S)$, turning $\mathscr{G}(S)$ into an integral domain. In this paper, we study the arithmetic of semisubtractive semidomains (i.e., semidomains $S$ for which either $s \in S$ or $-s \in S$ for every $s \in \mathscr{G}(S)$). Specifically, we provide necessary and sufficient conditions for a semisubtractive semidomain to be atomic, to satisfy the ascending chain condition on principal ideals, to be a bounded factorization semidomain, and to be a finite factorization semidomain, which are subsequent relaxations of the property of having unique factorizations. In addition, we present a characterization of factorial and half-factorial semisubtractive semidomains. Throughout the article, we present examples to provide insight into the arithmetic aspects of semisubtractive semidomains.

Submitted 28 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: 15 pages

MSC Class: Primary: 16Y60; 11C08; Secondary: 20M13; 13F05

arXiv:2311.05779 [pdf, other]

Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter

Authors: Georgios Tziafas, Yucheng Xu, Arushi Goel, Mohammadreza Kasaei, Zhibin Li, Hamidreza Kasaei

Abstract: Robots operating in human-centric environments require the integration of visual grounding and grasping capabilities to effectively manipulate objects based on user instructions. This work focuses on the task of referring grasp synthesis, which predicts a grasp pose for an object referred through natural language in cluttered scenes. Existing approaches often employ multi-stage pipelines that first segment the referred object and then propose a suitable grasp, and are evaluated in private datasets or simulators that do not capture the complexity of natural indoor scenes. To address these limitations, we develop a challenging benchmark based on cluttered indoor scenes from OCID dataset, for which we generate referring expressions and connect them with 4-DoF grasp poses. Further, we propose a novel end-to-end model (CROG) that leverages the visual grounding capabilities of CLIP to learn grasp synthesis directly from image-text pairs. Our results show that vanilla integration of CLIP with pretrained models transfers poorly in our challenging benchmark, while CROG achieves significant improvements both in terms of grounding and grasping. Extensive robot experiments in both simulation and hardware demonstrate the effectiveness of our approach in challenging interactive object grasping scenarios that include clutter.

Submitted 9 November, 2023; originally announced November 2023.

Comments: Poster CoRL 2023. Dataset and code available here: https://github.com/gtziafas/OCID-VLG

arXiv:2310.13619 [pdf, other]

Semi-supervised multimodal coreference resolution in image narrations

Authors: Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

Abstract: In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration is paired with an image. This poses significant challenges due to fine-grained image-text alignment, inherent ambiguity present in narrative language, and unavailability of large annotated training sets. To tackle these challenges, we present a data efficient semi-supervised approach that utilizes image-narration pairs to resolve coreferences and narrative grounding in a multimodal context. Our approach incorporates losses for both labeled and unlabeled data within a cross-modal framework. Our evaluation shows that the proposed approach outperforms strong baselines both quantitatively and qualitatively, for the tasks of coreference resolution and narrative grounding.

Submitted 20 October, 2023; originally announced October 2023.

Comments: Long paper at EMNLP'23-Main

arXiv:2310.11643 [pdf, other]

Opinion Change or Differential Turnout: Changing Opinions on the Austin Police Department in a Budget Feedback Process

Authors: Lodewijk L. Gelauff, Ashish Goel

Abstract: In 2020 the tragic murder of George Floyd at the hands of law enforcement ignited and intensified nationwide protests, demanding changes in police funding and allocation. This happened during a budgeting feedback exercise where residents of Austin, Texas were invited to share opinions on the budgets of various city service areas, including the Police Department, on an online platform designed by our team. Daily responses increased by a hundredfold and responses registered after the "exogenous shock" overwhelmingly advocated for reducing police funding. This opinion shift far exceeded what we observed in 14 other Participatory Budgeting elections on our Participatory Budgeting Platform, and can't be explained by shifts in the respondent demographics. Analysis of the results from an Austin budgetary feedback exercise in 2021 and a follow-up survey indicates that the opinion shift from 2020 persisted, with the opinion gap on police funding widening. We conclude that there was an actual change of opinion regarding police funding. This study not only sheds light on the enduring impact of the 2020 events and protests on public opinion, but also showcases the value of analysis of clustered opinions as a tool in the evaluation toolkit of survey organizers.

Submitted 16 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: This preprint is an extended version of a previously published conference paper: https://dl.acm.org/doi/10.1145/3551624.3555295

arXiv:2310.09578 [pdf, other]

Sparse Index Tracking via Topological Learning

Authors: Anubha Goel, Puneet Pasricha, Juho Kanniainen

Abstract: In this research, we introduce a novel methodology for the index tracking problem with sparse portfolios by leveraging topological data analysis (TDA). Utilizing persistence homology to measure the riskiness of assets, we introduce a topological method for data-driven learning of the parameters for regularization terms. Specifically, the Vietoris-Rips filtration method is utilized to capture the intricate topological features of asset movements, providing a robust framework for portfolio tracking. Our approach has the advantage of accommodating both $\ell_1$ and $\ell_2$ penalty terms without the requirement for expensive estimation procedures. We empirically validate the performance of our methodology against state-of-the-art sparse index tracking techniques, such as Elastic-Net and SLOPE, using a dataset that covers 23 years of S&P500 index and its constituent data. Our out-of-sample results show that this computationally efficient technique surpasses conventional methods across risk metrics, risk-adjusted performance, and trading expenses in varied market conditions. Furthermore, in turbulent markets, it not only maintains but also enhances tracking performance.

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2309.13450 [pdf]

Conducting A/B Experiments with a Scalable Architecture

Authors: Andrew Hornback, Sungeun An, Scott Bunin, Stephen Buckley, John Kos, Ashok Goel

Abstract: A/B experiments are commonly used in research to compare the effects of changing one or more variables in two different experimental groups - a control group and a treatment group. While the benefits of using A/B experiments are widely known and accepted, there is less agreement on a principled approach to creating software infrastructure systems to assist in rapidly conducting such experiments. We propose a four-principle approach for developing a software architecture to support A/B experiments that is domain agnostic and can help alleviate some of the resource constraints currently needed to successfully implement these experiments: the software architecture (i) must retain the typical properties of A/B experiments, (ii) capture problem solving activities and outcomes, (iii) allow researchers to understand the behavior and outcomes of participants in the experiment, and (iv) must enable automated analysis. We successfully developed a software system to encapsulate these principles and implement it in a real-world A/B experiment.

Submitted 23 September, 2023; originally announced September 2023.

arXiv:2308.00813 [pdf]

Designing a Communication Bridge between Communities: Participatory Design for a Question-Answering AI Agent

Authors: Jeonghyun Lee, Vrinda Nandan, Harshvardhan Sikka, Spencer Rugaber, Ashok Goel

Abstract: How do we design an AI system that is intended to act as a communication bridge between two user communities with different mental models and vocabularies? Skillsync is an interactive environment that engages employers (companies) and training providers (colleges) in a sustained dialogue to help them achieve the goal of building a training proposal that successfully meets the needs of the employers and employees. We used a variation of participatory design to elicit requirements for developing AskJill, a question-answering agent that explains how Skillsync works and thus acts as a communication bridge between company and college users. Our study finds that participatory design was useful in guiding the requirements gathering and eliciting user questions for the development of AskJill. Our results also suggest that the two Skillsync user communities perceived glossary assistance as a key feature that AskJill needs to offer, and they would benefit from such a shared vocabulary.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.15275 [pdf, ps, other]

Computing Invariant Zeros of a Linear System Using State-Space Realization

Authors: Jhon Manuel Portella Delgado, Ankit Goel

Abstract: It is well known that zeros and poles of a single-input, single-output system in the transfer function form are the roots of the transfer function's numerator and the denominator polynomial, respectively. However, in the state-space form, where the poles are a subset of the eigenvalues of the dynamics matrix and thus can be computed by solving an eigenvalue problem, the computation of zeros is a non-trivial problem. This paper presents a realization of a linear system that allows the computation of invariant zeros by solving a simple eigenvalue problem. The result is valid for square multi-input, multi-output (MIMO) systems, is unaffected by lack of observability or controllability, and is easily extended to wide MIMO systems. Finally, the paper illuminates the connection between the zero-subspace form and the normal form to conclude that zeros are the poles of the system's zero dynamics.

Submitted 5 February, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.05538 [pdf, other]

Advancements in Scientific Controllable Text Generation Methods

Authors: Arnav Goel, Medha Hira, Avinash Anand, Siddhesh Bangar, Dr. Rajiv Ratn Shah

Abstract: The previous work on controllable text generation is organized using a new schema we provide in this study. Seven components make up the schema, and each one is crucial to the creation process. To accomplish controlled generation for scientific literature, we describe the various modulation strategies utilised to modulate each of the seven components. We also offer a theoretical study and qualitative examination of these methods. This insight makes possible new architectures based on combinations of these components. Future research will compare these methods empirically to learn more about their strengths and utility.

Submitted 8 July, 2023; originally announced July 2023.

arXiv:2306.17674 [pdf, other]

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

Authors: Mehrad Moradshahi, Tianhao Shen, Kalika Bali, Monojit Choudhury, Gaël de Chalendar, Anmol Goel, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Nasredine Semmar, Sina J. Semnani, Jiwon Seo, Vivek Seshadri, Manish Shrivastava, Michael Sun, Aditya Yadavalli, Chaobin You, Deyi Xiong, Monica S. Lam

Abstract: Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source.

Submitted 30 June, 2023; originally announced June 2023.

Comments: Accepted by ACL 2023 Findings

arXiv:2306.10243 [pdf, ps, other]

Central limit theorem for the complex eigenvalues of Gaussian random matrices

Authors: Advay Goel, Patrick Lopatto, Xiaoyu Xie

Abstract: We establish a central limit theorem for the eigenvalue counting function of a matrix of real Gaussian random variables.

Submitted 8 March, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 15 pages. To appear in Electronic Communications in Probability

arXiv:2306.09224 [pdf, other]

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

Authors: Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Abstract: We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evidence to support each answer. Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset. Moreover, we experimentally show that progress on answering our encyclopedic questions can be achieved by augmenting large models with a mechanism that retrieves relevant information from the knowledge base. An oracle experiment with perfect retrieval achieves 87.0% accuracy on the single-hop portion of our dataset, and an automatic retrieval-augmented prototype yields 48.8%. We believe that our dataset enables future research on retrieval-augmented vision+language models. It is available at https://github.com/google-research/google-research/tree/master/encyclopedic_vqa .

Submitted 24 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: ICCV'23

arXiv:2305.11296 [pdf, other]

A Mechanism for Participatory Budgeting With Funding Constraints and Project Interactions

Authors: Mohak Goyal, Sahasrajit Sarmasarkar, Ashish Goel

Abstract: Participatory budgeting (PB) has been widely adopted and has attracted significant research efforts; however, there is a lack of mechanisms for PB which elicit project interactions, such as substitution and complementarity, from voters. Also, the outcomes of PB in practice are subject to various minimum/maximum funding constraints on 'types' of projects. We propose a novel preference elicitation scheme for PB which allows voters to express how their utilities from projects within 'groups' interact. We consider preference aggregation done under minimum and maximum funding constraints on 'types' of projects, where a project can have multiple type labels as long as this classification can be defined by a 1-laminar structure (henceforth called 1-laminar funding constraints). Overall, we extend the Knapsack voting model of Goel et al. [26] in two ways - enriching the preference elicitation scheme to include project interactions and generalizing the preference aggregation scheme to include 1-laminar funding constraints. We show that the strategyproofness results of Goel et al. [26] for Knapsack voting continue to hold under 1-laminar funding constraints. Moreover, when the funding constraints cannot be described by a 1-laminar structure, strategyproofness does not hold. Although project interactions often break the strategyproofness, we study a special case of vote profiles where truthful voting is a Nash equilibrium under substitution project interactions. We then study the computational complexity of preference aggregation.
Social welfare maximization under project interactions is NP-hard. As a workaround for practical instances, we give a fixed parameter tractable (FPT) algorithm for social welfare maximization with respect to the maximum number of projects in a group when the overall budget is specified in a fixed number of bits.

Submitted 14 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.05015 [pdf, other]

A Low-Mass Helium Star Progenitor Model for the Type Ibn SN 2020nxt

Authors: Qinan Wang, Anika Goel, Luc Dessart, Ori D. Fox, Melissa Shahbandeh, Sofia Rest, Armin Rest, Jose H. Groh, Andrew Allan, Claes Fransson, Nathan Smith, Griffin Hosseinzadeh, Alexei V. Filippenko, Jennifer Andrews, K. Azalee Bostroem, Thomas G. Brink, Peter Brown, Jamison Burke, Roger Chevalier, Geoffrey C. Clayton, Mi Dai, Kyle W. Davis, Ryan J. Foley, Sebastian Gomez, Chelsea Harris , et al. (33 additional authors not shown)

Abstract: A growing number of supernovae (SNe) are now known to exhibit evidence for significant interaction with a dense, pre-existing, circumstellar medium (CSM). SNe Ibn comprise one such class that can be characterised by both rapidly evolving light curves and persistent narrow He I lines. The origin of such a dense CSM in these systems remains a pressing question, specifically concerning the progenitor system and mass-loss mechanism. In this paper, we present multi-wavelength data of the Type Ibn SN 2020nxt, including $HST$/STIS ultraviolet spectra. We fit the data with recently updated CMFGEN models designed to handle configurations for SNe Ibn. The UV coverage yields strong constraints on the energetics and, when combined with the CMFGEN models, offers new insight on potential progenitor systems. We find the most successful model is a $\lesssim4 {\rm M}_\odot$ helium star that lost its $\sim 1\,{\rm M}_\odot$ He-rich envelope in the years preceding core collapse. We also consider viable alternatives, such as a He white dwarf merger. Ultimately, we conclude at least some SNe Ibn do not arise from single, massive ($>30 {\rm M}_\odot$) Wolf-Rayet-like stars.

Submitted 8 May, 2023; originally announced May 2023.

Comments: 17 pages, 13 figures, 1 table, submitted to MNRAS

arXiv:2305.03554 [pdf, ps, other]

Adaptive Nonlinear Control of a Bicopter with Unknown Dynamics

Authors: Jhon Manuel Portella Delgado, Ankit Goel

Abstract: This paper presents an adaptive, model-based, nonlinear controller for the bicopter trajectory-tracking problem. The nonlinear controller is constructed by dynamically extending the bicopter model, stabilizing the extended dynamics using input-output linearization, augmenting the controller with a finite-time convergent parameter estimator, and designing a linear tracking controller. Unlike control systems based on the time separation principle to separate the translational and rotational dynamics, the proposed technique is applied to design a controller for the full nonlinear dynamics of the system to obtain the desired transient performance. The proposed controller is validated in simulation for a smooth and nonsmooth trajectory-tracking problem.

Submitted 7 February, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.10634 [pdf, other]

Experimental Flight Testing of an Adaptive Autopilot with Parameter Drift Mitigation

Authors: Yin Yong Chee, Parham Oveissi, Siyuan Shao, Joonghyun Lee, Juan A. Paredes, Dennis S. Bernstein, Ankit Goel

Abstract: This paper modifies an adaptive multicopter autopilot to mitigate instabilities caused by adaptive parameter drift and presents simulation and experimental results to validate the modified autopilot. The modified adaptive controller is obtained by including a static nonlinearity in the adaptive loop, updated by the retrospective cost adaptive control algorithm. It is shown in simulation and physical test experiments that the adaptive autopilot with proposed modifications can continually improve the fixed-gain autopilot as well as prevent the drift of the adaptive parameters, thus improving the robustness of the adaptive autopilot.

Submitted 20 April, 2023; originally announced April 2023.

Comments: 6 pages, 16 figures, submitted to IROS 2023

arXiv:2304.02730 [pdf, ps, other]

Fair Ordering via Streaming Social Choice Theory

Authors: Geoffrey Ramseyer, Ashish Goel

Abstract: Prior work studies the question of ``fairly'' ordering transactions in a replicated state machine. Each of $n$ replicas receives transactions in a possibly different order, and the system must aggregate the observed orderings into a single order. We argue that this problem is best viewed through the lens of social choice theory, in which (in the preference aggregation problem) rankings on candidates are aggregated into an election result. Two features make this problem novel. First, the number of transactions is unbounded, and an ordering must be defined over a countably infinite set. And second, decisions must be made quickly, with only partial information. Additionally, some faulty replicas might alter their reported observations; their influence on the output should be bounded and well understood. Prior work studies a ``$γ$-batch-order-fairness'' property, which divides an ordering into contiguous batches. If a $γ$ fraction of replicas receive $τ$ before $τ^\prime$, then $τ^\prime$ cannot be in an earlier batch than $τ$. We strengthen this definition to require that batches have minimal size ($γ$-batch-order-fairness can be vacuously satisfied by large batches) while accounting for the possibility of faulty replicas. This social choice lens enables an ordering protocol with strictly stronger fairness and liveness properties than prior work. We study the Ranked Pairs method. Analysis of how missing information moves through the algorithm allows our streaming version to know when it can output a transaction.
Deliberate construction of a tiebreaking rule ensures our algorithm outputs a transaction after a bounded time (in a synchronous network). Prior work relies on a fixed choice of $γ$ and bound on the number of faulty replicas $f$, but our algorithm satisfies our definition for every $\frac{1}{2}<γ\leq 1$ simultaneously and for any $f$.

Submitted 27 February, 2024; v1 submitted 5 April, 2023; originally announced April 2023.

arXiv:2303.07476 [pdf, other]

Challenges and Practices of Deep Learning Model Reengineering: A Case Study on Computer Vision

Authors: Wenxin Jiang, Vishnu Banna, Naveen Vivek, Abhinav Goel, Nicholas Synovic, George K. Thiruvathukal, James C. Davis

Abstract: Many engineering organizations are reimplementing and extending deep neural networks from the research community. We describe this process as deep learning model reengineering. Deep learning model reengineering - reusing, reproducing, adapting, and enhancing state-of-the-art deep learning approaches - is challenging for reasons including under-documented reference models, changing requirements, and the cost of implementation and testing. In addition, individual engineers may lack expertise in software engineering, yet teams must apply knowledge of software engineering and deep learning to succeed. Prior work has examined DL systems from a "product" view, examining defects from projects regardless of the engineers' purpose. Our study is focused on reengineering activities from a "process" view, and focuses on engineers specifically engaged in the reengineering process. Our goal is to understand the characteristics and challenges of deep learning model reengineering. We conducted a case study of this phenomenon, focusing on the context of computer vision. Our results draw from two data sources: defects reported in open-source reengineering projects, and interviews conducted with open-source project contributors and the leaders of a reengineering team. Our results describe how deep learning-based computer vision techniques are reengineered, analyze the distribution of defects in this process, and discuss challenges and practices. Integrating our quantitative and qualitative data, we proposed a novel reengineering workflow.
Our findings inform several future directions, including: measuring additional unknown aspects of model reengineering; standardizing engineering practices to facilitate reengineering; and developing tools to support model reengineering and model reuse.

Submitted 25 August, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Under submission to EMSE

arXiv:2303.07247 [pdf]

Are Models Trained on Indian Legal Data Fair?

Authors: Sahil Girhepuje, Anmol Goel, Gokul S Krishnan, Shreya Goyal, Satyendra Pandey, Ponnurangam Kumaraguru, Balaraman Ravindran

Abstract: Recent advances and applications of language technology and artificial intelligence have enabled much success across multiple domains like law, medical and mental health. AI-based Language Models, like Judgement Prediction, have recently been proposed for the legal sector. However, these models are rife with encoded social biases picked up from the training data. While bias and fairness have been studied across NLP, most studies primarily locate themselves within a Western context. In this work, we present an initial investigation of fairness from the Indian perspective in the legal domain. We highlight the propagation of learnt algorithmic biases in the bail prediction task for models trained on Hindi legal documents. We evaluate the fairness gap using demographic parity and show that a decision tree model trained for the bail prediction task has an overall fairness disparity of 0.237 between input features associated with Hindus and Muslims. Additionally, we highlight the need for further research and studies in the avenues of fairness/bias in applying AI in the legal sector with a specific focus on the Indian context.

Submitted 14 May, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Presented at the Symposium on AI and Law (SAIL) 2023

arXiv:2303.05323 [pdf, other]

Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE

Authors: Yucheng Xu, Li Nanbo, Arushi Goel, Zijian Guo, Zonghai Yao, Hamidreza Kasaei, Mohammadreze Kasaei, Zhibin Li

Abstract: Videos depict the change of complex dynamical systems over time in the form of discrete image sequences. Generating controllable videos by learning the dynamical system is an important yet underexplored topic in the computer vision community. This paper presents a novel framework, TiV-ODE, to generate highly controllable videos from a static image and a text caption. Specifically, our framework leverages the ability of Neural Ordinary Differential Equations (Neural ODEs) to represent complex dynamical systems as a set of nonlinear ordinary differential equations. The resulting framework is capable of generating videos with both desired dynamics and content. Experiments demonstrate the ability of the proposed method in generating highly controllable and visually consistent videos, and its capability of modeling dynamical systems. Overall, this work is a significant step towards developing advanced controllable video generation models that can handle complex and dynamic scenes.

Submitted 4 April, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

Showing 1–50 of 223 results for author: Goel, A