Search | arXiv e-print repository

Situational Instructions Database: Task Guidance in Dynamic Environments

Authors: Muhammad Saif Ullah Khan, Sankalp Sinha, Didier Stricker, Muhammad Zeshan Afzal

Abstract: The Situational Instructions Database (SID) addresses the need for enhanced situational awareness in artificial intelligence (AI) systems operating in dynamic environments. By integrating detailed scene graphs with dynamically generated, task-specific instructions, SID provides a novel dataset that allows AI systems to perform complex, real-world tasks with improved context sensitivity and operati… ▽ More The Situational Instructions Database (SID) addresses the need for enhanced situational awareness in artificial intelligence (AI) systems operating in dynamic environments. By integrating detailed scene graphs with dynamically generated, task-specific instructions, SID provides a novel dataset that allows AI systems to perform complex, real-world tasks with improved context sensitivity and operational accuracy. This dataset leverages advanced generative models to simulate a variety of realistic scenarios based on the 3D Semantic Scene Graphs (3DSSG) dataset, enriching it with scenario-specific information that details environmental interactions and tasks. SID facilitates the development of AI applications that can adapt to new and evolving conditions without extensive retraining, supporting research in autonomous technology and AI-driven decision-making processes. This dataset is instrumental in develo** robust, context-aware AI agents capable of effectively navigating and responding to unpredictable settings. Available for research and development, SID serves as a critical resource for advancing the capabilities of intelligent systems in complex environments. Dataset available at \url{https://github.com/mindgarage/situational-instructions-database}. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 9 pages, 6 figures

arXiv:2406.10764 [pdf, other]

GNOME: Generating Negotiations through Open-Domain Map** of Exchanges

Authors: Darshan Deshpande, Shambhavi Sinha, Anirudh Ravi Kumar, Debaditya Pal, Jonathan May

Abstract: Language Models have previously shown strong negotiation capabilities in closed domains where the negotiation strategy prediction scope is constrained to a specific setup. In this paper, we first show that these models are not generalizable beyond their original training domain despite their wide-scale pretraining. Following this, we propose an automated framework called GNOME, which processes exi… ▽ More Language Models have previously shown strong negotiation capabilities in closed domains where the negotiation strategy prediction scope is constrained to a specific setup. In this paper, we first show that these models are not generalizable beyond their original training domain despite their wide-scale pretraining. Following this, we propose an automated framework called GNOME, which processes existing human-annotated, closed-domain datasets using Large Language Models and produces synthetic open-domain dialogues for negotiation. GNOME improves the generalizability of negotiation systems while reducing the expensive and subjective task of manual data curation. Through our experimental setup, we create a benchmark comparing encoder and decoder models trained on existing datasets against datasets created through GNOME. Our results show that models trained on our dataset not only perform better than previous state of the art models on domain specific strategy prediction, but also generalize better to previously unseen domains. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10247 [pdf, other]

QCQA: Quality and Capacity-aware grouped Query Attention

Authors: Vinay Joshi, Prashant Laddha, Shambhavi Sinha, Om Ji Omer, Sreenivas Subramoney

Abstract: Excessive memory requirements of key and value features (KV-cache) present significant challenges in the autoregressive inference of large language models (LLMs), restricting both the speed and length of text generation. Approaches such as Multi-Query Attention (MQA) and Grouped Query Attention (GQA) mitigate these challenges by grou** query heads and consequently reducing the number of correspo… ▽ More Excessive memory requirements of key and value features (KV-cache) present significant challenges in the autoregressive inference of large language models (LLMs), restricting both the speed and length of text generation. Approaches such as Multi-Query Attention (MQA) and Grouped Query Attention (GQA) mitigate these challenges by grou** query heads and consequently reducing the number of corresponding key and value heads. However, MQA and GQA decrease the KV-cache size requirements at the expense of LLM accuracy (quality of text generation). These methods do not ensure an optimal tradeoff between KV-cache size and text generation quality due to the absence of quality-aware grou** of query heads. To address this issue, we propose Quality and Capacity-Aware Grouped Query Attention (QCQA), which identifies optimal query head grou**s using an evolutionary algorithm with a computationally efficient and inexpensive fitness function. We demonstrate that QCQA achieves a significantly better tradeoff between KV-cache capacity and LLM accuracy compared to GQA. For the Llama2 $7\,$B model, QCQA achieves $\mathbf{20}$\% higher accuracy than GQA with similar KV-cache size requirements in the absence of fine-tuning. After fine-tuning both QCQA and GQA, for a similar KV-cache size, QCQA provides $\mathbf{10.55}\,$\% higher accuracy than GQA. Furthermore, QCQA requires $40\,$\% less KV-cache size than GQA to attain similar accuracy. The proposed quality and capacity-aware grou** of query heads can serve as a new paradigm for KV-cache optimization in autoregressive LLM inference. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.08787 [pdf, ps, other]

A Survey on Compositional Learning of AI Models: Theoretical and Experimetnal Practices

Authors: Sania Sinha, Tanawan Premsri, Parisa Kordjamshidi

Abstract: Compositional learning, mastering the ability to combine basic concepts and construct more intricate ones, is crucial for human cognition, especially in human language comprehension and visual perception. This notion is tightly connected to generalization over unobserved situations. Despite its integral role in intelligence, there is a lack of systematic theoretical and experimental research metho… ▽ More Compositional learning, mastering the ability to combine basic concepts and construct more intricate ones, is crucial for human cognition, especially in human language comprehension and visual perception. This notion is tightly connected to generalization over unobserved situations. Despite its integral role in intelligence, there is a lack of systematic theoretical and experimental research methodologies, making it difficult to analyze the compositional learning abilities of computational models. In this paper, we survey the literature on compositional learning of AI models and the connections made to cognitive studies. We identify abstract concepts of compositionality in cognitive and linguistic studies and connect these to the computational challenges faced by language and vision models in compositional reasoning. We overview the formal definitions, tasks, evaluation benchmarks, variety of computational models, and theoretical findings. We cover modern studies on large language models to provide a deeper understanding of the cutting-edge compositional capabilities exhibited by state-of-the-art AI models and pinpoint important directions for future research. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.19653 [pdf, other]

SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems

Authors: Patrick Emami, Zhaonan Li, Saumya Sinha, Truc Nguyen

Abstract: Data-driven simulation surrogates help computational scientists study complex systems. They can also help inform impactful policy decisions. We introduce a learning framework for surrogate modeling where language is used to interface with the underlying system being simulated. We call a language description of a system a "system caption", or SysCap. To address the lack of datasets of paired natura… ▽ More Data-driven simulation surrogates help computational scientists study complex systems. They can also help inform impactful policy decisions. We introduce a learning framework for surrogate modeling where language is used to interface with the underlying system being simulated. We call a language description of a system a "system caption", or SysCap. To address the lack of datasets of paired natural language SysCaps and simulation runs, we use large language models (LLMs) to synthesize high-quality captions. Using our framework, we train multimodal text and timeseries regression models for two real-world simulators of complex energy systems. Our experiments demonstrate the feasibility of designing language interfaces for real-world surrogate models at comparable accuracy to standard baselines. We qualitatively and quantitatively show that SysCaps unlock text-prompt-style surrogate modeling and new generalization abilities beyond what was previously possible. We will release the generated SysCaps datasets and our code to support follow-on studies. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: 17 pages. Under review

arXiv:2405.11446 [pdf, other]

MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning

Authors: Sanchit Sinha, Yuguang Yue, Victor Soto, Mayank Kulkarni, Jianhua Lu, Aidong Zhang

Abstract: Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches esse… ▽ More Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches essentially perform in-context multi-task fine-tuning and evaluate on a disjointed test set of tasks. Even though they achieve impressive performance, their goal is never to compute a truly general set of parameters. In this paper, we propose MAML-en-LLM, a novel method for meta-training LLMs, which can learn truly generalizable parameters that not only perform well on disjointed tasks but also adapts to unseen tasks. We see an average increase of 2% on unseen domains in the performance while a massive 4% improvement on adaptation performance. Furthermore, we demonstrate that MAML-en-LLM outperforms baselines in settings with limited amount of training data on both seen and unseen domains by an average of 2%. Finally, we discuss the effects of type of tasks, optimizers and task complexity, an avenue barely explored in meta-training literature. Exhaustive experiments across 7 task settings along with two data settings demonstrate that models trained with MAML-en-LLM outperform SOTA meta-training approaches. △ Less

Submitted 19 May, 2024; originally announced May 2024.

Comments: KDD 2024, 11 pages(9 main, 2 ref, 1 App) Openreview https://openreview.net/forum?id=JwecLNhWDy&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DKDD.org%2F2024%2FResearch_Track%2FAuthors%23your-submissions)

arXiv:2405.03660 [pdf, other]

CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification

Authors: Sankalp Sinha, Muhammad Saif Ullah Khan, Talha Uddin Sheikh, Didier Stricker, Muhammad Zeshan Afzal

Abstract: Zero-shot learning has been extensively investigated in the broader field of visual recognition, attracting significant interest recently. However, the current work on zero-shot learning in document image classification remains scarce. The existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria of zero-shot evaluation in th… ▽ More Zero-shot learning has been extensively investigated in the broader field of visual recognition, attracting significant interest recently. However, the current work on zero-shot learning in document image classification remains scarce. The existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria of zero-shot evaluation in the visual recognition domain. We provide a comprehensive document image classification analysis in Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) settings to address this gap. Our methodology and evaluation align with the established practices of this domain. Additionally, we propose zero-shot splits for the RVL-CDIP dataset. Furthermore, we introduce CICA (pronounced 'ki-ka'), a framework that enhances the zero-shot learning capabilities of CLIP. CICA consists of a novel 'content module' designed to leverage any generic document-related textual information. The discriminative features extracted by this module are aligned with CLIP's text and image features using a novel 'coupled-contrastive' loss. Our module improves CLIP's ZSL top-1 accuracy by 6.7% and GZSL harmonic mean by 24% on the RVL-CDIP dataset. Our module is lightweight and adds only 3.3% more parameters to CLIP. Our work sets the direction for future research in zero-shot document classification. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 18 Pages, 4 Figures and Accepted in ICDAR 2024

arXiv:2405.00349 [pdf, other]

A Self-explaining Neural Architecture for Generalizable Concept Learning

Authors: Sanchit Sinha, Guangzhi Xiong, Aidong Zhang

Abstract: With the wide proliferation of Deep Neural Networks in high-stake applications, there is a growing demand for explainability behind their decision-making process. Concept learning models attempt to learn high-level 'concepts' - abstract entities that align with human understanding, and thus provide interpretability to DNN architectures. However, in this paper, we demonstrate that present SOTA conc… ▽ More With the wide proliferation of Deep Neural Networks in high-stake applications, there is a growing demand for explainability behind their decision-making process. Concept learning models attempt to learn high-level 'concepts' - abstract entities that align with human understanding, and thus provide interpretability to DNN architectures. However, in this paper, we demonstrate that present SOTA concept learning approaches suffer from two major problems - lack of concept fidelity wherein the models fail to learn consistent concepts among similar classes and limited concept interoperability wherein the models fail to generalize learned concepts to new domains for the same task. Kee** these in mind, we propose a novel self-explaining architecture for concept learning across domains which - i) incorporates a new concept saliency network for representative concept selection, ii) utilizes contrastive learning to capture representative domain invariant concepts, and iii) uses a novel prototype-based concept grounding regularization to improve concept alignment across domains. We demonstrate the efficacy of our proposed approach over current SOTA concept learning approaches on four widely used real-world datasets. Empirical results show that our method improves both concept fidelity measured through concept overlap and concept interoperability measured through domain adaptation performance. △ Less

Submitted 5 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: IJCAI 2024. 16 pages (7 main content, 2 references, 7 Appendix) Code available at https://github.com/sanchit97/secl

arXiv:2404.06405 [pdf, other]

Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

Authors: Shiven Sinha, Ameya Prabhu, Ponnurangam Kumaraguru, Siddharth Bhat, Matthias Bethge

Abstract: Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 2… ▽ More Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu's method solved only ten. In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong. Wu's method alone can solve 15 problems, and some of them are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu's method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 methods by just using a CPU-only laptop with a time limit of 5 minutes per problem. Essentially, this classic method solves just 4 problems less than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist. △ Less

Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Work in Progress. Released for wider feedback

arXiv:2403.18074 [pdf, other]

Every Shot Counts: Using Exemplars for Repetition Counting in Videos

Authors: Saptarshi Sinha, Alexandros Stergiou, Dima Damen

Abstract: Video repetition counting infers the number of repetitions of recurring actions or motion within a video. We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same… ▽ More Video repetition counting infers the number of repetitions of recurring actions or motion within a video. We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same and different videos. In training, ESCounts regresses locations of high correspondence to the exemplars within the video. In tandem, our method learns a latent that encodes representations of general repetitive motions, which we use for exemplar-free, zero-shot inference. Extensive experiments over commonly used datasets (RepCount, Countix, and UCFRep) showcase ESCounts obtaining state-of-the-art performance across all three datasets. On RepCount, ESCounts increases the off-by-one from 0.39 to 0.56 and decreases the mean absolute error from 0.38 to 0.21. Detailed ablations further demonstrate the effectiveness of our method. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: Project website: https://sinhasaptarshi.github.io/escounts

arXiv:2402.15589 [pdf, other]

Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts

Authors: Shubhra Kanti Karmaker Santu, Sanjeev Kumar Sinha, Naman Bansal, Alex Knipper, Souvika Sarkar, John Salvador, Yash Mahajan, Sri Guttikonda, Mousumi Akter, Matthew Freestone, Matthew C. Williams Jr

Abstract: One of the most important yet onerous tasks in the academic peer-reviewing process is composing meta-reviews, which involves understanding the core contributions, strengths, and weaknesses of a scholarly manuscript based on peer-review narratives from multiple experts and then summarizing those multiple experts' perspectives into a concise holistic overview. Given the latest major developments in… ▽ More One of the most important yet onerous tasks in the academic peer-reviewing process is composing meta-reviews, which involves understanding the core contributions, strengths, and weaknesses of a scholarly manuscript based on peer-review narratives from multiple experts and then summarizing those multiple experts' perspectives into a concise holistic overview. Given the latest major developments in generative AI, especially Large Language Models (LLMs), it is very compelling to rigorously study the utility of LLMs in generating such meta-reviews in an academic peer-review setting. In this paper, we perform a case study with three popular LLMs, i.e., GPT-3.5, LLaMA2, and PaLM2, to automatically generate meta-reviews by prompting them with different types/levels of prompts based on the recently proposed TELeR taxonomy. Finally, we perform a detailed qualitative study of the meta-reviews generated by the LLMs and summarize our findings and recommendations for prompting LLMs for this complex task. △ Less

Submitted 23 February, 2024; originally announced February 2024.

ACM Class: I.2.7

arXiv:2402.15037 [pdf, other]

Analyzing Games in Maker Protocol Part One: A Multi-Agent Influence Diagram Approach Towards Coordination

Authors: Abhimanyu Nag, Samrat Gupta, Sudipan Sinha, Arka Datta

Abstract: Decentralized Finance (DeFi) ecosystems, exemplified by the Maker Protocol, rely on intricate games to maintain stability and security. Understanding the dynamics of these games is crucial for ensuring the robustness of the system. This motivating research proposes a novel methodology leveraging Multi-Agent Influence Diagrams (MAID), originally proposed by Koller and Milch, to dissect and analyze… ▽ More Decentralized Finance (DeFi) ecosystems, exemplified by the Maker Protocol, rely on intricate games to maintain stability and security. Understanding the dynamics of these games is crucial for ensuring the robustness of the system. This motivating research proposes a novel methodology leveraging Multi-Agent Influence Diagrams (MAID), originally proposed by Koller and Milch, to dissect and analyze the games within the Maker stablecoin protocol. By representing users and governance of the Maker protocol as agents and their interactions as edges in a graph, we capture the complex network of influences governing agent behaviors. Furthermore in the upcoming papers, we will show a Nash Equilibrium model to elucidate strategies that promote coordination and enhance economic security within the ecosystem. Through this approach, we aim to motivate the use of this method to introduce a new method of formal verification of game theoretic security in DeFi platforms. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.12629 [pdf, other]

Television Discourse Decoded: Comprehensive Multimodal Analytics at Scale

Authors: Anmol Agarwal, Pratyush Priyadarshi, Shiven Sinha, Shrey Gupta, Hitkul Jangra, Kiran Garimella, Ponnurangam Kumaraguru

Abstract: In this paper, we tackle the complex task of analyzing televised debates, with a focus on a prime time news debate show from India. Previous methods, which often relied solely on text, fall short in capturing the multimedia essence of these debates. To address this gap, we introduce a comprehensive automated toolkit that employs advanced computer vision and speech-to-text techniques for large-scal… ▽ More In this paper, we tackle the complex task of analyzing televised debates, with a focus on a prime time news debate show from India. Previous methods, which often relied solely on text, fall short in capturing the multimedia essence of these debates. To address this gap, we introduce a comprehensive automated toolkit that employs advanced computer vision and speech-to-text techniques for large-scale multimedia analysis. Utilizing state-of-the-art computer vision algorithms and speech-to-text methods, we transcribe, diarize, and analyze thousands of YouTube videos of prime-time television debates in India. These debates are a central part of Indian media but have been criticized for compromised journalistic integrity and excessive dramatization. Our toolkit provides concrete metrics to assess bias and incivility, capturing a comprehensive multimedia perspective that includes text, audio utterances, and video frames. Our findings reveal significant biases in topic selection and panelist representation, along with alarming levels of incivility. This work offers a scalable, automated approach for future research in multimedia analysis, with profound implications for the quality of public discourse and democratic debate. We will make our data analysis pipeline and collected data publicly available to catalyze further research in this domain. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.08823 [pdf, other]

RanDumb: A Simple Approach that Questions the Efficacy of Continual Representation Learning

Authors: Ameya Prabhu, Shiven Sinha, Ponnurangam Kumaraguru, Philip H. S. Torr, Ozan Sener, Puneet K. Dokania

Abstract: We propose RanDumb to examine the efficacy of continual representation learning. RanDumb embeds raw pixels using a fixed random transform which approximates an RBF-Kernel, initialized before seeing any data, and learns a simple linear classifier on top. We present a surprising and consistent finding: RanDumb significantly outperforms the continually learned representations using deep networks acro… ▽ More We propose RanDumb to examine the efficacy of continual representation learning. RanDumb embeds raw pixels using a fixed random transform which approximates an RBF-Kernel, initialized before seeing any data, and learns a simple linear classifier on top. We present a surprising and consistent finding: RanDumb significantly outperforms the continually learned representations using deep networks across numerous continual learning benchmarks, demonstrating the poor performance of representation learning in these scenarios. RanDumb stores no exemplars and performs a single pass over the data, processing one sample at a time. It complements GDumb, operating in a low-exemplar regime where GDumb has especially poor performance. We reach the same consistent conclusions when RanDumb is extended to scenarios with pretrained models replacing the random transform with pretrained feature extractor. Our investigation is both surprising and alarming as it questions our understanding of how to effectively design and train models that require efficient continual representation learning, and necessitates a principled reinvestigation of the widely explored problem formulation itself. Our code is available at https://github.com/drimpossible/RanDumb. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: Tech Report

arXiv:2402.05466 [pdf, other]

Engineering End-to-End Remote Labs using IoT-based Retrofitting

Authors: K. S. Viswanadh, Akshit Gureja, Nagesh Walchatwar, Rishabh Agrawal, Shiven Sinha, Sachin Chaudhari, Karthik Vaidhyanathan, Venkatesh Choppella, Prabhakar Bhimalapuram, Harikumar Kandath, Aftab Hussain

Abstract: Remote labs are a groundbreaking development in the education industry, providing students with access to laboratory education anytime, anywhere. However, most remote labs are costly and difficult to scale, especially in develo** countries. With this as a motivation, this paper proposes a new remote labs (RLabs) solution that includes two use case experiments: Vanishing Rod and Focal Length. The… ▽ More Remote labs are a groundbreaking development in the education industry, providing students with access to laboratory education anytime, anywhere. However, most remote labs are costly and difficult to scale, especially in develo** countries. With this as a motivation, this paper proposes a new remote labs (RLabs) solution that includes two use case experiments: Vanishing Rod and Focal Length. The hardware experiments are built at a low-cost by retrofitting Internet of Things (IoT) components. They are also made portable by designing miniaturised and modular setups. The software architecture designed as part of the solution seamlessly supports the scalability of the experiments, offering compatibility with a wide range of hardware devices and IoT platforms. Additionally, it can live-stream remote experiments without needing dedicated server space for the stream. The software architecture also includes an automation suite that periodically checks the status of the experiments using computer vision (CV). RLabs is qualitatively evaluated against seven non-functional attributes - affordability, portability, scalability, compatibility, maintainability, usability, and universality. Finally, user feedback was collected from a group of students, and the scores indicate a positive response to the students' learning and the platform's usability. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: 30 pages, 7 tables and 20 figures. Submitted to ACM Transactions on IoT

arXiv:2402.04466 [pdf, other]

Towards Deterministic End-to-end Latency for Medical AI Systems in NVIDIA Holoscan

Authors: Soham Sinha, Shekhar Dwivedi, Mahdi Azizian

Abstract: The introduction of AI and ML technologies into medical devices has revolutionized healthcare diagnostics and treatments. Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform. However, concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end… ▽ More The introduction of AI and ML technologies into medical devices has revolutionized healthcare diagnostics and treatments. Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform. However, concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end-to-end latency, primarily due to GPU resource contentions. To mitigate this, manufacturers typically deploy separate workstations for distinct AI applications, thereby increasing financial, energy, and maintenance costs. This paper addresses these challenges within the context of NVIDIA's Holoscan platform, a real-time AI system for streaming sensor data and images. We propose a system design optimized for heterogeneous GPU workloads, encompassing both compute and graphics tasks. Our design leverages CUDA MPS for spatial partitioning of compute workloads and isolates compute and graphics processing onto separate GPUs. We demonstrate significant performance improvements across various end-to-end latency determinism metrics through empirical evaluation with real-world Holoscan medical device applications. For instance, the proposed design reduces maximum latency by 21-30% and improves latency distribution flatness by 17-25% for up to five concurrent endoscopy tool tracking AI applications, compared to a single-GPU baseline. Against a default multi-GPU setup, our optimizations decrease maximum latency by 35% for up to six concurrent applications by improving GPU utilization by 42%. This paper provides clear design insights for AI applications in the edge-computing domain including medical systems, where performance predictability of concurrent and heterogeneous GPU workloads is a critical requirement. △ Less

Submitted 6 February, 2024; originally announced February 2024.

ACM Class: C.3; J.7; D.2.11; D.2.10; D.4.8

arXiv:2402.01980 [pdf, other]

SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks

Authors: Gourab Dey, Adithya V Ganesan, Yash Kumar Lal, Manal Shah, Shreyashee Sinha, Matthew Matero, Salvatore Giorgi, Vivek Kulkarni, H. Andrew Schwartz

Abstract: Social science NLP tasks, such as emotion or humor detection, are required to capture the semantics along with the implicit pragmatics from text, often with limited amounts of training data. Instruction tuning has been shown to improve the many capabilities of large language models (LLMs) such as commonsense reasoning, reading comprehension, and computer programming. However, little is known about… ▽ More Social science NLP tasks, such as emotion or humor detection, are required to capture the semantics along with the implicit pragmatics from text, often with limited amounts of training data. Instruction tuning has been shown to improve the many capabilities of large language models (LLMs) such as commonsense reasoning, reading comprehension, and computer programming. However, little is known about the effectiveness of instruction tuning on the social domain where implicit pragmatic cues are often needed to be captured. We explore the use of instruction tuning for social science NLP tasks and introduce Socialite-Llama -- an open-source, instruction-tuned Llama. On a suite of 20 social science tasks, Socialite-Llama improves upon the performance of Llama as well as matches or improves upon the performance of a state-of-the-art, multi-task finetuned model on a majority of them. Further, Socialite-Llama also leads to improvement on 5 out of 6 related social tasks as compared to Llama, suggesting instruction tuning can lead to generalized social understanding. All resources including our code, model and dataset can be found through bit.ly/socialitellama. △ Less

Submitted 14 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Short paper accepted to EACL 2024. 4 pgs, 2 tables

arXiv:2401.18083 [pdf, other]

Improved Scene Landmark Detection for Camera Localization

Authors: Tien Do, Sudipta N. Sinha

Abstract: Camera localization methods based on retrieval, local feature matching, and 3D structure-based pose estimation are accurate but require high storage, are slow, and are not privacy-preserving. A method based on scene landmark detection (SLD) was recently proposed to address these limitations. It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-spe… ▽ More Camera localization methods based on retrieval, local feature matching, and 3D structure-based pose estimation are accurate but require high storage, are slow, and are not privacy-preserving. A method based on scene landmark detection (SLD) was recently proposed to address these limitations. It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks and computing camera pose from the associated 2D-3D correspondences. Although SLD outperformed existing learning-based approaches, it was notably less accurate than 3D structure-based methods. In this paper, we show that the accuracy gap was due to insufficient model capacity and noisy labels during training. To mitigate the capacity issue, we propose to split the landmarks into subgroups and train a separate network for each subgroup. To generate better training labels, we propose using dense reconstructions to estimate visibility of scene landmarks. Finally, we present a compact architecture to improve memory efficiency. Accuracy wise, our approach is on par with state of the art structure based methods on the INDOOR-6 dataset but runs significantly faster and uses less storage. Code and models can be found at https://github.com/microsoft/SceneLandmarkLocalization. △ Less

Submitted 31 January, 2024; originally announced January 2024.

Comments: To be presented at 3DV 2024

arXiv:2401.01596 [pdf, other]

MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries

Authors: Akash Ghosh, Arkadeep Acharya, Prince Jha, Aniket Gaudgaul, Rajdeep Majumdar, Sriparna Saha, Aman Chadha, Raghav Jain, Setu Sinha, Shivani Agarwal

Abstract: In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also prior works in the area of medical quest… ▽ More In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also prior works in the area of medical question summarisation have been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization MMCQS dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available. △ Less

Submitted 3 January, 2024; originally announced January 2024.

Comments: ECIR 2024

arXiv:2312.11541 [pdf, other]

CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare

Authors: Akash Ghosh, Arkadeep Acharya, Raghav Jain, Sriparna Saha, Aman Chadha, Setu Sinha

Abstract: In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of… ▽ More In the era of modern healthcare, swiftly generating medical question summaries is crucial for informed and timely patient care. Despite the increasing complexity and volume of medical data, existing studies have focused solely on text-based summarization, neglecting the integration of visual information. Recognizing the untapped potential of combining textual queries with visual representations of medical conditions, we introduce the Multimodal Medical Question Summarization (MMQS) Dataset. This dataset, a major contribution to our work, pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. We also propose a framework, utilizing the power of Contrastive Language Image Pretraining(CLIP) and Large Language Models(LLMs), consisting of four modules that identify medical disorders, generate relevant context, filter medical concepts, and craft visually aware summaries. Our comprehensive framework harnesses the power of CLIP, a multimodal foundation model, and various general-purpose LLMs, comprising four main modules: the medical disorder identification module, the relevant context generation module, the context filtration module for distilling relevant medical concepts and knowledge, and finally, a general-purpose LLM to generate visually aware medical question summaries. Leveraging our MMQS dataset, we showcase how visual cues from images enhance the generation of medically nuanced summaries. This multimodal approach not only enhances the decision-making process in healthcare but also fosters a more nuanced understanding of patient queries, laying the groundwork for future research in personalized and responsive medical care △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2312.00894 [pdf, other]

Leveraging Large Language Models to Improve REST API Testing

Authors: Myeongsoo Kim, Tyler Stennett, Dhruv Shah, Saurabh Sinha, Alessandro Orso

Abstract: The widespread adoption of REST APIs, coupled with their growing complexity and size, has led to the need for automated REST API testing tools. Current tools focus on the structured data in REST API specifications but often neglect valuable insights available in unstructured natural-language descriptions in the specifications, which leads to suboptimal test coverage. Recently, to address this gap,… ▽ More The widespread adoption of REST APIs, coupled with their growing complexity and size, has led to the need for automated REST API testing tools. Current tools focus on the structured data in REST API specifications but often neglect valuable insights available in unstructured natural-language descriptions in the specifications, which leads to suboptimal test coverage. Recently, to address this gap, researchers have developed techniques that extract rules from these human-readable descriptions and query knowledge bases to derive meaningful input values. However, these techniques are limited in the types of rules they can extract and prone to produce inaccurate results. This paper presents RESTGPT, an innovative approach that leverages the power and intrinsic context-awareness of Large Language Models (LLMs) to improve REST API testing. RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification. It then augments the original specification with these rules and values. Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation. Given these promising results, we outline future research directions for advancing REST API testing through LLMs. △ Less

Submitted 29 January, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: To be published in the 46th IEEE/ACM International Conference on Software Engineering - New Ideas and Emerging Results Track (ICSE-NIER 2024)

arXiv:2311.18820 [pdf, other]

Adversarial Attacks and Defenses for Wireless Signal Classifiers using CDI-aware GANs

Authors: Sujata Sinha, Alkan Soysal

Abstract: We introduce a Channel Distribution Information (CDI)-aware Generative Adversarial Network (GAN), designed to address the unique challenges of adversarial attacks in wireless communication systems. The generator in this CDI-aware GAN maps random input noise to the feature space, generating perturbations intended to deceive a target modulation classifier. Its discriminators play a dual role: one en… ▽ More We introduce a Channel Distribution Information (CDI)-aware Generative Adversarial Network (GAN), designed to address the unique challenges of adversarial attacks in wireless communication systems. The generator in this CDI-aware GAN maps random input noise to the feature space, generating perturbations intended to deceive a target modulation classifier. Its discriminators play a dual role: one enforces that the perturbations follow a Gaussian distribution, making them indistinguishable from Gaussian noise, while the other ensures these perturbations account for realistic channel effects and resemble no-channel perturbations. Our proposed CDI-aware GAN can be used as an attacker and a defender. In attack scenarios, the CDI-aware GAN demonstrates its prowess by generating robust adversarial perturbations that effectively deceive the target classifier, outperforming known methods. Furthermore, CDI-aware GAN as a defender significantly improves the target classifier's resilience against adversarial attacks. △ Less

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.04540 [pdf, other]

Multi-decadal Sea Level Prediction using Neural Networks and Spectral Clustering on Climate Model Large Ensembles and Satellite Altimeter Data

Authors: Saumya Sinha, John Fasullo, R. Steven Nerem, Claire Monteleoni

Abstract: Sea surface height observations provided by satellite altimetry since 1993 show a rising rate (3.4 mm/year) for global mean sea level. While on average, sea level has risen 10 cm over the last 30 years, there is considerable regional variation in the sea level change. Through this work, we predict sea level trends 30 years into the future at a 2-degree spatial resolution and investigate the future… ▽ More Sea surface height observations provided by satellite altimetry since 1993 show a rising rate (3.4 mm/year) for global mean sea level. While on average, sea level has risen 10 cm over the last 30 years, there is considerable regional variation in the sea level change. Through this work, we predict sea level trends 30 years into the future at a 2-degree spatial resolution and investigate the future patterns of the sea level change. We show the potential of machine learning (ML) in this challenging application of long-term sea level forecasting over the global ocean. Our approach incorporates sea level data from both altimeter observations and climate model simulations. We develop a supervised learning framework using fully connected neural networks (FCNNs) that can predict the sea level trend based on climate model projections. Alongside this, our method provides uncertainty estimates associated with the ML prediction. We also show the effectiveness of partitioning our spatial dataset and learning a dedicated ML model for each segmented region. We compare two partitioning strategies: one achieved using domain knowledge, and the other employing spectral clustering. Our results demonstrate that segmenting the spatial dataset with spectral clustering improves the ML predictions. △ Less

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.15782 [pdf]

Joint-YODNet: A Light-weight Object Detector for UAVs to Achieve Above 100fps

Authors: Vipin Gautam, Shitala Prasad, Sharad Sinha

Abstract: Small object detection via UAV (Unmanned Aerial Vehicle) images captured from drones and radar is a complex task with several formidable challenges. This domain encompasses numerous complexities that impede the accurate detection and localization of small objects. To address these challenges, we propose a novel method called JointYODNet for UAVs to detect small objects, leveraging a joint loss fun… ▽ More Small object detection via UAV (Unmanned Aerial Vehicle) images captured from drones and radar is a complex task with several formidable challenges. This domain encompasses numerous complexities that impede the accurate detection and localization of small objects. To address these challenges, we propose a novel method called JointYODNet for UAVs to detect small objects, leveraging a joint loss function specifically designed for this task. Our method revolves around the development of a joint loss function tailored to enhance the detection performance of small objects. Through extensive experimentation on a diverse dataset of UAV images captured under varying environmental conditions, we evaluated different variations of the loss function and determined the most effective formulation. The results demonstrate that our proposed joint loss function outperforms existing methods in accurately localizing small objects. Specifically, our method achieves a recall of 0.971, and a F1Score of 0.975, surpassing state-of-the-art techniques. Additionally, our method achieves a [email protected](%) of 98.6, indicating its robustness in detecting small objects across varying scales △ Less

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.15780 [pdf]

AaP-ReID: Improved Attention-Aware Person Re-identification

Authors: Vipin Gautam, Shitala Prasad, Sharad Sinha

Abstract: Person re-identification (ReID) is a well-known problem in the field of computer vision. The primary objective is to identify a specific individual within a gallery of images. However, this task is challenging due to various factors, such as pose variations, illumination changes, obstructions, and the presence ofconfusing backgrounds. Existing ReID methods often fail to capture discriminative feat… ▽ More Person re-identification (ReID) is a well-known problem in the field of computer vision. The primary objective is to identify a specific individual within a gallery of images. However, this task is challenging due to various factors, such as pose variations, illumination changes, obstructions, and the presence ofconfusing backgrounds. Existing ReID methods often fail to capture discriminative features (e.g., head, shoes, backpacks) and instead capture irrelevant features when the target is occluded. Motivated by the success of part-based and attention-based ReID methods, we improve AlignedReID++ and present AaP-ReID, a more effective method for person ReID that incorporates channel-wise attention into a ResNet-based architecture. Our method incorporates the Channel-Wise Attention Bottleneck (CWAbottleneck) block and can learn discriminating features by dynamically adjusting the importance ofeach channel in the feature maps. We evaluated Aap-ReID on three benchmark datasets: Market-1501, DukeMTMC-reID, and CUHK03. When compared with state-of-the-art person ReID methods, we achieve competitive results with rank-1 accuracies of 95.6% on Market-1501, 90.6% on DukeMTMC-reID, and 82.4% on CUHK03. △ Less

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.13387 [pdf]

YOLORe-IDNet: An Efficient Multi-Camera System for Person-Tracking

Authors: Vipin Gautam, Shitala Prasad, Sharad Sinha

Abstract: The growing need for video surveillance in public spaces has created a demand for systems that can track individuals across multiple cameras feeds in real-time. While existing tracking systems have achieved impressive performance using deep learning models, they often rely on pre-existing images of suspects or historical data. However, this is not always feasible in cases where suspicious individu… ▽ More The growing need for video surveillance in public spaces has created a demand for systems that can track individuals across multiple cameras feeds in real-time. While existing tracking systems have achieved impressive performance using deep learning models, they often rely on pre-existing images of suspects or historical data. However, this is not always feasible in cases where suspicious individuals are identified in real-time and without prior knowledge. We propose a person-tracking system that combines correlation filters and Intersection Over Union (IOU) constraints for robust tracking, along with a deep learning model for cross-camera person re-identification (Re-ID) on top of YOLOv5. The proposed system quickly identifies and tracks suspect in real-time across multiple cameras and recovers well after full or partial occlusion, making it suitable for security and surveillance applications. It is computationally efficient and achieves a high F1-Score of 79% and an IOU of 59% comparable to existing state-of-the-art algorithms, as demonstrated in our evaluation on a publicly available OTB-100 dataset. The proposed system offers a robust and efficient solution for the real-time tracking of individuals across multiple camera feeds. Its ability to track targets without prior knowledge or historical data is a significant improvement over existing systems, making it well-suited for public safety and surveillance applications. △ Less

Submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.04583 [pdf, other]

Adaptive REST API Testing with Reinforcement Learning

Authors: Myeongsoo Kim, Saurabh Sinha, Alessandro Orso

Abstract: Modern web services increasingly rely on REST APIs. Effectively testing these APIs is challenging due to the vast search space to be explored, which involves selecting API operations for sequence creation, choosing parameters for each operation from a potentially large set of parameters, and sampling values from the virtually infinite parameter input space. Current testing tools lack efficient exp… ▽ More Modern web services increasingly rely on REST APIs. Effectively testing these APIs is challenging due to the vast search space to be explored, which involves selecting API operations for sequence creation, choosing parameters for each operation from a potentially large set of parameters, and sampling values from the virtually infinite parameter input space. Current testing tools lack efficient exploration mechanisms, treating all operations and parameters equally (i.e., not considering their importance or complexity) and lacking prioritization strategies. Furthermore, these tools struggle when response schemas are absent in the specification or exhibit variants. To address these limitations, we present an adaptive REST API testing technique that incorporates reinforcement learning to prioritize operations and parameters during exploration. Our approach dynamically analyzes request and response data to inform dependent parameters and adopts a sampling-based strategy for efficient processing of dynamic API feedback. We evaluated our technique on ten RESTful services, comparing it against state-of-the-art REST testing tools with respect to code coverage achieved, requests generated, operations covered, and service failures triggered. Additionally, we performed an ablation study on prioritization, dynamic feedback analysis, and sampling to assess their individual effects. Our findings demonstrate that our approach outperforms existing REST API testing tools in terms of effectiveness, efficiency, and fault-finding ability. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: To be published in the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

arXiv:2309.01086 [pdf, other]

MILA: Memory-Based Instance-Level Adaptation for Cross-Domain Object Detection

Authors: Onkar Krishna, Hiroki Ohashi, Saptarshi Sinha

Abstract: Cross-domain object detection is challenging, and it involves aligning labeled source and unlabeled target domains. Previous approaches have used adversarial training to align features at both image-level and instance-level. At the instance level, finding a suitable source sample that aligns with a target sample is crucial. A source sample is considered suitable if it differs from the target sampl… ▽ More Cross-domain object detection is challenging, and it involves aligning labeled source and unlabeled target domains. Previous approaches have used adversarial training to align features at both image-level and instance-level. At the instance level, finding a suitable source sample that aligns with a target sample is crucial. A source sample is considered suitable if it differs from the target sample only in domain, without differences in unimportant characteristics such as orientation and color, which can hinder the model's focus on aligning the domain difference. However, existing instance-level feature alignment methods struggle to find suitable source instances because their search scope is limited to mini-batches. Mini-batches are often so small in size that they do not always contain suitable source instances. The insufficient diversity of mini-batches becomes problematic particularly when the target instances have high intra-class variance. To address this issue, we propose a memory-based instance-level domain adaptation framework. Our method aligns a target instance with the most similar source instance of the same category retrieved from a memory storage. Specifically, we introduce a memory module that dynamically stores the pooled features of all labeled source instances, categorized by their labels. Additionally, we introduce a simple yet effective memory retrieval module that retrieves a set of matching memory slots for target instances. Our experiments on various domain shift scenarios demonstrate that our approach outperforms existing non-memory-based methods significantly. △ Less

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2308.03109 [pdf, other]

doi 10.1145/3597503.3639226

Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

Authors: Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, Reyhaneh Jabbarvand

Abstract: Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that en… ▽ More Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that end, we present a large-scale empirical study to investigate the ability of general LLMs and code LLMs for code translation across pairs of different languages, including C, C++, Go, Java, and Python. Our study, which involves the translation of 1,700 code samples from three benchmarks and two real-world projects, reveals that LLMs are yet to be reliably used to automate code translation -- with correct translations ranging from 2.1% to 47.3% for the studied LLMs. Further manual investigation of unsuccessful translations identifies 15 categories of translation bugs. We also compare LLM-based code translation with traditional non-LLM-based approaches. Our analysis shows that these two classes of techniques have their own strengths and weaknesses. Finally, insights from our study suggest that providing more context to LLMs during translation can help them produce better results. To that end, we propose a prompt-crafting approach based on the symptoms of erroneous translations; this improves the performance of LLM-based code translation by 5.5% on average. Our study is the first of its kind, in terms of scale and breadth, that provides insights into the current limitations of LLMs in code translation and opportunities for improving them. Our dataset -- consisting of 1,700 code samples in five PLs with 10K+ tests, 43K+ translated code, 1,748 manually labeled bugs, and 1,365 bug-fix pairs -- can help drive research in this area. △ Less

Submitted 16 January, 2024; v1 submitted 6 August, 2023; originally announced August 2023.

Comments: Published in ICSE 2024

arXiv:2308.02460 [pdf, other]

Sea level Projections with Machine Learning using Altimetry and Climate Model ensembles

Authors: Saumya Sinha, John Fasullo, R. Steven Nerem, Claire Monteleoni

Abstract: Satellite altimeter observations retrieved since 1993 show that the global mean sea level is rising at an unprecedented rate (3.4mm/year). With almost three decades of observations, we can now investigate the contributions of anthropogenic climate-change signals such as greenhouse gases, aerosols, and biomass burning in this rising sea level. We use machine learning (ML) to investigate future patt… ▽ More Satellite altimeter observations retrieved since 1993 show that the global mean sea level is rising at an unprecedented rate (3.4mm/year). With almost three decades of observations, we can now investigate the contributions of anthropogenic climate-change signals such as greenhouse gases, aerosols, and biomass burning in this rising sea level. We use machine learning (ML) to investigate future patterns of sea level change. To understand the extent of contributions from the climate-change signals, and to help in forecasting sea level change in the future, we turn to climate model simulations. This work presents a machine learning framework that exploits both satellite observations and climate model simulations to generate sea level rise projections at a 2-degree resolution spatial grid, 30 years into the future. We train fully connected neural networks (FCNNs) to predict altimeter values through a non-linear fusion of the climate model hindcasts (for 1993-2019). The learned FCNNs are then applied to future climate model projections to predict future sea level patterns. We propose segmenting our spatial dataset into meaningful clusters and show that clustering helps to improve predictions of our ML model. △ Less

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.00453 [pdf, other]

Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

Authors: Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff

Abstract: Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks. However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose an… ▽ More Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks. However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose and investigate self-supervised adaptation of speech representations to such populations in a parameter-efficient way via training accent-specific residual adapters. We experiment with 4 accents and choose automatic speech recognition (ASR) as the downstream task of interest. We obtain strong word error rate reductions (WERR) over HuBERT-large for all 4 accents, with a mean WERR of 22.7% with accent-specific adapters and a mean WERR of 25.1% if the entire encoder is accent-adapted. While our experiments utilize HuBERT and ASR as the downstream task, our proposed approach is both model and task-agnostic. △ Less

Submitted 1 July, 2023; originally announced July 2023.

arXiv:2306.16722 [pdf, other]

Evaluating Paraphrastic Robustness in Textual Entailment Models

Authors: Dhruv Verma, Yash Kumar Lal, Shreyashee Sinha, Benjamin Van Durme, Adam Poliak

Abstract: We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased. In our experime… ▽ More We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased. In our experiments, contemporary models change their predictions on 8-16\% of paraphrased examples, indicating that there is still room for improvement. △ Less

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.02618 [pdf, ps, other]

doi 10.1145/3580305.3599333

Enhance Diffusion to Improve Robust Generalization

Authors: Jianhui Sun, Sanchit Sinha, Aidong Zhang

Abstract: Deep neural networks are susceptible to human imperceptible adversarial perturbations. One of the strongest defense mechanisms is \emph{Adversarial Training} (AT). In this paper, we aim to address two predominant problems in AT. First, there is still little consensus on how to set hyperparameters with a performance guarantee for AT research, and customized settings impede a fair comparison between… ▽ More Deep neural networks are susceptible to human imperceptible adversarial perturbations. One of the strongest defense mechanisms is \emph{Adversarial Training} (AT). In this paper, we aim to address two predominant problems in AT. First, there is still little consensus on how to set hyperparameters with a performance guarantee for AT research, and customized settings impede a fair comparison between different model designs in AT research. Second, the robustly trained neural networks struggle to generalize well and suffer from tremendous overfitting. This paper focuses on the primary AT framework - Projected Gradient Descent Adversarial Training (PGD-AT). We approximate the dynamic of PGD-AT by a continuous-time Stochastic Differential Equation (SDE), and show that the diffusion term of this SDE determines the robust generalization. An immediate implication of this theoretical finding is that robust generalization is positively correlated with the ratio between learning rate and batch size. We further propose a novel approach, \emph{Diffusion Enhanced Adversarial Training} (DEAT), to manipulate the diffusion term to improve robust generalization with virtually no extra computational burden. We theoretically show that DEAT obtains a tighter generalization bound than PGD-AT. Our empirical investigation is extensive and firmly attests that DEAT universally outperforms PGD-AT by a significant margin. △ Less

Submitted 17 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted at KDD 2023

arXiv:2306.00952 [pdf, other]

Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition

Authors: Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

Abstract: Speaker identification systems are deployed in diverse environments, often different from the lab conditions on which they are trained and tested. In this paper, first, we show the problem of generalization using fixed thresholds (computed using EER metric) for imposter identification in unseen speaker recognition and then introduce a robust speaker-specific thresholding technique for better perfo… ▽ More Speaker identification systems are deployed in diverse environments, often different from the lab conditions on which they are trained and tested. In this paper, first, we show the problem of generalization using fixed thresholds (computed using EER metric) for imposter identification in unseen speaker recognition and then introduce a robust speaker-specific thresholding technique for better performance. Secondly, inspired by the recent use of meta-learning techniques in speaker verification, we propose an end-to-end meta-learning framework for imposter detection which decouples the problem of imposter detection from unseen speaker identification. Thus, unlike most prior works that use some heuristics to detect imposters, the proposed network learns to detect imposters by leveraging the utterances of the enrolled speakers. Furthermore, we show the efficacy of the proposed techniques on VoxCeleb1, VCTK and the FFSVC 2022 datasets, beating the baselines by up to 10%. △ Less

Submitted 30 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.14692 [pdf, other]

Carving UI Tests to Generate API Tests and API Specification

Authors: Rahulkrishna Yandrapally, Saurabh Sinha, Rachel Tzoref-Brill, Ali Mesbah

Abstract: Modern web applications make extensive use of API calls to update the UI state in response to user events or server-side changes. For such applications, API-level testing can play an important role, in-between unit-level testing and UI-level (or end-to-end) testing. Existing API testing tools require API specifications (e.g., OpenAPI), which often may not be available or, when available, be incons… ▽ More Modern web applications make extensive use of API calls to update the UI state in response to user events or server-side changes. For such applications, API-level testing can play an important role, in-between unit-level testing and UI-level (or end-to-end) testing. Existing API testing tools require API specifications (e.g., OpenAPI), which often may not be available or, when available, be inconsistent with the API implementation, thus limiting the applicability of automated API testing to web applications. In this paper, we present an approach that leverages UI testing to enable API-level testing for web applications. Our technique navigates the web application under test and automatically generates an API-level test suite, along with an OpenAPI specification that describes the application's server-side APIs (for REST-based web applications). A key element of our solution is a dynamic approach for inferring API endpoints with path parameters via UI navigation and directed API probing. We evaluated the technique for its accuracy in inferring API specifications and the effectiveness of the "carved" API tests. Our results on seven open-source web applications show that the technique achieves 98% precision and 56% recall in inferring endpoints. The carved API tests, when added to test suites generated by two automated REST API testing tools, increase statement coverage by 52% and 29% and branch coverage by 99% and 75%, on average. The main benefits of our technique are: (1) it enables API-level testing of web applications in cases where existing API testing tools are inapplicable and (2) it creates API-level test suites that cover server-side code efficiently while exercising APIs as they would be invoked from an application's web UI, and that can augment existing API test suites. △ Less

Submitted 23 May, 2023; originally announced May 2023.

ACM Class: D.2.5

arXiv:2305.13650 [pdf, other]

Robust Model-Based Optimization for Challenging Fitness Landscapes

Authors: Saba Ghaffari, Ehsan Saleh, Alexander G. Schwing, Yu-Xiong Wang, Martin D. Burke, Saurabh Sinha

Abstract: Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next. These methods are challenged by sparsity of high-fitness samples in the training set, a problem that has been in the literature. A less recogni… ▽ More Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next. These methods are challenged by sparsity of high-fitness samples in the training set, a problem that has been in the literature. A less recognized but equally important problem stems from the distribution of training samples in the design space: leading methods are not designed for scenarios where the desired optimum is in a region that is not only poorly represented in training data, but also relatively far from the highly represented low-fitness regions. We show that this problem of "separation" in the design space is a significant bottleneck in existing model-based optimization tools and propose a new approach that uses a novel VAE as its search model to overcome the problem. We demonstrate its advantage over prior methods in robustly finding improved samples, regardless of the imbalance and separation between low- and high-fitness samples. Our comprehensive benchmark on real and semi-synthetic protein datasets as well as solution design for physics-informed neural networks, showcases the generality of our approach in discrete and continuous design spaces. Our implementation is available at https://github.com/sabagh1994/PGVAE. △ Less

Submitted 27 June, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2304.12523 [pdf]

CIMLA: Interpretable AI for inference of differential causal networks

Authors: Payam Dibaeinia, Saurabh Sinha

Abstract: The discovery of causal relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model estimates a causal quantity reflecting the influence of one variable on another, under certain assumptions. We lever… ▽ More The discovery of causal relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model estimates a causal quantity reflecting the influence of one variable on another, under certain assumptions. We leverage this insight to implement a new tool, CIMLA, for discovering condition-dependent changes in causal relationships. We then use CIMLA to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data sets, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Finally, we employ CIMLA to analyze a previously published single-cell RNA-seq data set collected from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD. △ Less

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.11262 [pdf, other]

Stochastic MPC Based Attacks on Object Tracking in Autonomous Driving Systems

Authors: Sourav Sinha, Mazen Farhood

Abstract: Decision making in advanced driver assistance systems involves in general the estimated trajectories of the surrounding objects. Multiple object tracking refers to the process of estimating in real time these trajectories, leveraging for this purpose sensors to detect the objects. This paper deals with devising attacks on object tracking in automated vehicles. The vehicle is assumed to have a dete… ▽ More Decision making in advanced driver assistance systems involves in general the estimated trajectories of the surrounding objects. Multiple object tracking refers to the process of estimating in real time these trajectories, leveraging for this purpose sensors to detect the objects. This paper deals with devising attacks on object tracking in automated vehicles. The vehicle is assumed to have a detection-based object tracking system that relies on multiple sensors and uses an estimator such as a Kalman filter for sensor fusion and state estimation. The attack goal is to modify the object's state estimated by the victim vehicle to put the vehicle in an unsafe situation. This goal is achieved by judiciously perturbing some or all of the sensor outputs corresponding to the object of interest over a desired horizon. A stochastic model predictive control (SMPC) problem is formulated to compute the sequence of perturbations, whereby hard constraints on the perturbations and probabilistic chance constraints on the object's state are imposed. The chance constraints ensure that some desired conditions for a successful attack are satisfied with a prespecified probability. Reasonable assumptions are then made to obtain a computationally tractable linear SMPC program. The approach is demonstrated on an adaptive cruise control system in a simulation environment, where successful sequential attacks are generated, leading the victim vehicle into dangerous driving situations including collisions. △ Less

Submitted 1 May, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

Comments: Accepted to IFAC World Congress 2023

arXiv:2304.02925 [pdf]

Computer-aided Diagnosis of Malaria through Transfer Learning using the ResNet50 Backbone

Authors: Sanya Sinha, Nilay Gupta

Abstract: According to the World Malaria Report of 2022, 247 million cases of malaria and 619,000 related deaths were reported in 2021. This highlights the predominance of the disease, especially in the tropical and sub-tropical regions of Africa, parts of South-east Asia, Central and Southern America. Malaria is caused due to the Plasmodium parasite which is circulated through the bites of the female Anoph… ▽ More According to the World Malaria Report of 2022, 247 million cases of malaria and 619,000 related deaths were reported in 2021. This highlights the predominance of the disease, especially in the tropical and sub-tropical regions of Africa, parts of South-east Asia, Central and Southern America. Malaria is caused due to the Plasmodium parasite which is circulated through the bites of the female Anopheles mosquito. Hence, the detection of the parasite in human blood smears could confirm malarial infestation. Since the manual identification of Plasmodium is a lengthy and time-consuming task subject to variability in accuracy, we propose an automated, computer-aided diagnostic method to classify malarial thin smear blood cell images as parasitized and uninfected by using the ResNet50 Deep Neural Network. In this paper, we have used the pre-trained ResNet50 model on the open-access database provided by the National Library of Medicine's Lister Hill National Center for Biomedical Communication for 150 epochs. The results obtained showed accuracy, precision, and recall values of 98.75%, 99.3% and 99.5% on the ResNet50(proposed) model. We have compared these metrics with similar models such as VGG16, Watershed Segmentation and Random Forest, which showed better performance than traditional techniques as well. △ Less

Submitted 6 April, 2023; originally announced April 2023.

ACM Class: I.4.9

arXiv:2304.01143 [pdf, other]

Use Your Head: Improving Long-Tail Video Recognition

Authors: Toby Perrett, Saptarshi Sinha, Tilo Burghardt, Majid Mirmehdi, Dima Damen

Abstract: This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sam… ▽ More This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sampling subsets from two datasets: SSv2 and VideoLT. We then propose a method, Long-Tail Mixed Reconstruction, which reduces overfitting to instances from few-shot classes by reconstructing them as weighted combinations of samples from head classes. LMR then employs label mixing to learn robust decision boundaries. It achieves state-of-the-art average class accuracy on EPIC-KITCHENS and the proposed SSv2-LT and VideoLT-LT. Benchmarks and code at: tobyperrett.github.io/lmr △ Less

Submitted 3 April, 2023; originally announced April 2023.

Comments: CVPR 2023

arXiv:2303.14772 [pdf, other]

$Δ$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss

Authors: Chaitanya Devaguptapu, Samarth Sinha, K J Joseph, Vineeth N Balasubramanian, Animesh Garg

Abstract: Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time. This process necessitates storing copies of the model over time for each task that the pre-trained model is fine-tuned to. Building on top of recent model patching work, we propose $Δ$-Patching for fine-tuning neural network models in an efficient manner, without the need to s… ▽ More Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time. This process necessitates storing copies of the model over time for each task that the pre-trained model is fine-tuned to. Building on top of recent model patching work, we propose $Δ$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies. We propose a simple and lightweight method called $Δ$-Networks to achieve this objective. Our comprehensive experiments across setting and architecture variants show that $Δ$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained. We also show that this approach can be used for other problem settings such as transfer learning and zero-shot domain adaptation, as well as other tasks such as detection and segmentation. △ Less

Submitted 21 September, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

arXiv:2212.14405 [pdf, other]

Offline Policy Optimization in RL with Variance Regularizaton

Authors: Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup

Abstract: Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to mismatch between dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization fo… ▽ More Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to mismatch between dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms. We show that the regularizer leads to a lower bound to the offline policy optimization objective, which can help avoid over-estimation errors, and explains the benefits of our approach across a range of continuous control domains when compared to existing state-of-the-art algorithms. △ Less

Submitted 29 December, 2022; originally announced December 2022.

Comments: Old Draft, Offline RL Workshop, NeurIPS'20;

arXiv:2212.11582 [pdf, other]

doi 10.1145/3543622.3573188

FADO: Floorplan-Aware Directive Optimization for High-Level Synthesis Designs on Multi-Die FPGAs

Authors: Linfeng Du, Tingyuan Liang, Sharad Sinha, Zhiyao Xie, Wei Zhang

Abstract: Multi-die FPGAs are widely adopted to deploy large hardware accelerators. Two factors impede the performance optimization of HLS designs implemented on multi-die FPGAs. On the one hand, the long net delay due to nets crossing die-boundaries results in an NP-hard problem to properly floorplan and pipeline an application. On the other hand, traditional automated searching flow for HLS directive opti… ▽ More Multi-die FPGAs are widely adopted to deploy large hardware accelerators. Two factors impede the performance optimization of HLS designs implemented on multi-die FPGAs. On the one hand, the long net delay due to nets crossing die-boundaries results in an NP-hard problem to properly floorplan and pipeline an application. On the other hand, traditional automated searching flow for HLS directive optimizations targets single-die FPGAs, and hence, it cannot consider the resource constraints on each die and the timing issue incurred by the die-crossings. Further, it leads to an excessively long runtime to legalize the floorplan of HLS designs generated under each group of configurations during directive optimization due to the large design scale. To co-optimize the directives and floorplan of HLS designs on multi-die FPGAs, we propose the FADO framework, which formulates the directive-floorplan co-search problem based on the multi-choice multi-dimensional bin-packing and solves it using an iterative optimization flow. For each step of directive search, a latency-bottleneck-guided greedy algorithm searches for more efficient directive configurations. For floorplanning, instead of repetitively incurring global floorplanning algorithms, we implement a more efficient incremental floorplan legalization algorithm. It mainly applies the worst-fit online bin-packing algorithm to balance the floorplan, together with an offline best-fit-decreasing re-packing to compact the floorplan, followed by pipelining of long wires crossing die-boundaries. Through experiments on HLS designs mixing dataflow and non-dataflow kernels, FADO not only well-automates the co-optimization and finishes within 693X~4925X shorter runtime, compared with DSE assisted by global floorplanning, but also yields an improvement of 1.16X~8.78X in overall workflow execution time after implementation on the Xilinx Alveo U250 FPGA. △ Less

Submitted 5 February, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: Accepted as a conference paper at FPGA '23. Open source at: https://github.com/RipperJ/FADO

arXiv:2212.05259 [pdf, other]

Online Real-time Learning of Dynamical Systems from Noisy Streaming Data: A Koopman Operator Approach

Authors: S. Sinha, Sai P. Nandanoori, David Barajas-Solano

Abstract: Recent advancements in sensing and communication facilitate obtaining high-frequency real-time data from various physical systems like power networks, climate systems, biological networks, etc. However, since the data are recorded by physical sensors, it is natural that the obtained data is corrupted by measurement noise. In this paper, we present a novel algorithm for online real-time learning of… ▽ More Recent advancements in sensing and communication facilitate obtaining high-frequency real-time data from various physical systems like power networks, climate systems, biological networks, etc. However, since the data are recorded by physical sensors, it is natural that the obtained data is corrupted by measurement noise. In this paper, we present a novel algorithm for online real-time learning of dynamical systems from noisy time-series data, which employs the Robust Koopman operator framework to mitigate the effect of measurement noise. The proposed algorithm has three main advantages: a) it allows for online real-time monitoring of a dynamical system; b) it obtains a linear representation of the underlying dynamical system, thus enabling the user to use linear systems theory for analysis and control of the system; c) it is computationally fast and less intensive than the popular Extended Dynamic Mode Decomposition (EDMD) algorithm. We illustrate the efficiency of the proposed algorithm by applying it to identify the Van der Pol oscillator, the IEEE 68 bus system, and a ring network of Van der Pol oscillators. △ Less

Submitted 24 December, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

arXiv:2211.16991 [pdf, other]

SparsePose: Sparse-View Camera Pose Regression and Refinement

Authors: Samarth Sinha, Jason Y. Zhang, Andrea Tagliasacchi, Igor Gilitschenski, David B. Lindell

Abstract: Camera pose estimation is a key step in standard 3D reconstruction pipelines that operate on a dense set of images of a single object or scene. However, methods for pose estimation often fail when only a few images are available because they rely on the ability to robustly identify and match visual features between image pairs. While these methods can work robustly with dense camera views, capturi… ▽ More Camera pose estimation is a key step in standard 3D reconstruction pipelines that operate on a dense set of images of a single object or scene. However, methods for pose estimation often fail when only a few images are available because they rely on the ability to robustly identify and match visual features between image pairs. While these methods can work robustly with dense camera views, capturing a large set of images can be time-consuming or impractical. We propose SparsePose for recovering accurate camera poses given a sparse set of wide-baseline images (fewer than 10). The method learns to regress initial camera poses and then iteratively refine them after training on a large-scale dataset of objects (Co3D: Common Objects in 3D). SparsePose significantly outperforms conventional and learning-based baselines in recovering accurate camera rotations and translations. We also demonstrate our pipeline for high-fidelity 3D reconstruction using only 5-9 images of an object. △ Less

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.16080 [pdf, other]

Understanding and Enhancing Robustness of Concept-based Models

Authors: Sanchit Sinha, Mengdi Huai, Jianhui Sun, Aidong Zhang

Abstract: Rising usage of deep neural networks to perform decision making in critical applications like medical diagnosis and financial analysis have raised concerns regarding their reliability and trustworthiness. As automated systems become more mainstream, it is important their decisions be transparent, reliable and understandable by humans for better trust and confidence. To this effect, concept-based m… ▽ More Rising usage of deep neural networks to perform decision making in critical applications like medical diagnosis and financial analysis have raised concerns regarding their reliability and trustworthiness. As automated systems become more mainstream, it is important their decisions be transparent, reliable and understandable by humans for better trust and confidence. To this effect, concept-based models such as Concept Bottleneck Models (CBMs) and Self-Explaining Neural Networks (SENN) have been proposed which constrain the latent space of a model to represent high level concepts easily understood by domain experts in the field. Although concept-based models promise a good approach to both increasing explainability and reliability, it is yet to be shown if they demonstrate robustness and output consistent concepts under systematic perturbations to their inputs. To better understand performance of concept-based models on curated malicious samples, in this paper, we aim to study their robustness to adversarial perturbations, which are also known as the imperceptible changes to the input data that are crafted by an attacker to fool a well-learned concept-based model. Specifically, we first propose and analyze different malicious attacks to evaluate the security vulnerability of concept based models. Subsequently, we propose a potential general adversarial training-based defense mechanism to increase robustness of these systems to the proposed malicious attacks. Extensive experiments on one synthetic and two real-world datasets demonstrate the effectiveness of the proposed attacks and the defense approach. △ Less

Submitted 29 November, 2022; originally announced November 2022.

Comments: Accepted at AAAI 2023. Extended Version

arXiv:2211.11872 [pdf]

doi 10.1109/ICCCIS56430.2022.10037627

Classification of Melanocytic Nevus Images using BigTransfer (BiT)

Authors: Sanya Sinha, Nilay Gupta

Abstract: Skin cancer is a fatal disease that takes a heavy toll over human lives annually. The colored skin images show a significant degree of resemblance between different skin lesions such as melanoma and nevus, making identification and diagnosis more challenging. Melanocytic nevi may mature to cause fatal melanoma. Therefore, the current management protocol involves the removal of those nevi that appe… ▽ More Skin cancer is a fatal disease that takes a heavy toll over human lives annually. The colored skin images show a significant degree of resemblance between different skin lesions such as melanoma and nevus, making identification and diagnosis more challenging. Melanocytic nevi may mature to cause fatal melanoma. Therefore, the current management protocol involves the removal of those nevi that appear intimidating. However, this necessitates resilient classification paradigms for classifying benign and malignant melanocytic nevi. Early diagnosis necessitates a dependable automated system for melanocytic nevi classification to render diagnosis efficient, timely, and successful. An automated classification algorithm is proposed in the given research. A neural network previously-trained on a separate problem statement is leveraged in this technique for classifying melanocytic nevus images. The suggested method uses BigTransfer (BiT), a ResNet-based transfer learning approach for classifying melanocytic nevi as malignant or benign. The results obtained are compared to that of current techniques, and the new method's classification rate is proven to outperform that of existing methods. △ Less

Submitted 6 April, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures

ACM Class: I.4.9

arXiv:2211.10972 [pdf]

A Comparative Analysis of Transfer Learning-based Techniques for the Classification of Melanocytic Nevi

Authors: Sanya Sinha, Nilay Gupta

Abstract: Skin cancer is a fatal manifestation of cancer. Unrepaired deoxyribo-nucleic acid (DNA) in skin cells, causes genetic defects in the skin and leads to skin cancer. To deal with lethal mortality rates coupled with skyrocketing costs of medical treatment, early diagnosis is mandatory. To tackle these challenges, researchers have developed a variety of rapid detection tools for skin cancer. Lesion-sp… ▽ More Skin cancer is a fatal manifestation of cancer. Unrepaired deoxyribo-nucleic acid (DNA) in skin cells, causes genetic defects in the skin and leads to skin cancer. To deal with lethal mortality rates coupled with skyrocketing costs of medical treatment, early diagnosis is mandatory. To tackle these challenges, researchers have developed a variety of rapid detection tools for skin cancer. Lesion-specific criteria are utilized to distinguish benign skin cancer from malignant melanoma. In this study, a comparative analysis has been performed on five Transfer Learning-based techniques that have the potential to be leveraged for the classification of melanocytic nevi. These techniques are based on deep convolutional neural networks (DCNNs) that have been pre-trained on thousands of open-source images and are used for day-to-day classification tasks in many instances. △ Less

Submitted 20 November, 2022; originally announced November 2022.

Comments: 12 pages, 5 figures, submitted to International Conference on Advances and Applications of Artificial Intelligence and Machine Learning (ICAAAIML) 2022, to be published in Springer's Lecture Notes in Electrical Engineering

ACM Class: I.4.9

arXiv:2211.03889 [pdf, other]

Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories

Authors: Samarth Sinha, Roman Shapovalov, Jeremy Reizenstein, Ignacio Rocco, Natalia Neverova, Andrea Vedaldi, David Novotny

Abstract: Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors. Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. We use cats and dogs as a representative example and introd… ▽ More Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors. Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. We use cats and dogs as a representative example and introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets. CoP3D is one of the first large-scale datasets for benchmarking non-rigid 3D reconstruction "in the wild". We also propose Tracker-NeRF, a method for learning 4D reconstruction from our dataset. At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views, interpolating viewpoint and time. Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines. △ Less

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.03292 [pdf, ps, other]

Approximate Trace Reconstruction from a Single Trace

Authors: Xi Chen, Anindya De, Chin Ho Lee, Rocco A. Servedio, Sandip Sinha

Abstract: The well-known trace reconstruction problem is the problem of inferring an unknown source string $x \in \{0,1\}^n$ from independent "traces", i.e. copies of $x$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $x$ with probability $δ$ and concatenates the surviving bits. The current paper considers the extreme data-limited regime in which only a single tra… ▽ More The well-known trace reconstruction problem is the problem of inferring an unknown source string $x \in \{0,1\}^n$ from independent "traces", i.e. copies of $x$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $x$ with probability $δ$ and concatenates the surviving bits. The current paper considers the extreme data-limited regime in which only a single trace is provided to the reconstruction algorithm. In this setting exact reconstruction is of course impossible, and the question is to what accuracy the source string $x$ can be approximately reconstructed. We give a detailed study of this question, providing algorithms and lower bounds for the high, intermediate, and low deletion rate regimes in both the worst-case ($x$ is arbitrary) and average-case ($x$ is drawn uniformly from $\{0,1\}^n$) models. In several cases the lower bounds we establish are matched by computationally efficient algorithms that we provide. We highlight our results for the high deletion rate regime: roughly speaking, they show that - Having access to a single trace is already quite useful for worst-case trace reconstruction: an efficient algorithm can perform much more accurate reconstruction, given one trace that is even only a few bits long, than it could given no traces at all. But in contrast, - in the average-case setting, having access to a single trace is provably not very useful: no algorithm, computationally efficient or otherwise, can achieve significantly higher accuracy given one trace that is $o(n)$ bits long than it could with no traces. △ Less

Submitted 6 November, 2022; originally announced November 2022.

Showing 1–50 of 196 results for author: Sinha, S