Search | arXiv e-print repository

Quantum noise induced nonreciprocity for single photon transport in parity-time symmetric systems

Abstract: We show nonreciprocal light propagation for single-photon inputs due to quantum noise in coupled optical systems with gain and loss. We consider two parity-time ($\mathcal{PT}$) symmetric linear optical systems consisting of either two directly coupled resonators or two finite-length waveguides evanescently coupled in parallel. One resonator or waveguide is filled with an active gain medium and the other with a passive loss medium. The light propagation is reciprocal in such $\mathcal{PT}$ symmetric linear systems without quantum noise. We show here that light transmission becomes nonreciprocal when we include quantum noises in our modeling, which is essential for a proper physical description. The quantum nonreciprocity is especially pronounced in the $\mathcal{PT}$ broken phase. Transmitted light intensity in the waveguide of incidence is asymmetric for two waveguides even without noise. Quantum noise significantly enhances such asymmetry in the broken phase.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 12 pages, 4 figures

arXiv:2406.12313 [pdf]

A framework for developing a knowledge management platform

Authors: Marie Lisandra Zepeda Mendoza, Sonali Agarwal, James A. Blackshaw, Vanesa Bol, Audrey Fazzi, Filippo Fiorini, Amy Louise Foreman, Nancy George, Brett R. Johnson, Brian Martin, Dave McComb, Euphemia Mutasa-Gottgens, Helen Parkinson, Martin Romacker, Rolf Russell, Valérien Ségard, Shawn Zheng Kai Tan, Wei Kheng Teh, F. P. Winstanley, Benedict Wong, Adrian M. Smith

Abstract: Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide guidance on envisioning, executing, evaluating, and evolving knowledge management platforms. We emphasize essential considerations such as setting knowledge domain boundaries and measuring success, as well as the importance of making knowledge accessible for downstream applications and non-computational users, and we highlight the personal and organizational skills necessary for success. We stress the importance of collaboration and the need for convergence on shared principles and commitment to provide or seek resources to advance KM. The community is invited to join the journey of KM and contribute to the advancement of the field by applying and improving on the guidelines described.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 18 pages, 1 figure

arXiv:2406.05276 [pdf, other]

VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning

Authors: Oshin Dutta, Ritvik Gupta, Sumeet Agarwal

Abstract: In recent years, there has been a growing emphasis on compressing large pre-trained transformer models for resource-constrained devices. However, traditional pruning methods often leave the embedding layer untouched, leading to model over-parameterization. Additionally, they require extensive compression time with large datasets to maintain performance in pruned models. To address these challenges, we propose VTrans, an iterative pruning framework guided by the Variational Information Bottleneck (VIB) principle. Our method compresses all structural components, including embeddings, attention heads, and layers using VIB-trained masks. This approach retains only essential weights in each layer, ensuring compliance with specified model size or computational constraints. Notably, our method achieves up to 70% more compression than prior state-of-the-art approaches, both task-agnostic and task-specific. We further propose faster variants of our method: Fast-VTrans, utilizing only 3% of the data, and Faster-VTrans, a time-efficient alternative that involves exclusive finetuning of VIB masks, accelerating compression by up to 25 times with minimal performance loss compared to previous methods. Extensive experiments on BERT, RoBERTa, and GPT-2 models substantiate the efficacy of our method. Moreover, our method demonstrates scalability in compressing large models such as LLaMA-2-7B, achieving superior performance compared to previous pruning methods.
Additionally, we use attention-based probing to qualitatively assess model redundancy and interpret the efficiency of our approach. Notably, our method considers heads with high attention to special and current tokens in the unpruned model as foremost candidates for pruning, while retained heads are observed to attend more to task-critical keywords.

Submitted 11 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.03142 [pdf, ps, other]

On the Power of Randomization in Fair Classification and Representation

Authors: Sushant Agarwal, Amit Deshpande

Abstract: Fair classification and fair representation learning are two important problems in supervised and unsupervised fair machine learning, respectively. Fair classification asks for a classifier that maximizes accuracy on a given data distribution subject to fairness constraints. Fair representation maps a given data distribution over the original feature space to a distribution over a new representation space such that all classifiers over the representation satisfy fairness. In this paper, we examine the power of randomization in both these problems to minimize the loss of accuracy that results when we impose fairness constraints. Previous work on fair classification has characterized the optimal fair classifiers on a given data distribution that maximize accuracy subject to fairness constraints, e.g., Demographic Parity (DP), Equal Opportunity (EO), and Predictive Equality (PE). We refine these characterizations to demonstrate when the optimal randomized fair classifiers can surpass their deterministic counterparts in accuracy. We also show how the optimal randomized fair classifier that we characterize can be obtained as a solution to a convex optimization problem. Recent work has provided techniques to construct fair representations for a given data distribution such that any classifier over this representation satisfies DP. However, the classifiers on these fair representations either come with no or weak accuracy guarantees when compared to the optimal fair classifier on the original data distribution.
Extending our ideas for randomized fair classification, we improve on these works, and construct DP-fair, EO-fair, and PE-fair representations that have provably optimal accuracy and suffer no accuracy loss compared to the optimal DP-fair, EO-fair, and PE-fair classifiers respectively on the original data distribution.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Appeared in ACM FAccT 2022

arXiv:2405.20405 [pdf, other]

Private Mean Estimation with Person-Level Differential Privacy

Authors: Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

Abstract: We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde Θ\left(\frac{d}{α^2 m} + \frac{d }{ αm^{1/2} \varepsilon} + \frac{d}{α^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $α$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate DP (with slightly degraded sample complexity) and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the well known noisy-clipped-mean approach, but the analysis for our setting requires new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables, and a new argument for bounding the bias introduced by clipping.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 67 pages, 3 figures

arXiv:2405.16101 [pdf, other]

Entanglement generation in weakly-driven arrays of multilevel atoms via dipolar interactions

Authors: Sanaa Agarwal, A. Piñeiro Orioli, J. K. Thompson, A. M. Rey

Abstract: We investigate the driven-dissipative dynamics of 1D and 2D arrays of multilevel atoms interacting via dipole-dipole interactions and trapped at subwavelength scales. Here we show that in the weakly driven low excitation regime, multilevel atoms, in contrast to two-level atoms, can become strongly entangled. The entanglement manifests as the growth of collective spin-waves in the ground state manifold, and survives even after turning off the drive. We propose to use the $\sim 2.9~μ$m transition between $\rm ^3{\rm P}_2 \leftrightarrow \, ^3{\rm D}_3$ in $\rm ^{88}Sr$ with $\rm 389~nm$ trapping light as an ideal experimental platform for validating our predictions and as a novel quantum interface for the exploration of complex many-body phenomena emerging from light-matter interactions.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 7+11 pages, 4+14 figures

arXiv:2405.15947 [pdf, other]

Mitigating scattering in a quantum system using only an integrating sphere

Authors: Zhenfei Jiang, Tian Li, Matthew L. Boone, Zhenhuan Yi, Alexei V. Sokolov, Girish S. Agarwal, Marlan O. Scully

Abstract: Strong quantum-correlated sources are essential but delicate resources for quantum information science and engineering protocols. Decoherence and loss are the two main disruptive processes that lead to the loss of nonclassical behavior in quantum correlations. In quantum systems, scattering can contribute to both decoherence and loss. In this work, we present an experimental scheme capable of significantly mitigating the adverse impact of scattering in quantum systems. Our quantum system is composed of two-mode squeezed light generated with the four-wave mixing process in hot rubidium vapor, and a scatterer is introduced to one of the two modes. An integrating sphere is then placed after the scatterer to recollect the scattered photons. We use mutual information between the two modes as the measure of quantum correlations, and demonstrate a 47.5% mutual information recovery from scattering, despite an enormous photon loss of greater than 85%. Our scheme is a pioneering step towards recovering quantum correlations from disruptive random processes, and thus has the potential to bridge the gap between proof-of-principle demonstrations and practical real-world deployments of quantum protocols.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 7 pages, 4 figures

arXiv:2405.15152 [pdf, other]

Machine Unlearning in Large Language Models

Authors: Saaketh Koundinya Gundavarapu, Shreya Agarwal, Arushi Arora, Chandana Thimmalapura Jagadeeshaiah

Abstract: Machine unlearning, a novel area within artificial intelligence, focuses on addressing the challenge of selectively forgetting or reducing undesirable knowledge or behaviors in machine learning models, particularly in the context of large language models (LLMs). This paper introduces a methodology to align LLMs, such as Open Pre-trained Transformer Language Models, with ethical, privacy, and safety standards by leveraging the gradient ascent algorithm for knowledge unlearning. Our approach aims to selectively erase or modify learned information in LLMs, targeting harmful responses and copyrighted content. This paper presents a dual-pronged approach to enhance the ethical and safe behavior of large language models (LLMs) by addressing the issues of harmful responses and copyrighted content. To mitigate harmful responses, we applied gradient ascent on the PKU dataset, achieving a 75\% reduction in harmful responses for Open Pre-trained Transformer Language Models (OPT1.3b and OPT2.7b) \citet{zhang2022opt} while retaining previous knowledge using the TruthfulQA dataset \citet{DBLP:journals/corr/abs-2109-07958}. For handling copyrighted content, we constructed a custom dataset based on the Lord of the Rings corpus and aligned LLMs (OPT1.3b and OPT2.7b) \citet{zhang2022opt} through LoRA: Low-Rank Adaptation of Large Language Models \citet{DBLP:journals/corr/abs-2106-09685} finetuning. Subsequently, we employed gradient ascent to unlearn the Lord of the Rings content, resulting in a remarkable reduction in the presence of copyrighted material.
To maintain a diverse knowledge base, we utilized the Book Corpus dataset. Additionally, we propose a new evaluation technique for assessing the effectiveness of harmful unlearning.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 10 pages

arXiv:2405.12433 [pdf, other]

LLM+Reasoning+Planning for supporting incomplete user queries in presence of APIs

Authors: Sudhir Agarwal, Anu Sreepathy, David H. Alonso, Prarit Lamba

Abstract: Recent availability of Large Language Models (LLMs) has led to the development of numerous LLM-based approaches aimed at providing natural language interfaces for various end-user tasks. These end-user tasks in turn can typically be accomplished by orchestrating a given set of APIs. In practice, natural language task requests (user queries) are often incomplete, i.e., they may not contain all the information required by the APIs. While LLMs excel at natural language processing (NLP) tasks, they frequently hallucinate on missing information or struggle with orchestrating the APIs. The key idea behind our proposed approach is to leverage logical reasoning and classical AI planning along with an LLM for accurately answering user queries, including identification and gathering of any missing information in these queries. Our approach uses an LLM and an ASP (Answer Set Programming) solver to translate a user query to a representation in Planning Domain Definition Language (PDDL) via an intermediate representation in ASP. We introduce a special API "get_info_api" for gathering missing information. We model all the APIs as PDDL actions in a way that supports dataflow between the APIs. Our approach then uses a classical AI planner to generate an orchestration of API calls (including calls to get_info_api) to answer the user query.
Our evaluation results show that our approach significantly outperforms a pure LLM-based approach by achieving over 95\% success rate in most cases on a dataset containing complete and incomplete single-goal and multi-goal queries, where the multi-goal queries may or may not require dataflow among the APIs.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 9 pages main content, 2 pages references, 12 pages appendix, 5 figures

arXiv:2405.12102 [pdf, other]

Collective Quantum Entanglement in Molecular Cavity Optomechanics

Authors: Jian Huang, Dangyuan Lei, Girish S. Agarwal, Zhedong Zhang

Abstract: We propose an optomechanical scheme for reaching quantum entanglement in vibration polaritons. The system involves $N$ molecules, whose vibrations can be fairly entangled with plasmonic cavities. We find that the vibration-photon entanglement can exist at room temperature and is robust against thermal noise. We further demonstrate the quantum entanglement between the vibrational modes through the plasmonic cavities, which shows a delocalized nature and an incredible enhancement with the number of molecules. The underlying mechanism for the entanglement is attributed to the strong vibration-cavity coupling which possesses collectivity. Our results provide a molecular optomechanical scheme which offers a promising platform for the study of noise-free quantum resources and macroscopic quantum phenomena.

Submitted 25 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 10 pages, 4 figures

arXiv:2405.11346 [pdf]

Decision support system for Forest fire management using Ontology with Big Data and LLMs

Authors: Ritesh Chandra, Shashi Shekhar Kumar, Rushil Patra, Sonali Agarwal

Abstract: Forests are crucial for ecological balance, but wildfires, a major cause of forest loss, pose significant risks. Fire weather indices, which assess wildfire risk and predict resource demands, are vital. With the rise of sensor networks in fields like healthcare and environmental monitoring, semantic sensor networks are increasingly used to gather climatic data such as wind speed, temperature, and humidity. However, processing these data streams to determine fire weather indices presents challenges, underscoring the growing importance of effective forest fire detection. This paper discusses using Apache Spark for early forest fire detection, enhancing fire risk prediction with meteorological and geographical data. Building on our previous development of Semantic Sensor Network (SSN) ontologies and Semantic Web Rules Language (SWRL) for managing forest fires in Monesterial Natural Park, we expanded SWRL to improve a Decision Support System (DSS) using Large Language Models (LLMs) and the Spark framework. We implemented real-time alerts with Spark streaming, tailored to various fire scenarios, and validated our approach using ontology metrics, query-based evaluations, LLMs score precision, F1 score, and recall measures.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.11215 [pdf, other]

MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing

Authors: Siddhant Agarwal, Shivam Sharma, Preslav Nakov, Tanmoy Chakraborty

Abstract: Memes have evolved as a prevalent medium for diverse communication, ranging from humour to propaganda. With the rising popularity of image-focused content, there is a growing need to explore its potential harm from different aspects. Previous studies have analyzed memes in closed settings - detecting harm, applying semantic labels, and offering natural language explanations. To extend this research, we introduce MemeMQA, a multimodal question-answering framework aiming to solicit accurate responses to structured questions while providing coherent explanations. We curate MemeMQACorpus, a new dataset featuring 1,880 questions related to 1,122 memes with corresponding answer-explanation pairs. We further propose ARSENAL, a novel two-stage multimodal framework that leverages the reasoning capabilities of LLMs to address MemeMQA. We benchmark MemeMQA using competitive baselines and demonstrate its superiority - ~18% enhanced answer prediction accuracy and distinct text generation lead across various metrics measuring lexical and semantic alignment over the best baseline. We analyze ARSENAL's robustness through diversification of question-set, confounder-based evaluation regarding MemeMQA's generalizability, and modality-specific assessment, enhancing our understanding of meme interpretation in the multimodal communication landscape.

Submitted 18 May, 2024; originally announced May 2024.

Comments: The paper has been accepted in ACL'24 (Findings)

arXiv:2405.09612 [pdf, other]

Imprint of 'local opacity' effect in gamma-ray spectrum of blazar jet

Authors: Sushmita Agarwal, Amit Shukla, Karl Mannheim, Bhargav Vaidya, Biswajit Banerjee

Abstract: Relativistic jets from accreting supermassive black holes at cosmological distances can be powerful emitters of $γ$-rays. However, the precise mechanisms and locations responsible for the dissipation of energy within these jets, leading to observable $γ$-ray radiation, remain elusive. We detect evidence for an intrinsic absorption feature in the $γ$-ray spectrum at energies exceeding $10\,$GeV, presumably due to the photon-photon pair production of $γ$-rays with low ionization lines at the outer edge of Broad-line region (BLR), during the high-flux state of the flat-spectrum radio quasar PKS 1424$-$418. The feature can be discriminated from the turnover at higher energies resulting from $γ$-ray absorption in the extragalactic background light. It is absent in the low-flux states supporting the interpretation that powerful dissipation events within or at the edge of the BLR evolve into fainter $γ$-ray emitting zones outside the BLR, possibly associated with the moving VLBI radio knots. The inferred location of $γ$-ray emission zone is consistent with the observed variability time scale of the brightest flare, provided that the flare is attributed to external Compton scattering with BLR photons.

Submitted 18 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: 10 pages, 3 figures, 1 table, Accepted for publication in ApJL

arXiv:2405.08015 [pdf, other]

A Methodology-Oriented Study of Catastrophic Forgetting in Incremental Deep Neural Networks

Authors: Ashutosh Kumar, Sonali Agarwal, D Jude Hemanth

Abstract: Humans and many animal species have the ability to gather, transfer, process, fine-tune, and generate information throughout their lifetime. This ability to learn throughout the lifespan is referred to as continuous learning and relies on neurocognitive mechanisms. Consequently, real-world computational systems built around incrementally learning autonomous agents also need such a continuous learning mechanism, one that provides information retrieval and long-term memory consolidation. However, the main challenge in artificial intelligence is the incremental learning of an autonomous agent when it is confronted with new data. In such scenarios, the main concern is catastrophic forgetting (CF), i.e., while learning sequentially, a neural network underfits the old data when it is confronted with new data. Numerous studies have been proposed to tackle the CF problem; however, it is very difficult to compare their performance because their evaluation mechanisms differ. Here we focus on comparing algorithms that share a similar type of evaluation mechanism. We compare three types of incremental learning methods: (1) exemplar-based methods, (2) memory-based methods, and (3) network-based methods. This survey paper presents a methodology-oriented study of catastrophic forgetting in incremental deep neural networks. Furthermore, it contains a mathematical overview of impactful methods that can help researchers deal with CF.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.07284 [pdf]

Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)

Authors: Saaketh Koundinya Gundavarapu, Arushi Arora, Shreya Agarwal

Abstract: We present SLIP (SAM+CLIP), an enhanced architecture for zero-shot object segmentation. SLIP combines the Segment Anything Model (SAM) \cite{kirillov2023segment} with the Contrastive Language-Image Pretraining (CLIP) \cite{radford2021learning}. By incorporating text prompts into SAM using CLIP, SLIP enables object segmentation without prior training on specific classes or categories. We fine-tune CLIP on a Pokemon dataset, allowing it to learn meaningful image-text representations. SLIP demonstrates the ability to recognize and segment objects in images based on contextual information from text prompts, expanding the capabilities of SAM for versatile object segmentation. Our experiments demonstrate the effectiveness of the SLIP architecture in segmenting objects in images based on textual cues. The integration of CLIP's text-image understanding capabilities into SAM expands the capabilities of the original architecture and enables more versatile and context-aware object segmentation.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 5 pages, 3 figures

arXiv:2405.05658 [pdf]

Artificial intelligence for abnormality detection in high volume neuroimaging: a systematic review and meta-analysis

Authors: Siddharth Agarwal, David A. Wood, Mariusz Grzeda, Chandhini Suresh, Munaib Din, James Cole, Marc Modat, Thomas C Booth

Abstract: Purpose: Most studies evaluating artificial intelligence (AI) models that detect abnormalities in neuroimaging are either tested on unrepresentative patient cohorts or are insufficiently well-validated, leading to poor generalisability to real-world tasks. The aim was to determine the diagnostic test accuracy and summarise the evidence supporting the use of AI models performing first-line, high-vo… ▽ More Purpose: Most studies evaluating artificial intelligence (AI) models that detect abnormalities in neuroimaging are either tested on unrepresentative patient cohorts or are insufficiently well-validated, leading to poor generalisability to real-world tasks. The aim was to determine the diagnostic test accuracy and summarise the evidence supporting the use of AI models performing first-line, high-volume neuroimaging tasks. Methods: Medline, Embase, Cochrane library and Web of Science were searched until September 2021 for studies that temporally or externally validated AI capable of detecting abnormalities in first-line CT or MR neuroimaging. A bivariate random-effects model was used for meta-analysis where appropriate. PROSPERO: CRD42021269563. Results: Only 16 studies were eligible for inclusion. Included studies were not compromised by unrepresentative datasets or inadequate validation methodology. Direct comparison with radiologists was available in 4/16 studies. 15/16 had a high risk of bias. Meta-analysis was only suitable for intracranial haemorrhage detection in CT imaging (10/16 studies), where AI systems had a pooled sensitivity and specificity 0.90 (95% CI 0.85 - 0.94) and 0.90 (95% CI 0.83 - 0.95) respectively. Other AI studies using CT and MRI detected target conditions other than haemorrhage (2/16), or multiple target conditions (4/16). Only 3/16 studies implemented AI in clinical pathways, either for pre-read triage or as post-read discrepancy identifiers. 
Conclusion: The paucity of eligible studies reflects that most abnormality detection AI studies were not adequately validated in representative clinical cohorts. The few studies describing how abnormality detection AI could impact patients and clinicians did not explore the full ramifications of clinical implementation.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05647 [pdf]

Letter to the Editor: What are the legal and ethical considerations of submitting radiology reports to ChatGPT?

Authors: Siddharth Agarwal, David Wood, Robin Carpenter, Yiran Wei, Marc Modat, Thomas C Booth

Abstract: This letter critically examines the recent article by Infante et al. assessing the utility of large language models (LLMs) like GPT-4, Perplexity, and Bard in identifying urgent findings in emergency radiology reports. While acknowledging the potential of LLMs in generating labels for computer vision, concerns are raised about the ethical implications of using patient data without explicit approval, highlighting the necessity of stringent data protection measures under GDPR.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03113 [pdf, other]

Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

Authors: Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum

Abstract: Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks ranging from easy tasks like reaching, to challenging ones like pushing a block by hitting it with a puck, as well as goal-based and human-interactive tasks, our testbed allows a varied assessment of RL capabilities. The robot air hockey testbed also supports sim-to-real transfer with three domains: two simulators of increasing fidelity and a real robot system. Using a dataset of demonstration data gathered through two teleoperation systems: a virtualized control environment, and human shadowing, we assess the testbed with behavior cloning, offline RL, and RL from scratch.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.02782 [pdf]

A self-supervised text-vision framework for automated brain abnormality detection

Authors: David A. Wood, Emily Guilhem, Sina Kafiabadi, Ayisha Al Busaidi, Kishan Dissanayake, Ahmed Hammam, Nina Mansoor, Matthew Townend, Siddharth Agarwal, Yiran Wei, Asif Mazumder, Gareth J. Barker, Peter Sasieni, Sebastien Ourselin, James H. Cole, Thomas C. Booth

Abstract: Artificial neural networks trained on large, expert-labelled datasets are considered state-of-the-art for a range of medical image recognition tasks. However, categorically labelled datasets are time-consuming to generate and constrain classification to a pre-defined, fixed set of classes. For neuroradiological applications in particular, this represents a barrier to clinical adoption. To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports. Our training approach consisted of two steps. First, a dedicated neuroradiological language model - NeuroBERT - was trained to generate fixed-dimensional vector representations of neuroradiology reports (N = 50,523) via domain-specific self-supervised learning tasks. Next, convolutional neural networks (one per MRI sequence) learnt to map individual brain scans to their corresponding text vector representations by optimising a mean square error loss. Once trained, our text-vision framework can be used to detect abnormalities in unreported brain MRI examinations by scoring scans against suitable query sentences (e.g., 'there is an acute stroke', 'there is hydrocephalus' etc.), enabling a range of classification-based applications including automated triage.
Potentially, our framework could also serve as a clinical decision support tool, not only by suggesting findings to radiologists and detecting errors in provisional reports, but also by retrieving and displaying examples of pathologies from historical examinations that could be relevant to the current case based on textual descriptors.

Submitted 11 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: Under Review

arXiv:2404.17910 [pdf, other]

doi 10.1109/CVPRW59228.2023.00526

Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection

Authors: Farzad Nozarian, Shashank Agarwal, Farzaneh Rezaeianaran, Danish Shahzad, Atanas Poibrenski, Christian Müller, Philipp Slusallek

Abstract: Semi-supervised 3D object detection can benefit from the promising pseudo-labeling technique when labeled data is limited. However, recent approaches have overlooked the impact of noisy pseudo-labels during training, despite efforts to enhance pseudo-label quality through confidence-based filtering. In this paper, we examine the impact of noisy pseudo-labels on IoU-based target assignment and propose the Reliable Student framework, which incorporates two complementary approaches to mitigate errors. First, it involves a class-aware target assignment strategy that reduces false negative assignments in difficult classes. Second, it includes a reliability weighting strategy that suppresses false positive assignment errors while also addressing remaining false negatives from the first step. The reliability weights are determined by querying the teacher network for confidence scores of the student-generated proposals. Our work surpasses the previous state-of-the-art on KITTI 3D object detection benchmark on point clouds in the semi-supervised setting. On 1% labeled data, our approach achieves a 6.2% AP improvement for the pedestrian class, despite having only 37 labeled samples available. The improvements become significant for the 2% setting, achieving 6.0% AP and 5.7% AP improvements for the pedestrian and cyclist classes, respectively.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR Workshop L3D-IVU 2023. Code: https://github.com/fnozarian/ReliableStudent

arXiv:2404.17510 [pdf, other]

Kerr Nonlinearity Induced Nonreciprocity in dissipatively coupled resonators

Authors: Qingtian Miao, G. S. Agarwal

Abstract: Nonlinearity induced nonreciprocity is studied in a system comprising two resonators coupled to a one-dimensional waveguide when the linear system does not exhibit nonreciprocity. The analysis is based on the Hamiltonian of the coupled system and includes the dissipative coupling between the waveguide and resonators, along with the input-output relations. We consider a large number of scenarios which can lead to nonreciprocity. We pay special attention to the case when the linear system does not exhibit nonreciprocal behavior. In this case, we show how very significant nonreciprocal behavior can result from Kerr nonlinearities. We find that the bistability of the nonlinear system can aid in achieving large nonreciprocity. Additionally, we bring out nonreciprocity in the excitation of each resonator, which can be monitored independently. Our results highlight the profound influence of nonlinearity on nonreciprocal behavior, offering a new avenue for controlling light propagation in integrated photonic circuits. Nonlinearity induced nonreciprocity would lead to significant nonreciprocity in quantum fluctuations when our system is treated quantum mechanically.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16710 [pdf, other]

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Authors: Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu

Abstract: We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task.

Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Code open sourcing is in progress

arXiv:2404.14995 [pdf, other]

Solar flare observations with the Radio Neutrino Observatory Greenland (RNO-G)

Authors: S. Agarwal, J. A. Aguilar, S. Ali, P. Allison, M. Betts, D. Besson, A. Bishop, O. Botner, S. Bouma, S. Buitink, M. Cataldo, B. A. Clark, A. Coleman, K. Couberly, S. de Kockere, K. D. de Vries, C. Deaconu, M. A. DuVernois, C. Glaser, T. Glüsenkamp, A. Hallgren, S. Hallmann, J. C. Hanson, B. Hendricks, J. Henrichs, et al. (47 additional authors not shown)

Abstract: The science program of the Radio Neutrino Observatory-Greenland (RNO-G) extends beyond particle astrophysics to include radioglaciology and, as we show herein, solar physics, as well. Impulsive solar flare observations not only permit direct measurements of light curves, spectral content, and polarization on time scales significantly shorter than most extant dedicated solar observatories, but also offer an extremely useful above-surface calibration source, with pointing precision of order tens of arc-minutes. Using the early RNO-G data from 2022-2023, observed flare characteristics are compared to well-established solar observatories. Also, a number of individual flares are used to highlight angular reconstruction and calibration methods. RNO-G observes signal excesses during solar flares reported by the solar-observing Callisto network and in coincidence with about 60% of the brightest excesses recorded by the SWAVES satellite, when the Sun is above the horizon for RNO-G. In these observed flares, there is significant impulsivity in the time-domain. In addition, the solar flares are used to calibrate the RNO-G absolute pointing on the radio signal arrival direction to sub-degree resolution.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.04392 [pdf, other]

Increased LLM Vulnerabilities from Fine-tuning and Quantization

Authors: Divyanshu Kumar, Anurakt Kumar, Sahil Agarwal, Prashanth Harshangi

Abstract: Large Language Models (LLMs) have become very popular and have found use cases in many domains, such as chatbots, auto-task completion agents, and much more. However, LLMs are vulnerable to different types of attacks, such as jailbreaking, prompt injection attacks, and privacy leakage attacks. Foundational LLMs undergo adversarial and alignment training to learn not to generate malicious and toxic content. For specialized use cases, these foundational LLMs are subjected to fine-tuning or quantization for better performance and efficiency. We examine the impact of downstream tasks such as fine-tuning and quantization on LLM vulnerability. We test foundation models like Mistral, Llama, MosaicML, and their fine-tuned versions. Our research shows that fine-tuning and quantization reduce jailbreak resistance significantly, leading to increased LLM vulnerabilities. Finally, we demonstrate the utility of external guardrails in reducing LLM vulnerabilities.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.02912 [pdf, ps, other]

Probabilistic Generating Circuits -- Demystified

Authors: Sanyam Agarwal, Markus Bläser

Abstract: Zhang et al. (ICML 2021, PMLR 139, pp. 12447-1245) introduced probabilistic generating circuits (PGCs) as a probabilistic model to unify probabilistic circuits (PCs) and determinantal point processes (DPPs). At first glance, PGCs store a distribution in a very different way: they compute the probability generating polynomial instead of the probability mass function, and it seems that this is the main reason why PGCs are more powerful than PCs or DPPs. However, PGCs also allow for negative weights, whereas classical PCs assume that all weights are nonnegative. One of the main insights of our paper is that the negative weights are responsible for the power of PGCs and not the different representation. PGCs are PCs in disguise; in particular, we show how to transform any PGC into a PC with negative weights with only polynomial blowup. PGCs were defined by Zhang et al. only for binary random variables. As our second main result, we show that there is a good reason for this: we prove that PGCs for categorial variables with larger image size do not support tractable marginalization unless NP = P. On the other hand, we show that we can model categorial variables with larger image size as PC with negative weights computing set-multilinear polynomials. These allow for tractable marginalization. In this sense, PCs with negative weights strictly subsume PGCs.

Submitted 4 March, 2024; originally announced April 2024.

arXiv:2404.02139 [pdf, other]

Lensed Type Ia Supernova "Encore" at z=2: The First Instance of Two Multiply-Imaged Supernovae in the Same Host Galaxy

Authors: J. D. R. Pierel, A. B. Newman, S. Dhawan, M. Gu, B. A. Joshi, T. Li, S. Schuldt, L. G. Strolger, S. H. Suyu, G. B. Caminha, S. H. Cohen, J. M. Diego, J. C. J. Dsilva, S. Ertl, B. L. Frye, G. Granata, C. Grillo, A. M. Koekemoer, J. Li, A. Robotham, J. Summers, T. Treu, R. A. Windhorst, A. Zitrin, S. Agarwal, et al. (38 additional authors not shown)

Abstract: A bright ($m_{\rm F150W,AB}$=24 mag), $z=1.95$ supernova (SN) candidate was discovered in JWST/NIRCam imaging acquired on 2023 November 17. The SN is quintuply-imaged as a result of strong gravitational lensing by a foreground galaxy cluster, detected in three locations, and remarkably is the second lensed SN found in the same host galaxy. The previous lensed SN was called "Requiem", and therefore the new SN is named "Encore". This makes the MACS J0138.0$-$2155 cluster the first known system to produce more than one multiply-imaged SN. Moreover, both SN Requiem and SN Encore are Type Ia SNe (SNe Ia), making this the most distant case of a galaxy hosting two SNe Ia. Using parametric host fitting, we determine the probability of detecting two SNe Ia in this host galaxy over a $\sim10$ year window to be $\approx3\%$. These observations have the potential to yield a Hubble Constant ($H_0$) measurement with $\sim10\%$ precision, only the third lensed SN capable of such a result, using the three visible images of the SN. Both SN Requiem and SN Encore have a fourth image that is expected to appear within a few years of $\sim2030$, providing an unprecedented baseline for time-delay cosmography.

Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Submitted to ApJL

arXiv:2404.00871 [pdf, other]

Quantum Metrology of Absorption and Gain Parameters using Two-Mode Bright Squeezed Light

Authors: Mrunal Kamble, Jiaxuan Wang, Girish S. Agarwal

Abstract: Absorption and gain processes are fundamental to any light-matter interaction and a precise measurement of these parameters is important for various scientific and technological applications. Quantum probes, specifically the squeezed states have proved very successful, particularly in the applications that deal with phase shift and force measurements. In this paper, we focus on improving the sensitivity of the estimation of the photon loss coefficient of a weakly absorbing medium as well as the estimation of the gain parameter using a two-mode bright squeezed state. The generation of this state combines the advantage of a coherent beam for its large photon number with the quantum properties of the two-mode squeezing operation in an optical parametric amplifier. We present two measurement schemes: balanced photodetection and time-reversed metrology, both utilizing two-mode bright squeezed light. The maximum quantum advantage we can achieve using two-mode bright squeezed light is 3.7 times for the absorption parameter $α= 0.05$ and 8.4 times for $α= 0.01$ as compared to using only the coherent state. Similarly, the maximum quantum advantage for the estimation of optical gain is found around 2.81 times for the gain coefficient $G=1.05$ and around 6.28 times for $G=1.01$. We discuss the significance of using one measurement scheme over the other under different squeezing conditions. We compare our results with the Cramér-Rao bound for a two-mode bright squeezed state to assess the quality of the proposed methodologies.

Submitted 31 March, 2024; originally announced April 2024.

Comments: 9 pages, 4 figures

arXiv:2403.15556 [pdf, other]

Directional superradiance in a driven ultracold atomic gas in free-space

Authors: Sanaa Agarwal, Edwin Chaparro, Diego Barberena, A. Piñeiro Orioli, G. Ferioli, S. Pancaldi, I. Ferrier-Barbut, A. Browaeys, A. M. Rey

Abstract: Ultra-cold atomic systems are among the most promising platforms that have the potential to shed light on the complex behavior of many-body quantum systems. One prominent example is the case of a dense ensemble illuminated by a strong coherent drive while interacting via dipole-dipole interactions. Despite being subjected to intense investigations, this system retains many open questions. A recent experiment carried out in a pencil-shaped geometry reported measurements that seemed consistent with the emergence of strong collective effects in the form of a "superradiant" phase transition in free space, when looking at the light emission properties in the forward direction. Motivated by the experimental observations, we carry out a systematic theoretical analysis of the system's steady-state properties as a function of the driving strength and atom number, $N$. We observe signatures of collective effects in the weak drive regime, which disappear with increasing drive strength as the system evolves into a single-particle-like mixed state comprised of randomly aligned dipoles. Although the steady-state features some similarities to the reported superradiant to normal non-equilibrium transition, also known as cooperative resonance fluorescence, we observe significant qualitative and quantitative differences, including a different scaling of the critical drive parameter (from $N$ to $\sqrt{N}$). We validate the applicability of a mean-field treatment to capture the steady-state dynamics under currently accessible conditions.
Furthermore, we develop a simple theoretical model that explains the scaling properties by accounting for interaction-induced inhomogeneous effects and spontaneous emission, which are intrinsic features of interacting disordered arrays in free space.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 25 pages, 19 figures

arXiv:2403.14701 [pdf]

Rule based Complex Event Processing for an Air Quality Monitoring System in Smart City

Authors: Shashi Shekhar Kumar, Ritesh Chandra, Sonali Agarwal

Abstract: In recent years, smart city-based development has gained momentum due to its versatile nature in architecture and planning for the systematic habitation of human beings. According to a World Health Organization (WHO) report, air pollution causes serious respiratory diseases. Hence, it becomes necessary to monitor air quality in real time so that stakeholders can minimize its effects through time-bound decisions. Air pollution comprises various components such as NH3, O3, SO2, NO2, etc., and their concentrations vary from location to location. The research work proposes an integrated framework for monitoring air quality using rule-based Complex Event Processing (CEP) and SPARQL queries. CEP works with the data stream based on predefined rules to detect complex patterns, which helps in decision support for stakeholders. Initially, the dataset was collected from the Central Pollution Control Board (CPCB) of India, and this data was then preprocessed and passed through Apache Kafka. Then a knowledge graph was developed based on the air quality paradigm. Next, the preprocessed data is converted into Resource Description Framework (RDF) data and integrated with the knowledge graph, which is ingested into the CEP engine using Apache Jena to enhance decision support. Simultaneously, rules are extracted using a decision tree, and some ground truth parameters of CPCB are added and ingested into the CEP engine to determine the complex patterns.
Consequently, the SPARQL query is used on the real-time RDF dataset to fetch the condition of air quality as good, poor, severe, hazardous, etc., based on complex event detection. To validate the proposed approach, various chunks of RDF are used for the deployment of events to the CEP engine, and its performance is examined over time while performing simple and complex queries.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10431 [pdf, other]

Spatial characterization of debris ejection from the interaction of a tightly focused PW-laser pulse with metal targets

Authors: I. -M. Vladisavlevici, C. Vlachos, J. -L. Dubois, A. Huerta, S. Agarwal, H. Ahmed, J. I. Apiñaniz, M. Cernaianu, M. Gugiu, M. Krupka, R. Lera, A. Morabito, D. Sangwan, D. Ursescu, A. Curcio, N. Fefeu, J. A. Pérez-Hernández, T. Vacek, P. Vicente, N. Woolsey, G. Gatti, M. D. Rodríguez-Frías, J. J. Santos, P. W. Bradford, M. Ehret

Abstract: We present a novel scheme for rapid quantitative analysis of debris generated during experiments with solid targets following relativistic laser-plasma interaction at high-power laser facilities. Experimental data indicates that predictions by available modelling for non-mass-limited targets are reasonable, with debris on the order of hundreds of ug per shot. We detect for the first time that several % of the debris is ejected directionally along the target normal (rear and interaction side), and confirm previous work that found the debris ejection in the direction of the interaction side to be larger than on the side of the target rear.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09914 [pdf, other]

ProMark: Proactive Diffusion Watermarking for Causal Attribution

Authors: Vishal Asnani, John Collomosse, Tu Bui, Xiaoming Liu, Shruti Agarwal

Abstract: Generative AI (GenAI) is transforming creative workflows through the capability to synthesize and manipulate images via high-level prompts. Yet creatives are not well supported to receive recognition or reward for the use of their content in GenAI training. To this end, we propose ProMark, a causal attribution technique to attribute a synthetically generated image to its training data concepts like objects, motifs, templates, artists, or styles. The concept information is proactively embedded into the input training images using imperceptible watermarks, and the diffusion models (unconditional or conditional) are trained to retain the corresponding watermarks in generated images. We show that we can embed as many as $2^{16}$ unique watermarks into the training data, and each training image can contain more than one watermark. ProMark can maintain image quality whilst outperforming correlation-based attribution. Finally, several qualitative examples are presented, providing the confidence that the presence of the watermark conveys a causative relationship between training data and synthetic images.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.08058 [pdf, other]

CHAI: Clustered Head Attention for Efficient LLM Inference

Authors: Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu

Abstract: Large Language Models (LLMs) with hundreds of billions of parameters have transformed the field of machine learning. However, serving these models at inference time is both compute and memory intensive, where a single request can require multiple GPUs and tens of Gigabytes of memory. Multi-Head Attention is one of the key components of LLMs, which can account for over 50% of LLMs' memory and compute requirements. We observe that there is a high amount of redundancy across heads in which tokens they pay attention to. Based on this insight, we propose Clustered Head Attention (CHAI). CHAI combines heads with a high amount of correlation for self-attention at runtime, thus reducing both memory and compute. In our experiments, we show that CHAI is able to reduce the memory requirements for storing K,V cache by up to 21.4% and inference time latency by up to 1.73x without any fine-tuning required. CHAI achieves this with a maximum 3.2% deviation in accuracy across 3 different models (i.e. OPT-66B, LLAMA-7B, LLAMA-33B) and 5 different evaluation datasets.

Submitted 27 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.08043 [pdf, other]

Authorship Style Transfer with Policy Optimization

Authors: Shuai Liu, Shantanu Agarwal, Jonathan May

Abstract: Authorship style transfer aims to rewrite a given text into a specified target while preserving the original meaning in the source. Existing approaches rely on the availability of a large number of target style exemplars for model training. However, these overlook cases where a limited number of target style examples are available. The development of parameter-efficient transfer learning techniques and policy optimization (PO) approaches suggests lightweight PO is a feasible approach to low-resource style transfer. In this work, we propose a simple two-step tune-and-optimize technique for low-resource textual style transfer. We apply our technique to authorship transfer as well as a larger-data native language style task and in both cases find it outperforms state-of-the-art baseline models.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.06938 [pdf, other]

TCAM-SSD: A Framework for Search-Based Computing in Solid-State Drives

Authors: Ryan Wong, Nikita Kim, Kevin Higgs, Sapan Agarwal, Engin Ipek, Saugata Ghose, Ben Feinberg

Abstract: As the amount of data produced in society continues to grow at an exponential rate, modern applications are incurring significant performance and energy penalties due to high data movement between the CPU and memory/storage. While processing in main memory can alleviate these penalties, it is becoming increasingly difficult to keep large datasets entirely in main memory. This has led to a recent push for in-storage computation, where processing is performed inside the storage device. We propose TCAM-SSD, a new framework for search-based computation inside the NAND flash memory arrays of a conventional solid-state drive (SSD), which requires lightweight modifications to only the array periphery and firmware. TCAM-SSD introduces a search manager and link table, which can logically partition the NAND flash memory's contents into search-enabled regions and standard storage regions. Together, these light firmware changes enable TCAM-SSD to seamlessly handle block I/O operations, in addition to new search operations, thereby reducing end-to-end execution time and total data movement. We provide an NVMe-compatible interface that provides programmers with the ability to dynamically allocate data on and make use of TCAM-SSD, allowing the system to be leveraged by a wide variety of applications. We evaluate three example use cases of TCAM-SSD to demonstrate its benefits. For transactional databases, TCAM-SSD can mitigate the performance penalties for applications with large datasets, achieving a 60.9% speedup over a conventional system that retrieves data from the SSD and computes using the CPU. For database analytics, TCAM-SSD provides an average speedup of 17.7x over a conventional system for a collection of analytical queries. For graph analytics, we combine TCAM-SSD's associative search with a sparse data structure, speeding up graph computing for larger-than-memory datasets by 14.5%.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05513 [pdf, other]

A Detection and Filtering Framework for Collaborative Localization

Authors: Thirumalaesh Ashokkumar, Katherine A Skinner, Siddarth Agarwal, Ankit Vora, Ashutosh Bhown

Abstract: Increasingly, autonomous vehicles (AVs) are becoming a reality, such as the Advanced Driver Assistance Systems (ADAS) in vehicles that assist drivers in driving and parking functions with vehicles today. The localization problem for AVs relies primarily on multiple sensors, including cameras, LiDARs, and radars. Manufacturing, installing, calibrating, and maintaining these sensors can be very expensive, thereby increasing the overall cost of AVs. This research explores the means to improve localization on vehicles belonging to the ADAS category in a platooning context, where an ADAS vehicle follows a lead "Smart" AV equipped with a highly accurate sensor suite. We propose and produce results by using a filtering framework to combine pose information derived from vision and odometry to improve the localization of the ADAS vehicle that follows the smart vehicle.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04160 [pdf, other]

Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy

Authors: SeongKu Kang, Shivam Agarwal, Bowen Jin, Dongha Lee, Hwanjo Yu, Jiawei Han

Abstract: Document retrieval has greatly benefited from the advancements of large-scale pre-trained language models (PLMs). However, their effectiveness is often limited in theme-specific applications for specialized areas or industries, due to unique terminologies, incomplete contexts of user queries, and specialized search intents. To capture the theme-specific information and improve retrieval, we propose to use a corpus topical taxonomy, which outlines the latent topic structure of the corpus while reflecting user-interested aspects. We introduce the ToTER (Topical Taxonomy Enhanced Retrieval) framework, which identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts. As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers. Through extensive quantitative, ablative, and exploratory experiments on two real-world datasets, we ascertain the benefits of using topical taxonomy for retrieval in theme-specific applications and demonstrate the effectiveness of ToTER.

Submitted 6 March, 2024; originally announced March 2024.

Comments: TheWebConf'24

arXiv:2403.03579 [pdf, other]

Testing the unified bounds of quantum speed limit

Authors: Yaozu Wu, Jiale Yuan, Chuanyu Zhang, Zitian Zhu, Jinfeng Deng, Xu Zhang, Pengfei Zhang, Qiujiang Guo, Zhen Wang, Jiehui Huang, Chao Song, Hekang Li, Da-Wei Wang, H. Wang, Girish S. Agarwal

Abstract: Quantum speed limits (QSLs) impose fundamental constraints on the evolution speed of quantum systems. Traditionally, the Mandelstam-Tamm (MT) and Margolus-Levitin (ML) bounds have been widely employed, relying on the standard deviation and mean of energy distribution to define the QSLs. However, these universal bounds only offer loose restrictions on the quantum evolution. Here we introduce the generalized ML bounds, which prove to be more stringent in constraining dynamic evolution, by utilizing moments of energy spectra of arbitrary orders, even noninteger orders. To validate our findings, we conduct experiments in a superconducting circuit, where we have the capability to prepare a wide range of quantum photonic states and rigorously test these bounds by measuring the evolution of the system and its photon statistics using quantum state tomography. While, in general, the MT bound is effective for short-time evolution, we identify specific parameter regimes where either the MT or the generalized ML bounds suffice to constrain the entire evolution. Our findings not only establish new criteria for estimating QSLs but also substantially enhance our comprehension of the dynamic evolution of quantum systems.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.02682 [pdf, other]

Time Weaver: A Conditional Time Series Generation Model

Authors: Sai Shankar Narasimhan, Shubhankar Agarwal, Oguzhan Akcin, Sujay Sanghavi, Sandeep Chinchali

Abstract: Imagine generating a city's electricity demand pattern based on weather, the presence of an electric vehicle, and location, which could be used for capacity planning during a winter freeze. Such real-world time series are often enriched with paired heterogeneous contextual metadata (weather, location, etc.). Current approaches to time series generation often ignore this paired metadata, and its heterogeneity poses several practical challenges in adapting existing conditional generation approaches from the image, audio, and video domains to the time series domain. To address this gap, we introduce Time Weaver, a novel diffusion-based model that leverages the heterogeneous metadata in the form of categorical, continuous, and even time-variant variables to significantly improve time series generation. Additionally, we show that naive extensions of standard evaluation metrics from the image to the time series domain are insufficient. These metrics do not penalize conditional generation approaches for their poor specificity in reproducing the metadata-specific features in the generated time series. Thus, we propose a novel evaluation metric that accurately captures the specificity of conditional generation and the realism of the generated time series. We show that Time Weaver outperforms state-of-the-art benchmarks, such as Generative Adversarial Networks (GANs), by up to 27% in downstream classification tasks on real-world energy, medical, air quality, and traffic data sets.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.18434 [pdf, other]

Graph Regularized Encoder Training for Extreme Classification

Authors: Anshul Mittal, Shikhar Mohan, Deepak Saini, Suchith C. Prabhu, Jain jiao, Sumeet Agarwal, Soumen Chakrabarti, Purushottam Kar, Manik Varma

Abstract: Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracies in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper notices that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state of the art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy over the best baseline on a proprietary recommendation dataset sourced from click logs of a popular search engine. Code for RAMEN will be released publicly.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.06608 [pdf, other]

TIC: Translate-Infer-Compile for accurate "text to plan" using LLMs and Logical Representations

Authors: Sudhir Agarwal, Anu Sreepathy

Abstract: We study the problem of generating plans for given natural language planning task requests. On one hand, LLMs excel at natural language processing but do not perform well on planning. On the other hand, classical planning tools excel at planning tasks but require input in a structured language such as the Planning Domain Definition Language (PDDL). We leverage the strengths of both techniques by using an LLM for generating the PDDL representation (task PDDL) of planning task requests followed by using a classical planner for computing a plan. Unlike previous approaches that use LLMs for generating task PDDLs directly, our approach comprises (a) translate: using an LLM only for generating a logically interpretable intermediate representation of the natural language task description, (b) infer: deriving additional logically dependent information from the intermediate representation using a logic reasoner (currently, an Answer Set Programming solver), and (c) compile: generating the target task PDDL from the base and inferred information. We observe that using an LLM to only output the intermediate representation significantly reduces LLM errors. Consequently, the TIC approach achieves, for at least one LLM, high accuracy on task PDDL generation for all seven domains of our evaluation dataset.

Submitted 28 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: 20 pages (7 main + 2 references + 11 appendix), 4 figures, 2 tables

arXiv:2402.05398 [pdf, other]

On the Effect of Image Resolution on Semantic Segmentation

Authors: Ritambhara Singh, Abhishek Jain, Pietro Perona, Shivani Agarwal, Junfeng Yang

Abstract: High-resolution semantic segmentation requires substantial computational resources. Traditional approaches in the field typically downscale the input images before processing and then upscale the low-resolution outputs back to their original dimensions. While this strategy effectively identifies broad regions, it often misses finer details. In this study, we demonstrate that a streamlined model capable of directly producing high-resolution segmentations can match the performance of more complex systems that generate lower-resolution results. By simplifying the network architecture, we enable the processing of images at their native resolution. Our approach leverages a bottom-up information propagation technique across various scales, which we have empirically shown to enhance segmentation accuracy. We have rigorously tested our method using leading-edge semantic segmentation datasets. Specifically, for the Cityscapes dataset, we further boost accuracy by applying the Noisy Student Training technique.

Submitted 7 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2209.08667 by other authors

arXiv:2402.01829 [pdf, other]

Predicting ATP binding sites in protein sequences using Deep Learning and Natural Language Processing

Authors: Shreyas V, Swati Agarwal

Abstract: Predicting ATP-Protein Binding sites in genes is of great significance in the field of Biology and Medicine. The majority of research in this field has been conducted through time- and resource-intensive 'wet experiments' in laboratories. Over the years, researchers have been investigating computational methods to accomplish the same goals, utilising the strength of advanced Deep Learning and NLP algorithms. In this paper, we propose to develop methods to classify ATP-Protein binding sites. We conducted various experiments mainly using PSSMs and several word embeddings as features. We used 2D CNNs and LightGBM classifiers as our chief Deep Learning Algorithms. The MP3Vec and BERT models have also been subjected to testing in our study. The outcomes of our experiments demonstrated improvement over the state-of-the-art benchmarks.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Published at 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)

arXiv:2402.01788 [pdf, other]

LitLLM: A Toolkit for Scientific Literature Review

Authors: Shubham Agarwal, Issam H. Laradji, Laurent Charlin, Christopher Pal

Abstract: Conducting literature reviews for scientific papers is essential for understanding research, its limitations, and building on existing work. It is a tedious task which makes an automatic literature review generator appealing. Unfortunately, many existing works that generate such reviews using Large Language Models (LLMs) have significant limitations. They tend to hallucinate (generate non-actual information) and ignore the latest research they have not been trained on. To address these limitations, we propose a toolkit that operates on Retrieval Augmented Generation (RAG) principles, specialized prompting and instructing techniques with the help of LLMs. Our system first initiates a web search to retrieve relevant papers by summarizing user-provided abstracts into keywords using an off-the-shelf LLM. Authors can enhance the search by supplementing it with relevant papers or keywords, contributing to a tailored retrieval process. Second, the system re-ranks the retrieved papers based on the user-provided abstract. Finally, the related work section is generated based on the re-ranked results and the abstract. There is a substantial reduction in time and effort for literature review compared to traditional methods, establishing our toolkit as an efficient alternative. Our open-source toolkit is accessible at https://github.com/shubhamagarwal92/LitLLM and Huggingface space (https://huggingface.co/spaces/shubhamagarwal92/LitLLM) with the video demo at https://youtu.be/E2ggOZBAFw0.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.01528 [pdf, other]

Decoding Speculative Decoding

Authors: Minghao Yan, Saurabh Agarwal, Shivaram Venkataraman

Abstract: Speculative Decoding is a widely used technique to speed up inference for Large Language Models (LLMs) without sacrificing quality. When performing inference, speculative decoding uses a smaller draft model to generate speculative tokens and then uses the target LLM to verify those draft tokens. The speedup provided by speculative decoding heavily depends on the choice of the draft model. In this work, we perform a detailed study comprising over 350 experiments with LLaMA-65B and OPT-66B using speculative decoding and delineate the factors that affect the performance gain provided by speculative decoding. Our experiments indicate that the performance of speculative decoding depends heavily on the latency of the draft model, and the draft model's capability in language modeling does not correlate strongly with its performance in speculative decoding. Based on these insights we explore a new design space for draft models and design hardware-efficient draft models for speculative decoding. Our newly designed draft model for LLaMA-65B can provide 60% higher throughput than existing draft models and can generalize further to the LLaMA-2 model family and supervised fine-tuned models.

Submitted 26 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.01055 [pdf, other]

Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures

Authors: Mingyuan Zhang, Shivani Agarwal

Abstract: There has been much interest in recent years in learning good classifiers from data with noisy labels. Most work on learning from noisy labels has focused on standard loss-based performance measures. However, many machine learning problems require using non-decomposable performance measures which cannot be expressed as the expectation or sum of a loss on individual examples; these include for example the H-mean, Q-mean and G-mean in class imbalance settings, and the Micro $F_1$ in information retrieval. In this paper, we design algorithms to learn from noisy labels for two broad classes of multiclass non-decomposable performance measures, namely, monotonic convex and ratio-of-linear, which encompass all the above examples. Our work builds on the Frank-Wolfe and Bisection based methods of Narasimhan et al. (2015). In both cases, we develop noise-corrected versions of the algorithms under the widely studied family of class-conditional noise models. We provide regret (excess risk) bounds for our algorithms, establishing that even though they are trained on noisy data, they are Bayes consistent in the sense that their performance converges to the optimal performance w.r.t. the clean (non-noisy) distribution. Our experiments demonstrate the effectiveness of our algorithms in handling label noise.

Submitted 23 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.11417 [pdf, other]

Study of Reconnection Dynamics and Plasma Relaxation in MHD simulation of a Solar Flare

Authors: Satyam Agarwal, Ramit Bhattacharyya, Shangbin Yang

Abstract: Self-organization in continuous systems is associated with dissipative processes. In particular, for magnetized plasmas, it is known as magnetic relaxation, where the magnetic energy is converted into heat and kinetic energy of flow through the process of magnetic reconnection. An example of such a system is the solar corona, where reconnection manifests as solar transients like flares and jets. Consequently, toward investigation of plasma relaxation in solar transients, we utilize a novel approach of data-constrained MHD simulation for an observed solar flare. The selected active region NOAA 12253 hosts a GOES M1.3 class flare. The investigation of extrapolated coronal magnetic field in conjunction with the spatiotemporal evolution of the flare reveals a hyperbolic flux tube (HFT), overlying the observed brightenings. MHD simulation is carried out with the EULAG-MHD numerical model to explore the corresponding reconnection dynamics. The overall simulation shows signatures of relaxation. For a detailed analysis, we consider three distinct sub-volumes. We analyze the magnetic field line dynamics along with time evolution of physically relevant quantities like magnetic energy, current density, twist, and gradients in magnetic field. In the terminal state, none of the sub-volumes are seen to reach a force-free state, thus remaining in non-equilibrium, suggesting the possibility of further relaxation. We conclude that the extent of relaxation depends on the efficacy and duration of reconnection, and hence, on the energetics and time span of the flare.

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.07919 [pdf, ps, other]

Steenrod operations on polyhedral products

Authors: Sanjana Agarwal, Jelena Grbić, Michele Intermont, Milica Jovanović, Evgeniya Lagoda, Sarah Whitehouse

Abstract: We describe the action of the mod $2$ Steenrod algebra on the cohomology of various polyhedral products and related spaces. We carry this out for Davis-Januszkiewicz spaces and their generalizations, for moment-angle complexes as well as for certain polyhedral joins. By studying the combinatorics of underlying simplicial complexes, we deduce some consequences for the lowest cohomological dimension in which non-trivial Steenrod operations can appear. We present a version of cochain-level formulas for Steenrod operations on simplicial complexes. We explain the idea of "propagating" such formulas from a simplicial complex $K$ to polyhedral joins over $K$ and we give examples of this process. We tie the propagation of the Steenrod algebra actions on polyhedral joins to those on moment-angle complexes. Although these are cases where one can understand the Steenrod action via a stable homotopy decomposition, we anticipate applying this method to cases where there is no such decomposition.

Submitted 20 June, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: 21 pages, v2 minor changes, accepted version to appear in special issue of Topology and its Applications dedicated to proceedings of the Women in Topology 4 workshop

MSC Class: 55U10; 05E45; 13F55; 55S10

arXiv:2401.04855 [pdf, other]

LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control

Authors: Saurav Agarwal, Ramya Muthukrishnan, Walker Gosrich, Vijay Kumar, Alejandro Ribeiro

Abstract: Coverage control is the problem of navigating a robot swarm to collaboratively monitor features or a phenomenon of interest not known a priori. The problem is challenging in decentralized settings with robots that have limited communication and sensing capabilities. We propose a learnable Perception-Action-Communication (LPAC) architecture for the problem, wherein a convolutional neural network (CNN) processes localized perception; a graph neural network (GNN) facilitates robot communications; finally, a shallow multi-layer perceptron (MLP) computes robot actions. The GNN enables collaboration in the robot swarm by computing what information to communicate with nearby robots and how to incorporate received information. Evaluations show that the LPAC models -- trained using imitation learning -- outperform standard decentralized and centralized coverage control algorithms. The learned policy generalizes to environments different from the training dataset, transfers to larger environments with more robots, and is robust to noisy position estimates. The results indicate the suitability of LPAC architectures for decentralized navigation in robot swarms to achieve collaborative behavior.

Submitted 8 February, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.01596 [pdf, other]

MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries

Authors: Akash Ghosh, Arkadeep Acharya, Prince Jha, Aniket Gaudgaul, Rajdeep Majumdar, Sriparna Saha, Aman Chadha, Raghav Jain, Setu Sinha, Shivani Agarwal

Abstract: In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also, prior works in the area of medical question summarization have been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available.

Submitted 3 January, 2024; originally announced January 2024.

Comments: ECIR 2024

Showing 1–50 of 675 results for author: Agarwal, S