-
An Overview of Automated Vehicle Platooning Strategies
Authors:
M Sabbir Salek,
Mugdha Basu Thakur,
Pardha Sai Krishna Ala,
Mashrur Chowdhury,
Matthias Schmid,
Pamela Murray-Tuite,
Sakib Mahmud Khan,
Venkat Krovi
Abstract:
Automated vehicle (AV) platooning has the potential to improve the safety, operational, and energy efficiency of surface transportation systems by limiting or eliminating human involvement in the driving tasks. The theoretical validity of AV platooning strategies has been established, and practical applications are being tested under real-world conditions. The emergence of sensors, communication, and control strategies has resulted in rapid and constant evolution of AV platooning strategies. In this paper, we review the state-of-the-art knowledge in AV platooning using a five-component platooning framework, which includes the vehicle model, information-receiving process, information flow topology, spacing policy, and controller, and we discuss the advantages and limitations of each component. Based on the discussion of existing strategies and their limitations, we present potential future research directions.
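To make three of the five components concrete, here is a minimal sketch assuming a point-mass vehicle model, a predecessor-following information flow topology, a constant time-headway spacing policy (d_des = d0 + h * v), and a linear feedback controller on spacing error and relative speed. The gains, headway, vehicle length, and function names are illustrative assumptions, not values or strategies taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

VEHICLE_LENGTH = 4.5  # m, assumed identical vehicles (illustrative value)

@dataclass
class Vehicle:
    position: float  # m, front bumper position along the lane
    speed: float     # m/s

def desired_gap(speed: float, standstill_gap: float = 2.0, time_headway: float = 0.8) -> float:
    """Constant time-headway spacing policy: d_des = d0 + h * v (assumed parameters)."""
    return standstill_gap + time_headway * speed

def follower_accel(ego: Vehicle, predecessor: Vehicle, kp: float = 0.45, kd: float = 0.25) -> float:
    """Linear feedback on spacing error and relative speed (predecessor-following topology)."""
    gap = predecessor.position - ego.position - VEHICLE_LENGTH
    spacing_error = gap - desired_gap(ego.speed)
    relative_speed = predecessor.speed - ego.speed
    return kp * spacing_error + kd * relative_speed

def simulate(n_followers: int = 3, steps: int = 2000, dt: float = 0.05) -> Tuple[Vehicle, List[Vehicle]]:
    """Point-mass (double-integrator) vehicles behind a leader cruising at constant speed."""
    leader = Vehicle(position=100.0, speed=25.0)
    followers = [Vehicle(position=100.0 - 30.0 * (i + 1), speed=20.0) for i in range(n_followers)]
    for _ in range(steps):
        predecessors = [leader] + followers[:-1]
        # Compute every command from the same snapshot before moving any vehicle.
        accels = [follower_accel(f, p) for f, p in zip(followers, predecessors)]
        leader.position += leader.speed * dt
        for f, a in zip(followers, accels):
            f.speed += a * dt
            f.position += f.speed * dt
    return leader, followers
```

With these assumed gains every follower settles at the leader's 25 m/s with a 22 m gap (2 m + 0.8 s × 25 m/s). The sketch ignores actuator limits and does not check string stability, which a real platooning controller would also need to satisfy.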
Submitted 8 March, 2024;
originally announced March 2024.
-
Sum of squares of hook lengths and contents
Authors:
Krishna Menon
Abstract:
It is known that for the Young diagram of any partition of $n$, the sum of squares of the hook lengths of its cells is exactly $n^2$ more than the sum of squares of the contents of its cells. That is, for any $\lambda \vdash n$, \begin{equation*}
\sum_{u \in \lambda} h(u)^2 = n^2 + \sum_{u \in \lambda} c(u)^2. \end{equation*} We provide a bijective proof of this fact, thus solving a problem posed by Stanley. Along the way, we obtain a formula for the number of rectangles in the Young diagram of a partition. We also mention a result for sums of higher powers of hook lengths and contents.
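The identity can be checked numerically on small cases. This sketch is our own brute-force check, not the bijective proof: for a cell in row i and column j (0-indexed), it takes the hook length as arm + leg + 1 and the content as j - i, then verifies the identity for every partition of n up to 10. The function names are illustrative.

```python
from typing import Iterator, Optional, Tuple

Partition = Tuple[int, ...]

def partitions(n: int, max_part: Optional[int] = None) -> Iterator[Partition]:
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def conjugate(lam: Partition) -> Partition:
    """Column lengths of the Young diagram: lam_t[j] = number of parts larger than j."""
    return tuple(sum(1 for part in lam if part > j) for j in range(lam[0])) if lam else ()

def hook_and_content_squares(lam: Partition) -> Tuple[int, int]:
    """Return (sum of h(u)^2, sum of c(u)^2) over all cells u of lam."""
    lam_t = conjugate(lam)
    hooks = contents = 0
    for i, row_len in enumerate(lam):
        for j in range(row_len):
            arm = row_len - j - 1   # cells to the right of u
            leg = lam_t[j] - i - 1  # cells below u
            hooks += (arm + leg + 1) ** 2
            contents += (j - i) ** 2
    return hooks, contents
```

For example, the partition (2, 1) of 3 has hook lengths 3, 1, 1 and contents 0, 1, -1, giving 11 = 9 + 2.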
Submitted 8 March, 2024;
originally announced March 2024.
-
OGMP: Oracle Guided Multimodal Policies for Agile and Versatile Robot Control
Authors:
Lokesh Krishna,
Nikhil Sobanbabu,
Quan Nguyen
Abstract:
The efficacy of model-free learning for robot control relies on the tailored integration of task-specific priors and heuristics, hence calling for a unified approach. In this paper, we define a general class of priors called oracles and propose bounding the permissible state around the oracle's ansatz, resulting in task-agnostic oracle-guided policy optimization. Additionally, to enhance modularity, we introduce the notion of task-vital modes. A policy mastering a compact set of modes and intermediate transitions can then solve perpetual tasks. The proposed approach is validated in challenging biped control tasks: parkour and diving on a 16-DoF dynamic bipedal robot, Hector. OGMP results in a single policy per task, solving indefinite parkour over diverse tracks and omnidirectional diving from varied heights, exhibiting versatile agility. Finally, we introduce a novel latent mode space reachability analysis to study our policy's mode generalization by computing a feasible mode set function through which we certify a set of failure-free modes for our policy to perform at any given state.
Submitted 14 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Metric-aware LLM inference for regression and scoring
Authors:
Michal Lukasik,
Harikrishna Narasimhan,
Aditya Krishna Menon,
Felix Yu,
Sanjiv Kumar
Abstract:
Large language models (LLMs) have demonstrated strong results on a range of NLP tasks. Typically, outputs are obtained via autoregressive sampling from the LLM's underlying distribution. Building on prior work on Minimum Bayes Risk Decoding, we show that this inference strategy can be suboptimal for a range of regression and scoring tasks and their associated evaluation metrics. As a remedy, we propose metric-aware LLM inference: a decision-theoretic approach that optimizes for custom regression and scoring metrics at inference time. We report improvements over baselines on academic benchmarks and publicly available models.
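The decision-theoretic idea can be illustrated without any model. In this sketch, a hard-coded list of numbers stands in for scores sampled from an LLM (an assumption; no model is called), and the prediction is whichever candidate minimizes the empirical expected value of the target metric. The function names are ours, not the paper's.

```python
import statistics
from typing import Callable, Iterable, Optional, Sequence

def metric_aware_prediction(samples: Sequence[float],
                            loss: Callable[[float, float], float],
                            candidates: Optional[Iterable[float]] = None) -> float:
    """Return the candidate with the lowest empirical expected loss over the samples."""
    pool = sorted(set(samples)) if candidates is None else list(candidates)

    def expected_loss(y_hat: float) -> float:
        return sum(loss(y_hat, y) for y in samples) / len(samples)

    return min(pool, key=expected_loss)

def squared_error(y_hat: float, y: float) -> float:
    return (y_hat - y) ** 2

def absolute_error(y_hat: float, y: float) -> float:
    return abs(y_hat - y)

# Hypothetical scores sampled from a model for one input.
samples = [1, 1, 1, 2, 10]
```

Here the most frequent sample is 1, but under squared error the best integer prediction in `range(11)` is 3 (the sample mean), while under absolute error it is 1 (the median). The point is that the right decoding rule depends on the evaluation metric.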
Submitted 4 April, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems
Authors:
Wesley A. Suttle,
Vipul K. Sharma,
Krishna C. Kosaraju,
S. Sivaranjani,
Ji Liu,
Vijay Gupta,
Brian M. Sadler
Abstract:
We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints: model-free RL is used to learn a potentially unsafe controller, whose actions are projected onto safe sets prescribed, for example, by a control barrier function. Though safe, such approaches lose any convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees while satisfying hard safety constraints throughout training and deployment. We validate the efficacy of our approach in simulation, including safe control of a quadcopter in a challenging obstacle avoidance problem, and demonstrate that it outperforms existing benchmarks.
Submitted 6 March, 2024;
originally announced March 2024.
-
Some direct and inverse problems for the Restricted Signed sumset in set of integers
Authors:
Mohan,
Raj Kumar Mistri,
Ram Krishna Pandey
Abstract:
Given a positive integer $h$ and a nonempty finite set of integers $A=\{a_{1},a_{2},\ldots,a_{k}\}$, the restricted $h$-fold signed sumset of $A$, denoted by $h^{\wedge}_{\pm}A$, is defined as $$h^{\wedge}_{\pm}A=\left\lbrace \sum_{i=1}^{k} \lambda_{i} a_{i}: \lambda_{i} \in \left\lbrace -1, 0, 1\right\rbrace \ \text{for} \ i= 1, 2, \ldots, k \ \text{and} \ \sum_{i=1}^{k} \left| \lambda_{i} \right| =h\right\rbrace.$$ The direct problem associated with this sumset is to find the optimal lower bound of $|h^{\wedge}_{\pm}A|$, and the inverse problem is to determine the structure of the underlying set $A$ when $|h^{\wedge}_{\pm}A|$ attains the optimal lower bound. Bhanja, Komatsu and Pandey studied the direct and inverse problems for the restricted $h$-fold signed sumset for $h=2, 3$, and $k$, and conjectured some direct and inverse results for $h \geq 4$. In this paper, we prove these conjectures for $h=4$. We also prove the direct and inverse theorems for arbitrary $h$ under certain restrictions on the set $A$, which are particular cases of the conjectures. Moreover, we prove these conjectures for arithmetic progressions.
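The definition can be checked directly on small sets. This brute-force sketch (exponential in k, so only for tiny examples, and our own illustration rather than anything from the paper) enumerates every choice of exactly h indices with nonzero coefficients and every assignment of signs λ_i ∈ {-1, 1} to them; it assumes the elements of A are distinct.

```python
from itertools import combinations, product
from typing import Iterable, Set

def restricted_signed_sumset(A: Iterable[int], h: int) -> Set[int]:
    """All sums of exactly h distinct elements of A, each taken with sign +1 or -1."""
    elements = list(A)
    result: Set[int] = set()
    for support in combinations(elements, h):        # which lambda_i are nonzero
        for signs in product((-1, 1), repeat=h):     # the sign of each nonzero lambda_i
            result.add(sum(s * a for s, a in zip(signs, support)))
    return result
```

For A = {1, 2, 3} and h = 2 this gives the 10 values {±1, ±2, ±3, ±4, ±5}; for h = 3 it gives {-6, -4, -2, 0, 2, 4, 6}. If h exceeds |A|, `combinations` yields nothing and the sumset is empty.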
Submitted 6 March, 2024;
originally announced March 2024.
-
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Authors:
Nathaniel Li,
Alexander Pan,
Anjali Gopal,
Summer Yue,
Daniel Berrios,
Alice Gatti,
Justin D. Li,
Ann-Kathrin Dombrowski,
Shashwat Goel,
Long Phan,
Gabriel Mukobi,
Nathan Helm-Burger,
Rassin Lababidi,
Lennart Justen,
Andrew B. Liu,
Michael Chen,
Isabelle Barrass,
Oliver Zhang,
Xiaoyuan Zhu,
Rishub Tamirisa,
Bhrugu Bharathi,
Adam Khoja,
Zhenqi Zhao,
Ariel Herbert-Voss,
Cort B. Breuer
, et al. (32 additional authors not shown)
Abstract:
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai.
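The abstract says only that RMU works by "controlling model representations". The sketch below assumes a two-term objective on hidden activations: a forget term that pushes activations on hazardous data toward a fixed, scaled random control vector, and a retain term that keeps activations on benign data close to those of a frozen copy of the model. Treat this formulation, the scale, the weight `alpha`, and all names as assumptions; it computes the loss on given NumPy arrays and does not train a model.

```python
import numpy as np

def random_control_vector(dim: int, scale: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: a random unit vector scaled by a steering coefficient."""
    u = rng.random(dim)
    return scale * u / np.linalg.norm(u)

def rmu_style_loss(h_forget_updated: np.ndarray,
                   h_retain_updated: np.ndarray,
                   h_retain_frozen: np.ndarray,
                   control_vec: np.ndarray,
                   alpha: float) -> float:
    """Assumed two-term loss; activation arrays have shape (tokens, hidden_dim)."""
    # Forget term: mean squared distance from the control vector on hazardous-data tokens.
    forget_term = np.mean(np.sum((h_forget_updated - control_vec) ** 2, axis=-1))
    # Retain term: mean squared drift from the frozen model's activations on benign tokens.
    retain_term = np.mean(np.sum((h_retain_updated - h_retain_frozen) ** 2, axis=-1))
    return float(forget_term + alpha * retain_term)
```

In a real setup both updated activations would come from one layer of the model being fine-tuned, and the loss would be minimized by gradient descent on that model's weights.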
Submitted 15 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Obligations and permissions, algebraically
Authors:
Andrea De Domenico,
Ali Farjami,
Krishna Manoorkar,
Alessandra Palmigiano,
Mattia Panettiere,
Xiaolong Wang
Abstract:
We further develop the algebraic approach to input/output logic initiated in \cite{wollic22}, where subordination algebras and a family of their generalizations were proposed as a semantic environment of various input/output logics. In particular, we consider precontact algebras as a suitable algebraic environment for negative permission, and we characterize properties of several types of permission (negative, static, dynamic), as well as their interactions with normative systems, by means of suitable modal languages encoding outputs.
Submitted 7 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Authors:
Imad Eddine Toubal,
Aditya Avinash,
Neil Gordon Alldrin,
Jan Dlabal,
Wenlei Zhou,
Enming Luo,
Otilia Stretcu,
Hao Xiong,
Chun-Ta Lu,
Howard Zhou,
Ranjay Krishna,
Ariel Fuxman,
Tom Duerig
Abstract:
From content moderation to wildlife conservation, the number of applications that require models to recognize nuanced or subjective visual concepts is growing. Traditionally, developing classifiers for such concepts requires substantial manual effort measured in hours, days, or even months to identify and annotate data needed for training. Even with recently proposed Agile Modeling techniques, which enable rapid bootstrapping of image classifiers, users are still required to spend 30 minutes or more of monotonous, repetitive data labeling just to train a single classifier. Drawing on Fiske's Cognitive Miser theory, we propose a new framework that alleviates manual effort by replacing human labeling with natural language interactions, reducing the total effort required to define a concept by an order of magnitude: from labeling 2,000 images to only 100 plus some natural language interactions. Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points. Most importantly, our framework eliminates the need for crowd-sourced annotations. Moreover, our framework ultimately produces lightweight classification models that are deployable in cost-sensitive scenarios. Across 15 subjective concepts and across 2 public image classification datasets, our trained models outperform traditional Agile Modeling as well as state-of-the-art zero-shot classification models like ALIGN, CLIP, CuPL, and large visual question-answering models like PaLI-X.
Submitted 19 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Mirage: Defense against CrossPath Attacks in Software Defined Networks
Authors:
Shariq Murtuza,
Krishna Asawa
Abstract:
Software-Defined Networks (SDNs) face persistent threats from adversaries who use a variety of methods to mount Denial of Service attacks. These attackers have different motives and follow diverse tactics to achieve their nefarious objectives. In this work, we focus on the impact of CrossPath attacks in SDNs and introduce our framework, Mirage, which both detects and mitigates this attack. Mirage detects SDN switches that become unreachable because they are under attack, takes proactive measures to prevent Adversarial Path Reconnaissance, and mitigates CrossPath attacks in SDNs. A CrossPath attack is a form of link-flooding attack that indirectly attacks the control plane by overwhelming the shared links that connect the data and control planes with data plane traffic. This attack is specific to in-band SDN, where the data and control planes use the same physical links to transmit and receive traffic. Mirage prevents attackers from launching adversarial path reconnaissance to identify shared links in a network, thereby preventing their abuse and thwarting this attack. Beyond stopping adversarial path reconnaissance, Mirage includes features to quickly counter ongoing attacks once they are detected. It uses path diversity to reroute network packets and defeat timing-based measurement, and it can enforce short-lived flow table rules to prevent timing attacks. These measures are designed to enhance the security of the SDN environment. We also report experimental results showing Mirage's effectiveness in preventing path reconnaissance, detecting CrossPath attacks, and mitigating ongoing threats. Our framework protects the network from these harmful activities and offers insights into SDN security.
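The two countermeasures named above, short-lived flow rules and path diversity, can be sketched together as a controller-side table in which every rule carries a short hard timeout and a fresh path is drawn at random from several candidate paths whenever a rule expires. This is a hypothetical illustration: the class, its fields, the timeout value, and the switch names are assumptions, not Mirage's implementation or any SDN controller's API.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]  # sequence of switch identifiers (hypothetical)

@dataclass
class FlowRule:
    dst: str
    path: Path
    expires_at: float  # absolute time at which the rule is evicted

class ShortLivedFlowTable:
    """Hypothetical table: short hard timeouts plus random choice among diverse paths."""

    def __init__(self, paths_by_dst: Dict[str, List[Path]], hard_timeout: float = 1.0,
                 rng: Optional[random.Random] = None):
        self.paths_by_dst = paths_by_dst
        self.hard_timeout = hard_timeout
        self.rng = rng or random.Random()
        self.rules: Dict[str, FlowRule] = {}

    def route(self, dst: str, now: float) -> Path:
        rule = self.rules.get(dst)
        if rule is None or now >= rule.expires_at:
            # Missing or expired rule: install a new short-lived rule on a randomly drawn path,
            # so repeated timing probes do not keep observing one stable path.
            path = self.rng.choice(self.paths_by_dst[dst])
            rule = FlowRule(dst, path, now + self.hard_timeout)
            self.rules[dst] = rule
        return rule.path
```

The design choice is to trade some rule-installation overhead for unpredictability: while a rule is live, packets follow a single path, but an attacker measuring delays across rule lifetimes sees a mix of paths.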
Submitted 4 March, 2024;
originally announced March 2024.
-
Advancing Gene Selection in Oncology: A Fusion of Deep Learning and Sparsity for Precision Gene Selection
Authors:
Akhila Krishna,
Ravi Kant Gupta,
Pranav Jeevan,
Amit Sethi
Abstract:
Gene selection plays a pivotal role in oncology research for improving outcome prediction accuracy and facilitating cost-effective genomic profiling for cancer patients. This paper introduces two gene selection strategies for deep learning-based survival prediction models. The first strategy uses a sparsity-inducing method, while the second uses importance-based gene selection to identify relevant genes. Our overall approach leverages deep learning to model complex biological data structures, while sparsity-inducing methods ensure that the selection process focuses on the most informative genes, minimizing noise and redundancy. Through comprehensive experiments on diverse genomic and survival datasets, we demonstrate that our strategy not only identifies gene signatures with high predictive power for survival outcomes but also streamlines the process for low-cost genomic profiling. The implications of this research are profound as it offers a scalable and effective tool for advancing personalized medicine and targeted cancer therapies. By pushing the boundaries of gene selection methodologies, our work contributes significantly to the ongoing efforts in cancer genomics, promising improved diagnostic and prognostic capabilities in clinical settings.
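To show how a sparsity-inducing penalty performs selection, this sketch swaps the paper's deep survival model for plain L1-regularized linear regression (lasso) solved by proximal gradient descent (ISTA). Only the mechanism carries over: the soft-thresholding step sets uninformative gene weights exactly to zero. The data, penalty strength, and function names are assumptions for illustration.

```python
import numpy as np

def soft_threshold(z: np.ndarray, threshold: float) -> np.ndarray:
    """Proximal operator of the L1 norm: shrink toward zero and zero out small entries."""
    return np.sign(z) * np.maximum(np.abs(z) - threshold, 0.0)

def lasso_gene_weights(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 1000) -> np.ndarray:
    """ISTA for 0.5/n * ||y - X w||^2 + lam * ||w||_1, where columns of X are genes."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth term's gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic stand-in for expression data: 200 samples, 20 genes, 3 of them informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
true_w = np.zeros(20)
true_w[[0, 1, 2]] = [3.0, -2.0, 1.5]
y = X @ true_w + 0.1 * rng.standard_normal(200)
w = lasso_gene_weights(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-8)
```

On this synthetic data the selected set should be the three informative genes. In a deep model the same penalty is typically applied to the first-layer weights attached to each gene.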
Submitted 4 March, 2024;
originally announced March 2024.
-
Inverse optically-induced ring currents in ring-shaped molecules
Authors:
Krishna Reddy Nandipati,
Sudip Sasmal,
Oriol Vendrell
Abstract:
Permanent electronic ring currents can be supported within a manifold of $\Gamma_E$ degenerate excited electronic states as $E_{\pm} = E_x \pm i E_y$ excitations. This requires a rotational symmetry axis of order 3 or higher, and includes the subclass of ring-shaped molecules. In [Phys. Rev. Res. {\bf 3}, L042003 (2021)] we showed the existence of inverse-current manifolds, where the direction of the electronic ring-current in each degenerate state $E_\pm$ is opposite to the circular polarization of the generating light-fields. This phenomenon can be traced back to vibronic effects, namely the exchange of orbital angular momentum between the circulating electrons and vibrational modes with the required symmetry. Here we consider the case of fixed nuclei and find that ring-shaped molecular systems can possess inverse-current manifolds on a purely electronic-structure basis, i.e., without intervention of vibronic coupling. The effect is illustrated and explained first on a simple tight-binding model with cyclic symmetry, and then considering the {\it{ab initio}} electronic structure of benzene and sym-triazine. A framework for discriminating regular- and inverse-current $\Gamma_E$ manifolds in molecules using quantum chemistry calculations is provided.
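For orientation, the sketch below sets up the simplest tight-binding model with cyclic symmetry: a single orbital per site on an N-site ring with nearest-neighbour hopping t (N = 6 mimics the benzene skeleton). It shows that the orbitals exp(2πikj/N) with k = ±1 are degenerate and carry equal and opposite ring currents. This is only the regular one-electron picture; whether the current in an excited-state manifold opposes the light's circular polarization (the inverse case) depends on the many-electron structure studied in the paper, which this sketch does not model. Units with ħ = 1 and all function names are assumptions.

```python
import numpy as np

def ring_hamiltonian(n_sites: int, t: float = 1.0) -> np.ndarray:
    """Nearest-neighbour tight-binding (Hückel-like) Hamiltonian with periodic boundary."""
    h = np.zeros((n_sites, n_sites))
    for site in range(n_sites):
        nxt = (site + 1) % n_sites
        h[site, nxt] = h[nxt, site] = -t
    return h

def bloch_state(n_sites: int, k: int) -> np.ndarray:
    """Ring orbital psi_j = exp(2*pi*i*k*j/N) / sqrt(N), with energy -2 t cos(2*pi*k/N)."""
    sites = np.arange(n_sites)
    return np.exp(2j * np.pi * k * sites / n_sites) / np.sqrt(n_sites)

def bond_current(psi: np.ndarray, t: float = 1.0) -> float:
    """Mean particle current from site j to j+1: 2 t Im(conj(psi_j) * psi_{j+1})."""
    forward = np.conj(psi) * np.roll(psi, -1)  # np.roll(psi, -1)[j] == psi[j + 1]
    return float(np.mean(2 * t * forward.imag))
```

For N = 6 and t = 1 the spectrum is [-2, -1, -1, 1, 1, 2]; the degenerate k = +1 and k = -1 orbitals carry currents of +√3/6 and -√3/6 per bond.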
Submitted 1 March, 2024;
originally announced March 2024.
-
Cause and Effect: Can Large Language Models Truly Understand Causality?
Authors:
Swagata Ashwani,
Kshiteesh Hegde,
Nishith Reddy Mannuru,
Mayank **dal,
Dushyant Singh Sengar,
Krishna Chaitanya Rao Kathala,
Dishant Banga,
Vinija Jain,
Aman Chadha
Abstract:
With the rise of Large Language Models (LLMs), it has become crucial to understand their capabilities and limitations in deciphering and explaining the complex web of causal relationships that language entails. Current methods use either explicit or implicit causal reasoning, yet there is a strong need for a unified approach combining both to tackle a wide array of causal relationships more effectively. This research proposes a novel architecture called the Context Aware Reasoning Enhancement with Counterfactual Analysis (CARE CA) framework to enhance causal reasoning and explainability. The proposed framework incorporates an explicit causal detection module with ConceptNet and counterfactual statements, as well as implicit causal detection through LLMs. Our framework goes one step further with a layer of counterfactual explanations to accentuate LLMs' understanding of causality. The knowledge from ConceptNet enhances the performance of multiple causal reasoning tasks such as causal discovery, causal identification and counterfactual reasoning. The counterfactual sentences add explicit knowledge of "not caused by" scenarios. By combining these powerful modules, our model aims to provide a deeper understanding of causal relationships, enabling enhanced interpretability. Evaluation on benchmark datasets shows improved performance across all metrics, such as accuracy, precision, recall, and F1 scores. We also introduce CausalNet, a new dataset accompanied by our code, to facilitate further research in this domain.
Submitted 15 April, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Constrained Decoding for Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars
Authors:
Daniel Melcer,
Nathan Fulton,
Sanjay Krishna Gouda,
Haifeng Qian
Abstract:
Large Language Models are powerful tools for program synthesis and advanced auto-completion, but come with no guarantee that their output code is syntactically correct. This paper contributes an incremental parser that allows early rejection of syntactically incorrect code, as well as efficient detection of complete programs for fill-in-the-middle (FItM) tasks. We develop Earley-style parsers that operate over left and right quotients of arbitrary context-free grammars, and we extend our incremental parsing and quotient operations to several context-sensitive features present in the grammars of many common programming languages. The result of these contributions is an efficient, general, and well-grounded method for left and right quotient parsing.
To validate our theoretical contributions -- and the practical effectiveness of certain design decisions -- we evaluate our method on the particularly difficult case of FItM completion for Python 3. Our results demonstrate that constrained generation can significantly reduce the incidence of syntax errors in recommended code.
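The two checks described above (rejecting a prefix early, and detecting when a fill-in-the-middle completion is done) can be illustrated on a toy language: balanced parentheses, where a single depth counter replaces the paper's Earley-style parser over Python's grammar. A prefix is viable exactly when some continuation completes it (its left quotient is nonempty), and a middle can still be found for a fixed suffix exactly when the suffix can close from some non-negative entry depth. This is a simplified stand-in; the function names are ours, and strings are assumed to contain only `(` and `)`.

```python
from typing import Tuple

def depth_profile(text: str) -> Tuple[int, int]:
    """Return (final depth, minimum depth) of a bracket string read left to right."""
    depth = min_depth = 0
    for ch in text:
        depth += 1 if ch == "(" else -1
        min_depth = min(min_depth, depth)
    return depth, min_depth

def prefix_viable(prefix: str) -> bool:
    """Early rejection: a prefix can be completed iff it never closes an unopened bracket."""
    _, min_depth = depth_profile(prefix)
    return min_depth >= 0

def middle_exists(prefix: str, suffix: str) -> bool:
    """FIM check: is there any middle m such that prefix + m + suffix is balanced?"""
    if not prefix_viable(prefix):
        return False
    suffix_depth, suffix_min = depth_profile(suffix)
    entry_depth = -suffix_depth  # depth the middle must end at so the whole string closes at 0
    # The suffix must never dip below zero when entered at that depth; the middle can always
    # move from the prefix's depth to any non-negative entry depth.
    return entry_depth + suffix_min >= 0

def is_complete(prefix: str, middle: str, suffix: str) -> bool:
    """Stop condition for generation: the middle produced so far closes the program."""
    final_depth, min_depth = depth_profile(prefix + middle + suffix)
    return final_depth == 0 and min_depth >= 0
```

For example, with prefix `(` and suffix `))`, a middle exists and `(` completes it, while with suffix `)(` no middle can help because the trailing `(` can never be closed.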
Submitted 27 February, 2024;
originally announced February 2024.
-
Spinning Black Hole in a Fluid
Authors:
Surojit Dalui,
Arpan Krishna Mitra,
Deeshani Mitra,
Subir Ghosh
Abstract:
In this paper, we propose a new Analogue Gravity example: a spinning (or Kerr) Black Hole in an extended fluid model. The fluid model receives Berry curvature contributions and applies to electron dynamics in Condensed Matter lattice systems in the hydrodynamic limit. We construct the acoustic metric for sonic fluctuations that obey a structurally relativistic wave equation in an effective curved background. Using a novel dimensional-analysis approach, we derive explicit expressions (in terms of fluid parameters) for the effective mass and angular momentum per unit mass in the acoustic metric, and identify them with the corresponding parameters of the Kerr metric. The spin is a manifestation of the Berry curvature-induced effective noncommutative structure in the fluid. Finally, we put the Kerr Black Hole analogy on a robust footing by explicitly revealing the presence of a horizon and an ergo-region for a specific background fluid velocity profile. We also show that the near-horizon behavior of the phase-space trajectory of a probe particle agrees with the Kerr Black Hole analogy. From a fluid dynamics perspective, the presence of a horizon signifies the wave-blocking phenomenon.
Submitted 23 February, 2024;
originally announced February 2024.
-
COBIAS: Contextual Reliability in Bias Assessment
Authors:
Priyanshul Govil,
Hemang Jain,
Vamshi Krishna Bonagiri,
Aman Chadha,
Ponnurangam Kumaraguru,
Manas Gaur,
Sanorita Dey
Abstract:
Large Language Models (LLMs) are trained on extensive web corpora, which enable them to understand and generate human-like text. However, this training process also results in inherent biases within the models. These biases arise from the diverse and often uncurated nature of web data, which contains various stereotypes and prejudices. Previous works on debiasing models rely on benchmark datasets to measure their methods' performance. However, these datasets suffer from several pitfalls due to the highly subjective understanding of bias, highlighting a critical need for contextual exploration. We propose understanding the context of inputs by considering the diverse situations in which they may arise. Our contribution is two-fold: (i) we augment 2,291 stereotyped statements from two existing bias-benchmark datasets with points for adding context; (ii) we develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to assess a statement's contextual reliability in measuring bias. Our metric aligns with human judgment on the contextual reliability of statements (Spearman's $\rho = 0.65$, $p = 3.4 \times 10^{-60}$) and can be used to create reliable datasets, which would assist bias mitigation works.
Submitted 17 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Scaling Up LLM Reviews for Google Ads Content Moderation
Authors:
Wei Qiao,
Tushar Dogra,
Otilia Stretcu,
Yu-Han Lyu,
Tiantian Fang,
Dong** Kwon,
Chun-Ta Lu,
Enming Luo,
Yuan Wang,
Chih-Chun Chia,
Ariel Fuxman,
Fangzhou Wang,
Ranjay Krishna,
Mehmet Tek
Abstract:
Large language models (LLMs) are powerful tools for content moderation, but their inference costs and latency make them prohibitive for casual use on large datasets, such as the Google Ads repository. This study proposes a method for scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of ads for which we select one representative ad per cluster. We then use LLMs to review only the representative ads. Finally, we propagate the LLM decisions for the representative ads back to their clusters. This method reduces the number of reviews by more than 3 orders of magnitude while achieving 2x the recall of a baseline non-LLM model. The success of this approach is a strong function of the representations used in clustering and label propagation; we found that cross-modal similarity representations yield better results than uni-modal representations.
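The review-then-propagate step described above can be sketched as follows, assuming candidate filtering, deduplication, and clustering have already produced a cluster id for each ad. The `review` callable stands in for the expensive LLM call; choosing the first member as the representative is a hypothetical simplification, since the abstract does not say how representatives are picked.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

def review_by_cluster(ad_ids: Iterable[str],
                      cluster_of: Dict[str, Hashable],
                      review: Callable[[str], str]) -> Tuple[Dict[str, str], int]:
    """Review one representative per cluster and propagate its decision to the cluster."""
    clusters: Dict[Hashable, List[str]] = {}
    for ad in ad_ids:
        clusters.setdefault(cluster_of[ad], []).append(ad)

    labels: Dict[str, str] = {}
    n_reviews = 0
    for members in clusters.values():
        representative = members[0]        # hypothetical choice: first member of the cluster
        decision = review(representative)  # the only expensive (LLM) call for this cluster
        n_reviews += 1
        for ad in members:
            labels[ad] = decision          # label propagation within the cluster
    return labels, n_reviews
```

The number of LLM calls equals the number of clusters rather than the number of ads, which is where the reduction in reviews comes from; the quality of the propagated labels depends entirely on how well the clustering groups ads that deserve the same decision.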
Submitted 7 February, 2024;
originally announced February 2024.
-
One-dimensional proximity superconductivity in the quantum Hall regime
Authors:
Julien Barrier,
Minsoo Kim,
Roshan Krishna Kumar,
Na Xin,
P. Kumaravadivel,
Lee Hague,
E. Nguyen,
A. I. Berdyugin,
Christian Moulsdale,
V. V. Enaldiev,
J. R. Prance,
F. H. L. Koppens,
R. V. Gorbachev,
K. Watanabe,
T. Taniguchi,
L. I. Glazman,
I. V. Grigorieva,
V. I. Fal'ko,
A. K. Geim
Abstract:
Extensive efforts have been undertaken to combine superconductivity and the quantum Hall effect so that Cooper-pair transport between superconducting electrodes in Josephson junctions is mediated by one-dimensional edge states. This interest has been motivated by prospects of finding new physics, including topologically-protected quasiparticles, but also extends into metrology and device applications. So far it has proven challenging to achieve detectable supercurrents through quantum Hall conductors. Here we show that domain walls in minimally twisted bilayer graphene support exceptionally robust proximity superconductivity in the quantum Hall regime, allowing Josephson junctions to operate in fields close to the upper critical field of superconducting electrodes. The critical current is found to be non-oscillatory and practically unchanging over the entire range of quantizing fields, with its value being limited by the quantum conductance of ballistic, strictly one-dimensional electronic channels residing within the domain walls. The system described is unique in its ability to support Andreev bound states at quantizing fields and offers many interesting directions for further exploration.
Submitted 25 April, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
SaGE: Evaluating Moral Consistency in Large Language Models
Authors:
Vamshi Krishna Bonagiri,
Sreeram Vennam,
Priyanshul Govil,
Ponnurangam Kumaraguru,
Manas Gaur
Abstract:
Despite recent advancements showcasing the impressive capabilities of Large Language Models (LLMs) in conversational systems, we show that even state-of-the-art LLMs are morally inconsistent in their generations, calling into question their reliability (and trustworthiness in general). Prior works in LLM evaluation focus on developing ground-truth data to measure accuracy on specific tasks. However, for moral scenarios that often lack universally agreed-upon answers, consistency in model responses becomes crucial for their reliability. To address this issue, we propose an information-theoretic measure called Semantic Graph Entropy (SaGE), grounded in the concept of "Rules of Thumb" (RoTs), to measure a model's moral consistency. RoTs are abstract principles learned by a model and can help explain its decision-making strategies effectively. To this end, we construct the Moral Consistency Corpus (MCC), containing 50K moral questions, responses to them by LLMs, and the RoTs that these models followed. Furthermore, to illustrate the generalizability of SaGE, we use it to investigate LLM consistency on two popular datasets: TruthfulQA and HellaSwag. Our results reveal that task accuracy and consistency are independent problems, and there is a dire need to investigate these issues further.
Submitted 8 March, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
From Cloud to Edge: Rethinking Generative AI for Low-Resource Design Challenges
Authors:
Sai Krishna Revanth Vuruma,
Ashley Margetts,
Jianhai Su,
Faez Ahmed,
Biplav Srivastava
Abstract:
Generative Artificial Intelligence (AI) has shown tremendous prospects in all aspects of technology, including design. However, due to its heavy demand on resources, it is usually trained on large computing infrastructure and often made available as a cloud-based service. In this position paper, we consider the potential, challenges, and promising approaches for generative AI for design on the edge, i.e., in resource-constrained settings where memory, compute, energy (battery) and network connectivity may be limited. Adapting generative AI for such settings involves overcoming significant hurdles, primarily in how to streamline complex models to function efficiently in low-resource environments. This necessitates innovative approaches in model compression, efficient algorithmic design, and perhaps even leveraging edge computing. The objective is to harness the power of generative AI in creating bespoke solutions for design problems, such as medical interventions, farm equipment maintenance, and educational material design, tailored to the unique constraints and needs of remote areas. These efforts could democratize access to advanced technology and foster sustainable development, ensuring universal accessibility and environmental consideration of AI-driven design benefits.
Submitted 25 February, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence
Authors:
Kundan Krishna,
Sanjana Ramprasad,
Prakhar Gupta,
Byron C. Wallace,
Zachary C. Lipton,
Jeffrey P. Bigham
Abstract:
LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that are not supported by the reference document, and also presents evidence from the reference for facts that do appear to have support. We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users. Comprehensive evaluation by human raters shows that GenAudit can detect errors in 8 different LLM outputs when summarizing documents from diverse domains. To ensure that most errors are flagged by the system, we propose a method that can increase the error recall while minimizing impact on precision. We release our tool (GenAudit) and fact-checking model for public use.
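The abstract mentions increasing error recall while limiting the impact on precision, but does not say how. One generic way to trade precision for recall is to lower the decision threshold on a detector's per-claim error scores until a target recall is reached. The sketch below assumes such scores and gold labels are available; the function names `precision_recall` and `pick_threshold` and the toy data are hypothetical, and this is not presented as GenAudit's actual method.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Precision and recall when every claim with score >= threshold is flagged as an error."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pick_threshold(scores: List[float], labels: List[int], target_recall: float) -> float:
    """Return the highest threshold whose recall meets the target.

    Recall can only grow as the threshold drops, so the first candidate (scanning
    from high to low) that reaches the target flags the fewest claims, which tends
    to keep precision as high as possible.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    for t in sorted(set(scores), reverse=True):
        if precision_recall(scores, labels, t)[1] >= target_recall:
            return t
    # Unreachable for target_recall <= 1: the lowest score flags every claim.
    return min(scores)


# Hypothetical detector scores (higher = more likely unsupported) and gold labels (1 = error).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 1, 0, 0]
t = pick_threshold(scores, labels, target_recall=1.0)
print(t, precision_recall(scores, labels, t))  # 0.4 (0.75, 1.0)
```

Requiring full recall here costs one false positive (precision 0.75); a lower target such as 0.6 would keep a higher threshold of 0.6.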
Submitted 16 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Cryo-Near-Field Photovoltage Microscopy of Heavy-Fermion Twisted Symmetric Trilayer Graphene
Authors:
Sergi Batlle-Porro,
Dumitru Calugaru,
Haoyu Hu,
Roshan Krishna Kumar,
Niels C. H. Hesp,
Kenji Watanabe,
Takashi Taniguchi,
B. Andrei Bernevig,
Petr Stepanov,
Frank H. L. Koppens
Abstract:
Ever since the initial experimental observation of correlated insulators and superconductivity in the flat Dirac bands of magic angle twisted bilayer graphene, a search for the microscopic description that explains its strong electronic interactions has begun. While the seemingly disagreeing electronic transport and scanning tunneling microscopy experiments suggest a dichotomy between local and extended electronic orbitals, definitive experimental evidence merging the two patterns together has been much sought after. Here, we report on the local photothermoelectric measurements in the flat electronic bands of twisted symmetric trilayer graphene (TSTG). We use a cryogenic scanning near-field optical microscope with an oscillating atomic force microscopy (AFM) tip irradiated by the infrared photons to create a nanoscopic hot spot in the planar samples, which generates a photocurrent that we probe globally. We observe a breakdown of the non-interacting Mott formalism at low temperatures (10K), signaling the importance of the electronic interactions. Our measurements reveal an overall negative offset of the Seebeck coefficient and significant peaks of the local photovoltage values at all positive integer fillings of the TSTG's moiré superlattice, further indicating a substantial deviation from the classical two-band semiconductor Seebeck response. We explain these observations using the interacting topological heavy-fermion model. In addition, our data reveal a spatial variation of the relative interaction strength dependent on the measured local twist angle (1.2° - 1.6°). Our findings provide experimental evidence of heavy fermion behaviour in the topological flat bands of moiré graphene and epitomize an avenue to apply local thermoelectric measurements to other strongly correlated materials in the disorder-free limit.
Submitted 20 February, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Offline Training of Language Model Agents with Functions as Learnable Weights
Authors:
Shaokun Zhang,
Jieyu Zhang,
Jiale Liu,
Linxin Song,
Chi Wang,
Ranjay Krishna,
Qingyun Wu
Abstract:
Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions. To facilitate the development of LLM agents, we present a novel paradigm of training LLM agents without modifying the LLM weights, which is particularly useful when the LLMs are difficult to modify or inaccessible for modification. Inspired by how humans continuously forge tools to adapt to real-world tasks, rather than change our biological structure to fit a static set of tools, we propose to progressively forge the agent's functions to better solve the downstream tasks instead of modifying the LLM weights. By treating the functions as learnable 'agent parameters' and leveraging the fundamental idea of model training in artificial intelligence, we develop AgentOptimizer, which employs the LLM to update agents' functions, and devise an agent training algorithm with two strategies, roll-back and early-stop, to streamline the training process. With extensive experiments, we show that the agent training paradigm could significantly improve the performance of representative LLM agents in various downstream tasks. We also study the behavior of the agent training regarding aspects like the learning curve and domain transferability.
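The roll-back and early-stop strategies can be illustrated with a generic training loop in which the "parameters" are a set of function definitions rather than weights. This is a minimal sketch under stated assumptions: `propose_update` stands in for the LLM-driven optimizer and `evaluate` for a task benchmark; both names, the `patience` parameter, and the toy scoring are hypothetical and are not AgentOptimizer's actual interface.

```python
from typing import Callable, Dict, List

# Hypothetical "function set": function name -> source code or description.
FunctionSet = Dict[str, str]


def train_functions(
    initial: FunctionSet,
    propose_update: Callable[[FunctionSet, float], FunctionSet],
    evaluate: Callable[[FunctionSet], float],
    epochs: int = 10,
    patience: int = 2,
) -> FunctionSet:
    """Optimize a function set, rolling back bad updates and stopping early."""
    best = dict(initial)
    best_score = evaluate(best)
    epochs_without_gain = 0
    for _ in range(epochs):
        candidate = propose_update(best, best_score)
        score = evaluate(candidate)
        if score > best_score:
            # Keep the improved function set.
            best, best_score = candidate, score
            epochs_without_gain = 0
        else:
            # Roll back: discard the candidate and keep the previous best.
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                # Early stop: no improvement for `patience` consecutive epochs.
                break
    return best


calls: List[float] = []


def propose(fs: FunctionSet, score: float) -> FunctionSet:
    """Toy stand-in for the LLM optimizer: add one new (hypothetical) function."""
    calls.append(score)
    return {**fs, f"f{len(fs)}": "def ...: ..."}


def evaluate(fs: FunctionSet) -> float:
    """Toy benchmark whose score saturates at three functions."""
    return float(min(len(fs), 3))


result = train_functions({}, propose, evaluate, epochs=10, patience=2)
print(sorted(result), len(calls))  # ['f0', 'f1', 'f2'] 5
```

After the third function, further proposals no longer improve the score, so they are rolled back and training stops after two non-improving epochs instead of running all ten.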
Submitted 7 June, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Analysis and Mortality Prediction using Multiclass Classification for Older Adults with Type 2 Diabetes
Authors:
Ruchika Desure,
Gutha Jaya Krishna
Abstract:
Designing proper treatment plans to manage diabetes requires health practitioners to pay heed to the individual's remaining life along with the comorbidities affecting them. Older adults with Type 2 Diabetes Mellitus (T2DM) are prone to experiencing premature death or even hypoglycaemia. The structured dataset used has 68 potential mortality predictors for 275,190 diabetic U.S. military Veterans aged 65 years or older. A new target variable is created by combining the two original target variables. Outliers are handled by discretizing the continuous variables. Categorical variables are dummy encoded. Class balancing is achieved by random under-sampling. A benchmark regression model is built using Multinomial Logistic Regression with LASSO. Chi-Squared and Information Gain are the filter-based feature selection techniques used. Classifiers such as Multinomial Logistic Regression, Random Forest, Extreme Gradient Boosting (XGBoost), and One-vs-Rest classifier are employed to build various models. Contrary to expectations, all the models consistently underperformed. XGBoost gave the highest accuracy of 53.03 percent with Chi-Squared feature selection. All the models consistently showed acceptable performance for Class 3 (remaining life is more than 10 years), significantly lower performance for Class 1 (remaining life is up to 5 years), and the worst performance for Class 2 (remaining life is more than 5 but up to 10 years). Feature analysis showed that almost all input variables are associated with multiple target classes. The high dimensionality of the input data after dummy encoding seems to have confused the models, leading to misclassifications. The approach taken in this study is ineffective in producing a high-performing predictive model but lays a foundation, as this problem has never been viewed from a multiclass classification perspective.
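Random under-sampling, as used above for class balancing, keeps every instance of the rarest class and draws an equal-sized random subset, without replacement, from each larger class. A minimal sketch, assuming the class labels are held in a Python list; the function name `random_undersample` and the toy class counts are hypothetical, since the abstract does not show the study's preprocessing code.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def random_undersample(labels: Sequence[Hashable], seed: int = 0) -> List[int]:
    """Return row indices such that every class appears as often as the rarest class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    n_min = min(len(rows) for rows in by_class.values())
    keep: List[int] = []
    for rows in by_class.values():
        # Sample without replacement; the rarest class is kept in full.
        keep.extend(rng.sample(rows, n_min))
    return sorted(keep)


# Hypothetical toy labels: Class 1 (up to 5 years), Class 2 (5 to 10 years), Class 3 (> 10 years).
labels = [1] * 30 + [2] * 20 + [3] * 50
kept = random_undersample(labels)
counts = {c: sum(labels[i] == c for i in kept) for c in (1, 2, 3)}
print(counts)  # {1: 20, 2: 20, 3: 20}
```

Balancing this way discards majority-class rows, which is one reason under-sampled models can lose information about the larger classes.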
Submitted 16 February, 2024;
originally announced February 2024.
-
Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information
Authors:
Aishwarya Jayagopal,
Hansheng Xue,
Ziyang He,
Robert J. Walsh,
Krishna Kumar Hariprasannan,
David Shao Peng Tan,
Tuan Zea Tan,
Jason J. Pitt,
Anand D. Jeyasekharan,
Vaibhav Rajan
Abstract:
Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are challenging to build due to limited labelled patient data. Previous methods to address this problem have used various forms of transfer learning. However, they do not explicitly model the variable-length sequential structure of the list of mutations in such diagnostic panels. Further, they do not utilize auxiliary information (like patient survival) for model training. We address these limitations through a novel transformer-based method, which surpasses the performance of state-of-the-art DRP models on benchmark data. We also present the design of a treatment recommendation system (TRS), which is currently deployed at the National University Hospital, Singapore, and is being evaluated in a clinical trial.
Submitted 16 February, 2024;
originally announced February 2024.
-
Dissipation of nonlinear acoustic waves in thermoviscous pores
Authors:
Krishna Sahithi,
Prateek Gupta
Abstract:
We derive a nonlinear acoustic wave propagation model for analysing the thermoviscous dissipation in narrow pores with wavy walls. As the nonlinear waves propagate in the thermoviscous pores, the wave-steepening effect competes with the bulk dissipation, as well as the thermoviscous heat transfer and shear from the pore walls. Consequently, the length scale of the wave is modified. We use the characteristic nonlinear wave thickness scale to obtain linear and nonlinear wave equations governing the unsteady shock-wall interaction. We also perform two-dimensional shock-resolved direct numerical simulations (DNS) of the wave propagation inside the pores and compare the results with the model equations. We show that for flat walls and shock strength parameter $ε$, the dimensional wall heat flux and shear scale as $ε$. For wavy walls, the scaling becomes $ε^{3/2 - n(k)}$, where $k$ is the wall-waviness wavenumber and the exponent $n$ increases from $0.5$ for $k=0$ to $n(k)\approx0.65$ for $k=10$, $n(k)\approx 0.75$ for $k=20$, and $n(k)\approx0.85$ for $k=40$. Hence, increasing the wall waviness reduces the dependence of the wall heat flux and shear on the nonlinear acoustic wave strength.
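The reported exponents can be turned into a quick worked comparison. With the wall heat flux (or shear) scaling as $ε^{3/2 - n(k)}$, the flat-wall value $n(0)=0.5$ recovers the linear scaling $ε^{1}$, and the quoted $n(k)$ values give exponents of about 0.85, 0.75, and 0.65 for $k=10, 20, 40$. The sketch below uses only the exponents quoted in the abstract and computes how much the wall quantity grows when $ε$ doubles, assuming (for illustration only) that the unknown prefactor does not change with $ε$.

```python
# Exponent n(k) values quoted in the abstract for wall-waviness wavenumber k.
N_OF_K = {0: 0.5, 10: 0.65, 20: 0.75, 40: 0.85}


def scaling_exponent(k: int) -> float:
    """Exponent of epsilon in the wall heat-flux/shear scaling, 3/2 - n(k)."""
    return 1.5 - N_OF_K[k]


def growth_when_doubled(k: int) -> float:
    """Factor by which the wall quantity grows when epsilon doubles (fixed prefactor assumed)."""
    return 2.0 ** scaling_exponent(k)


for k in sorted(N_OF_K):
    print(f"k={k:>2}: exponent={scaling_exponent(k):.2f}, "
          f"growth factor per doubling of epsilon={growth_when_doubled(k):.3f}")
```

Doubling the wave strength doubles the wall flux for flat walls but raises it by only about 57% at $k=40$, which is the reduced sensitivity the abstract describes.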
Submitted 15 February, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results
Authors:
Kelly Payette,
Céline Steger,
Roxane Licandro,
Priscille de Dumast,
Hongwei Bran Li,
Matthew Barkovich,
Liu Li,
Maik Dannecker,
Chen Chen,
Cheng Ouyang,
Niccolò McConnell,
Alina Miron,
Yongmin Li,
Alena Uus,
Irina Grigorescu,
Paula Ramirez Gilliland,
Md Mahfuzur Rahman Siddiquee,
Daguang Xu,
Andriy Myronenko,
Haoyu Wang,
Ziyan Huang,
** Ye,
Mireia Alenyà,
Valentin Comte,
Oscar Camara
, et al. (42 additional authors not shown)
Abstract:
Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single-center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. Sixteen teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the cerebral cortex remains the most challenging structure due to its anatomical complexity. The FeTA Challenge 2022 successfully evaluated and advanced the generalizability of multi-class fetal brain tissue segmentation algorithms for MRI, and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.
Submitted 8 February, 2024;
originally announced February 2024.
-
Nonlinear Maccone-Pati Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
We show that one of the two important uncertainty principles derived by Maccone and Pati \textit{[Phys. Rev. Lett., 2014]} can be derived for arbitrary maps defined on subsets of $\mathcal{L}^p$ spaces for $1< p<\infty$. Our main tool is the Clarkson inequalities. We also derive a nonlinear uncertainty principle for weak parallelogram spaces and Type-p Banach spaces.
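For reference, the Clarkson inequalities named above take the following standard form for $f, g \in \mathcal{L}^p$, with $q = p/(p-1)$ the conjugate exponent; the abstract does not state which variants or generalizations the paper actually uses.

```latex
% Clarkson's inequalities (standard form), f, g in L^p, q = p/(p-1).
% Case 2 <= p < infinity:
\left\|\frac{f+g}{2}\right\|_p^p + \left\|\frac{f-g}{2}\right\|_p^p
  \le \frac{1}{2}\left(\|f\|_p^p + \|g\|_p^p\right)

% Case 1 < p <= 2:
\left\|\frac{f+g}{2}\right\|_p^q + \left\|\frac{f-g}{2}\right\|_p^q
  \le \left(\frac{1}{2}\|f\|_p^p + \frac{1}{2}\|g\|_p^p\right)^{q/p}
```

At $p = 2$ both cases reduce to the parallelogram law, which is why weak parallelogram spaces are a natural setting for the generalization.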
Submitted 1 February, 2024;
originally announced February 2024.
-
Toward Mass-Production of Transition Metal Dichalcogenide Solar Cells: Scalable Growth of Photovoltaic-Grade Multilayer WSe2 by Tungsten Selenization
Authors:
Kathryn M. Neilson,
Sarallah Hamtaei,
Koosha Nassiri Nazif,
Joshua M. Carr,
Sepideh Rahimisheikh,
Frederick U. Nitta,
Guy Brammertz,
Jeffrey L. Blackburn,
Joke Hadermann,
Krishna C. Saraswat,
Obadiah G. Reid,
Bart Vermang,
Alwin Daus,
Eric Pop
Abstract:
Semiconducting transition metal dichalcogenides (TMDs) are promising for high-specific-power photovoltaics due to desirable band gaps, high absorption coefficients, and ideally dangling-bond-free surfaces. Despite their potential, the majority of TMD solar cells are fabricated in a non-scalable fashion using exfoliated materials due to the absence of high-quality, large-area, multilayer TMDs. Here, we present the scalable, thickness-tunable synthesis of multilayer tungsten diselenide (WSe$_{2}$) films by selenizing pre-patterned tungsten with either solid source selenium or H$_{2}$Se precursors, which leads to smooth, wafer-scale WSe$_{2}$ films with a layered van der Waals structure. The films have charge carrier lifetimes up to 144 ns, over 14x higher than large-area TMD films previously demonstrated. Such high carrier lifetimes correspond to power conversion efficiency of ~22% and specific power of ~64 W g$^{-1}$ in a packaged solar cell, or ~3 W g$^{-1}$ in a fully-packaged solar module. This paves the way for the mass-production of high-efficiency multilayer WSe$_{2}$ solar cells at low cost.
Submitted 13 February, 2024;
originally announced February 2024.
-
Quantum Computing-Enhanced Algorithm Unveils Novel Inhibitors for KRAS
Authors:
Mohammad Ghazi Vakili,
Christoph Gorgulla,
AkshatKumar Nigam,
Dmitry Bezrukov,
Daniel Varoli,
Alex Aliper,
Daniil Polykovsky,
Krishna M. Padmanabha Das,
Jamie Snider,
Anna Lyakisheva,
Ardalan Hosseini Mansob,
Zhong Yao,
Lela Bitar,
Eugene Radchenko,
Xiao Ding,
**xin Liu,
Fanye Meng,
Feng Ren,
Yudong Cao,
Igor Stagljar,
Alán Aspuru-Guzik,
Alex Zhavoronkov
Abstract:
The discovery of small molecules with therapeutic potential is a long-standing challenge in chemistry and biology. Researchers have increasingly leveraged novel computational techniques to streamline the drug development process, increase hit rates, and reduce the costs associated with bringing a drug to market. To this end, we introduce a quantum-classical generative model that integrates the computational power of quantum algorithms trained on a 16-qubit IBM quantum computer with the established reliability of classical methods for designing small molecules. Our hybrid generative model was applied to designing new inhibitors of KRAS, a crucial target in cancer therapy. We synthesized 15 promising molecules during our investigation and subjected them to experimental testing to assess their ability to engage with the target. Notably, among these candidates, two molecules, ISM061-018-2 and ISM061-22, each featuring a unique scaffold, stood out by demonstrating effective engagement with KRAS. ISM061-018-2 was identified as a broad-spectrum KRAS inhibitor, exhibiting a binding affinity to KRAS-G12D of 1.4 μM. Concurrently, ISM061-22 exhibited specific mutant selectivity, displaying heightened activity against KRAS G12R and Q61H mutants. To our knowledge, this work shows for the first time the use of a quantum-generative model to yield experimentally confirmed biological hits, showcasing the practical potential of quantum-assisted drug discovery to produce viable therapeutics. Moreover, our findings reveal that the efficacy of distribution learning correlates with the number of qubits utilized, underlining the scalability potential of quantum computing resources. Overall, we anticipate our results to be a stepping stone towards developing more advanced quantum generative models in drug discovery.
Submitted 12 February, 2024;
originally announced February 2024.
-
THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation
Authors:
Wilbert Pumacay,
Ishika Singh,
Jiafei Duan,
Ranjay Krishna,
Jesse Thomason,
Dieter Fox
Abstract:
To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions. Unfortunately, a majority of studies evaluate robot performance in environments closely resembling or even identical to the training setup. We present THE COLOSSEUM, a novel simulation benchmark with 20 diverse manipulation tasks that enables systematic evaluation of models across 14 axes of environmental perturbations. These perturbations include changes in color, texture, and size of objects, table-tops, and backgrounds; we also vary lighting, distractors, physical properties, and camera pose. Using THE COLOSSEUM, we compare 5 state-of-the-art manipulation models to reveal that their success rate degrades by 30-50% across these perturbation factors. When multiple perturbations are applied in unison, the success rate degrades by $\geq$75%. We identify that changing the number of distractor objects, target object color, or lighting conditions are the perturbations that reduce model performance the most. To verify the ecological validity of our results, we show that our results in simulation are correlated ($\bar{R}^2 = 0.614$) with similar perturbations in real-world experiments. We open source code for others to use THE COLOSSEUM, and also release code to 3D print the objects used to replicate the real-world perturbations. Ultimately, we hope that THE COLOSSEUM will serve as a benchmark to identify modeling decisions that systematically improve generalization for manipulation. See https://robot-colosseum.github.io/ for more details.
Submitted 27 May, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Obligations and permissions on selfextensional logics
Authors:
Andrea De Domenico,
Ali Farjami,
Krishna Manoorkar,
Alessandra Palmigiano,
Mattia Panettiere,
Xiaolong Wang
Abstract:
We further develop the abstract algebraic logic approach to input/output logic initiated in \cite{wollic22}, where the family of selfextensional logics was proposed as a general background environment for input/output logics. In this paper, we introduce and discuss the generalizations of several types of permission (negative, dual negative, static, dynamic), as well as their interactions with normative systems, to various families of selfextensional logics, thereby proposing a systematic approach to the definition of normative and permission systems on nonclassical propositional bases.
Submitted 11 February, 2024;
originally announced February 2024.
-
Multi-User SR-LDPC Codes via Coded Demixing with Applications to Cell-Free Systems
Authors:
Jamison R. Ebert,
Jean-Francois Chamberland,
Krishna R. Narayanan
Abstract:
Novel sparse regression LDPC (SR-LDPC) codes exhibit excellent performance over additive white Gaussian noise (AWGN) channels in part due to their natural provision of shaping gains. Though SR-LDPC-like codes have been considered within the context of single-user error correction and massive random access, they are yet to be examined as candidates for coordinated multi-user communication scenarios. This article addresses this gap in the literature and demonstrates that SR-LDPC codes, when combined with coded demixing techniques, offer a new framework for efficient non-orthogonal multiple access (NOMA) in the context of coordinated multi-user communication channels. The ensuing communication scheme is referred to as MU-SR-LDPC coding. Empirical evidence suggests that, for a fixed SNR, MU-SR-LDPC coding can achieve a target bit error rate (BER) at a higher sum rate than orthogonal multiple access (OMA) techniques such as time division multiple access (TDMA) and frequency division multiple access (FDMA). Importantly, MU-SR-LDPC codes enable a pragmatic solution path for user-centric cell-free communication systems with (local) joint decoding. Results are supported by numerical simulations.
Submitted 9 February, 2024;
originally announced February 2024.
-
Understanding the Effects of Iterative Prompting on Truthfulness
Authors:
Satyapriya Krishna,
Chirag Agarwal,
Himabindu Lakkaraju
Abstract:
The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of these models remain pressing concerns. To this end, we investigate iterative prompting, a strategy hypothesized to refine LLM responses, assessing its impact on LLM truthfulness, an area which has not been thoroughly explored. Our extensive experiments delve into the intricacies of iterative prompting variants, examining their influence on the accuracy and calibration of model responses. Our findings reveal that naive prompting methods significantly undermine truthfulness, leading to exacerbated calibration errors. In response to these challenges, we introduce several prompting variants designed to address the identified issues. These variants demonstrate marked improvements over existing baselines, signaling a promising direction for future research. Our work provides a nuanced understanding of iterative prompting and introduces novel approaches to enhance the truthfulness of LLMs, thereby contributing to the development of more accurate and trustworthy AI systems.
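The abstract reports that naive iterative prompting worsens calibration errors. A common way to quantify calibration is the expected calibration error (ECE): bin answers by stated confidence, then average the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below is a standard equal-width-bin ECE, assuming per-answer confidences and correctness flags are available; the abstract does not say which calibration metric or binning the authors used, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sum over bins of (|bin| / N) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally sized, non-empty inputs")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(ok for _, ok in members) / len(members)
        mean_conf = sum(c for c, _ in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


# Hypothetical overconfident model: 90% stated confidence, 50% actual accuracy.
print(round(expected_calibration_error([0.9] * 4, [True, False, False, True]), 3))  # 0.4
```

An answer set whose stated confidence matches its accuracy scores 0, so a rise in ECE across prompting rounds signals growing over- or under-confidence.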
Submitted 9 February, 2024;
originally announced February 2024.
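The abstract above reports that naive iterative prompting leads to exacerbated calibration errors. One standard way to quantify calibration is the expected calibration error (ECE); the sketch below is a minimal equal-width-bin version, offered as an illustration of the metric rather than the authors' evaluation code. The bin count of 10 and the function name are assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: bin-size-weighted mean of |accuracy - confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each prediction to a bin index in [0, n_bins - 1]; confidence 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue  # empty bins contribute nothing
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        ece += in_bin.mean() * gap  # weight by the fraction of predictions in this bin
    return ece
```

For example, a model that answers four questions with confidence 0.9 but gets only one right is overconfident by 0.65, and the ECE reflects exactly that gap; a perfectly calibrated model scores 0.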
-
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Authors:
Shivalika Singh,
Freddie Vargus,
Daniel Dsouza,
Börje F. Karlsson,
Abinaya Mahendiran,
Wei-Yin Ko,
Herumb Shandilya,
Jay Patel,
Deividas Mataciunas,
Laura O'Mahony,
Mike Zhang,
Ramith Hettiarachchi,
Joseph Wilson,
Marina Machado,
Luisa Souza Moura,
Dominik Krzemiński,
Hakimeh Fadaei,
Irem Ergün,
Ifeoma Okoh,
Aisha Alaagib,
Oshan Mudannayake,
Zaid Alyafeai,
Vu Minh Chien,
Sebastian Ruder,
Surya Guthikonda
, et al. (8 additional authors not shown)
Abstract:
Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the fine-tuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets. However, existing datasets are almost all in the English language. In this work, our primary goal is to bridge the language gap by building a human-curated instruction-following dataset spanning 65 languages. We worked with fluent speakers of languages from around the world to collect natural instances of instructions and completions. Furthermore, we create the most extensive multilingual collection to date, comprising 513 million instances through templating and translating existing datasets across 114 languages. In total, we contribute four key resources: we develop and open-source the Aya Annotation Platform, the Aya Dataset, the Aya Collection, and the Aya Evaluation Suite. The Aya initiative also serves as a valuable case study in participatory research, involving collaborators from 119 countries. We see this as a valuable framework for future research collaborations that aim to bridge gaps in resources.
Submitted 9 February, 2024;
originally announced February 2024.
-
Probabilistic Forecasting of Irregular Time Series via Conditional Flows
Authors:
Vijaya Krishna Yalavarthi,
Randolf Scholz,
Stefan Born,
Lars Schmidt-Thieme
Abstract:
Probabilistic forecasting of irregularly sampled multivariate time series with missing values is an important problem in many fields, including health care, astronomy, and climate. State-of-the-art methods for the task estimate only marginal distributions of observations in single channels and at single timepoints, assuming a fixed-shape parametric distribution. In this work, we propose a novel model, ProFITi, for probabilistic forecasting of irregularly sampled time series with missing values using conditional normalizing flows. The model learns joint distributions over the future values of the time series conditioned on past observations and queried channels and times, without assuming any fixed shape of the underlying distribution. As model components, we introduce a novel invertible triangular attention layer and an invertible non-linear activation function defined on, and mapping onto, the whole real line. We conduct extensive experiments on four datasets and demonstrate that the proposed model provides a $4$ times higher likelihood than the previous best model.
Submitted 21 May, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
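The abstract states only that ProFITi's activation is invertible and maps onto the whole real line; it does not give the function. The sketch below uses a hypothetical stand-in, f(x) = x + ALPHA * tanh(x), to show what a normalizing flow needs from such a layer: a strictly increasing bijection of the real line, a cheap log-derivative for the change-of-variables term, and a computable inverse. It is not ProFITi's activation, and ALPHA is an assumed parameter.

```python
import numpy as np

ALPHA = 0.5  # hypothetical; any ALPHA > -1 keeps f strictly increasing and onto R

def act(x):
    """Smooth bijection R -> R: f(x) = x + ALPHA * tanh(x)."""
    return x + ALPHA * np.tanh(x)

def act_log_deriv(x):
    """log f'(x) = log(1 + ALPHA * sech^2 x): the per-element log-determinant term."""
    return np.log1p(ALPHA / np.cosh(x) ** 2)

def act_inverse(y, iters=60):
    """Invert f by bisection: |f(x) - x| < |ALPHA|, so x lies in [y - |ALPHA|, y + |ALPHA|]."""
    y = np.asarray(y, dtype=float)
    lo = y - abs(ALPHA)
    hi = y + abs(ALPHA)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_big = act(mid) > y          # f is increasing, so the root is left of mid
        hi = np.where(too_big, mid, hi)
        lo = np.where(too_big, lo, mid)
    return 0.5 * (lo + hi)
```

Because f'(x) >= 1 for ALPHA > 0, the Jacobian never degenerates, which keeps the density computation in a flow numerically well behaved.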
-
Animated Stickers: Bringing Stickers to Life with Video Diffusion
Authors:
David Yan,
Winnie Zhang,
Luxin Zhang,
Anmol Kalia,
Dingkang Wang,
Ankit Ramchandani,
Miao Liu,
Albert Pumarola,
Edgar Schoenfeld,
Elliot Blanchard,
Krishna Narni,
Yaqiao Luo,
Lawrence Chen,
Guan Pang,
Ali Thabet,
Peter Vajda,
Amy Bearman,
Licheng Yu
Abstract:
We introduce animated stickers, a video diffusion model that generates an animation conditioned on a text prompt and a static sticker image. Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion. Due to the domain gap, i.e., differences in visual and motion style, a model that performed well on generating natural videos can no longer generate vivid videos when applied to stickers. To bridge this gap, we employ a two-stage fine-tuning pipeline: first with weakly in-domain data, followed by a human-in-the-loop (HITL) strategy that we term ensemble-of-teachers. It distills the best qualities of multiple teachers into a smaller student model. We show that this strategy allows us to specifically target improvements to motion quality while maintaining the style from the static image. With inference optimizations, our model is able to generate an eight-frame video with high-quality, interesting, and relevant motion in under one second.
Submitted 8 February, 2024;
originally announced February 2024.
-
Spreading Information via Social Networks: An Irrelevance Result
Authors:
Yu Awaya,
Vijay Krishna
Abstract:
An informed planner wishes to spread information among a group of agents in order to induce efficient coordination -- say the adoption of a new technology with positive externalities. The agents are connected via a social network. The planner informs a seed and then the information spreads via the network. While the structure of the network affects the rate of diffusion, we show that the rate of adoption is the same for all acyclic networks.
Submitted 7 February, 2024;
originally announced February 2024.
-
Exploring responsible applications of Synthetic Data to advance Online Safety Research and Development
Authors:
Pica Johansson,
Jonathan Bright,
Shyam Krishna,
Claudia Fischer,
David Leslie
Abstract:
The use of synthetic data provides an opportunity to accelerate online safety research and development efforts while showing potential for bias mitigation, facilitating data storage and sharing, preserving privacy and reducing exposure to harmful content. However, the responsible use of synthetic data requires caution regarding anticipated risks and challenges. This short report explores the potential applications of synthetic data to the domain of online safety, and addresses the ethical challenges that effective use of the technology may present.
Submitted 7 February, 2024;
originally announced February 2024.
-
Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
Authors:
Abhimanyu Rajeshkumar Bambhaniya,
Amir Yazdanbakhsh,
Suvinay Subramanian,
Sheng-Chun Kao,
Shivani Agrawal,
Utku Evci,
Tushar Krishna
Abstract:
N:M structured sparsity has garnered significant interest as a result of its relatively modest overhead and improved efficiency. Additionally, this form of sparsity holds considerable appeal for reducing the memory footprint owing to its modest representation overhead. Although there have been efforts to develop training recipes for N:M structured sparsity, they primarily focus on low-sparsity regions ($\sim$50\%). However, the performance of models trained using these approaches tends to decline when confronted with high-sparsity regions ($>$80\%). In this work, we study the effectiveness of existing sparse training recipes at \textit{high-sparsity regions} and argue that these methods fail to sustain the model quality on par with low-sparsity regions. We demonstrate that the significant factor contributing to this disparity is the presence of elevated levels of induced noise in the gradient magnitudes. To mitigate this undesirable effect, we employ decay mechanisms to progressively restrict the flow of gradients towards pruned elements. Our approach improves the model quality by up to 2$\%$ and 5$\%$ in vision and language models, respectively, in the high-sparsity regime. We also evaluate the trade-off between model accuracy and training compute cost in terms of FLOPs. At iso-training FLOPs, our method yields better performance than conventional sparse training recipes, exhibiting an accuracy improvement of up to 2$\%$. The source code is available at https://github.com/abhibambhaniya/progressive_gradient_flow_nm_sparsity.
Submitted 7 February, 2024;
originally announced February 2024.
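N:M sparsity keeps at most N nonzero weights in every group of M consecutive weights, so 2:4 gives the 50% regime and, for example, 1:8 gives 87.5%. The sketch below computes a magnitude-based N:M mask and a gradient-scaling factor that lets pruned weights receive a shrinking share of the gradient over training. The linear decay schedule is a hypothetical stand-in; the abstract says only that decay mechanisms progressively restrict gradient flow to pruned elements and does not specify the schedule.

```python
import numpy as np

def nm_mask(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in each consecutive group of m (row-major order)."""
    w = np.asarray(weights, dtype=float)
    if w.size % m != 0:
        raise ValueError("number of weights must be divisible by m")
    groups = np.abs(w).reshape(-1, m)
    order = np.argsort(groups, axis=1)  # ascending magnitude within each group
    mask = np.ones_like(groups)
    # Zero out the (m - n) smallest-magnitude positions in every group.
    np.put_along_axis(mask, order[:, : m - n], 0.0, axis=1)
    return mask.reshape(w.shape)

def gradient_scale(mask, step, total_steps):
    """Hypothetical linear decay: kept weights get factor 1, pruned weights fall from 1 to 0."""
    decay = max(0.0, 1.0 - step / total_steps)
    return mask + (1.0 - mask) * decay
```

During training one would multiply the dense gradient elementwise by `gradient_scale(mask, step, total_steps)`, so that early on pruned weights can still recover and later they are effectively frozen out.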
-
RIS-NOMA integrated low-complexity transceiver architecture: Sum rate and energy efficiency perspective
Authors:
Kali Krishna Kota,
Praful D. Mankar
Abstract:
This paper aims to explore reconfigurable intelligent surface (RIS) integration in a millimeter wave (mmWave) communication system with a low-complexity transceiver architecture under the assumption of imperfect channel state information (CSI). Towards this, we propose a RIS-aided system with a fully analog (FA) architecture at the base station. To overcome the limitation of single-user transmission imposed by the single RF chain, we employ NOMA. For such a system, we formulate sum rate (SR) and energy efficiency (EE) maximization problems to obtain the joint transmit beamformer, RIS phase shift matrix, and power allocation solutions under minimum rate constraints. We first tackle the fractional objectives of both problems by reformulating the SR and EE maximization problems into equivalent quadratic forms using the quadratic transform. We then employ successive convex approximation and semidefinite relaxation to handle the non-convex minimum rate constraints and the unit-modulus constraints on the RIS phase shifts, respectively. Next, we propose an alternating optimization-based algorithm that iterates over the transmit beamformer, power allocation, and RIS phase shift subproblems. Further, we show that the quadratic reformulation is equivalent to the WMSE-based reformulation for the SR maximization problem. Our numerical results show that the proposed RIS-NOMA integrated FA architecture outperforms the optimally configured fully digital architecture in terms of SR at low SNR and in terms of EE over a wide range of SNR, while maintaining low hardware complexity and cost. Finally, we present a numerical performance analysis of the RIS-NOMA integrated low-complexity system for various system configuration parameters.
Submitted 6 February, 2024;
originally announced February 2024.
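The quadratic transform mentioned above rests on the identity A/B = max over y of (2y·sqrt(A) - y²·B) for A >= 0 and B > 0, attained at y = sqrt(A)/B. The sketch below applies it to a deliberately tiny, hypothetical single-user energy-efficiency problem (one power variable, assumed channel gain and circuit power), alternating the closed-form y update with a 1-D maximization over power. It illustrates the transform only; it is not the paper's multi-user RIS-NOMA algorithm.

```python
import math

# Hypothetical single-user model: rate A(p) = log2(1 + GAIN * p), consumed power B(p) = p + P_CIRCUIT.
GAIN = 10.0
P_CIRCUIT = 1.0
P_MAX = 5.0

def rate(p):
    return math.log2(1.0 + GAIN * p)

def ee(p):
    """Energy efficiency A(p) / B(p) in bit/s/Hz per unit power."""
    return rate(p) / (p + P_CIRCUIT)

def argmax_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def maximize_ee_quadratic_transform(outer_iters=50):
    p = P_MAX  # any feasible starting point
    for _ in range(outer_iters):
        # Closed-form auxiliary-variable update: y = sqrt(A(p)) / B(p).
        y = math.sqrt(rate(p)) / (p + P_CIRCUIT)
        # For fixed y the surrogate 2*y*sqrt(A(q)) - y^2 * B(q) is concave in q.
        p = argmax_concave(lambda q: 2 * y * math.sqrt(rate(q)) - y * y * (q + P_CIRCUIT), 0.0, P_MAX)
    return p, ee(p)
```

Each outer iteration cannot decrease the ratio, which is why the transform pairs naturally with the alternating optimization described in the abstract.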
-
CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines
Authors:
Chao Pang,
Xinzhuo Jiang,
Nishanth Parameshwar Pavinkurve,
Krishna S. Kalluri,
Elise L. Minto,
Jason Patterson,
Linying Zhang,
George Hripcsak,
Gamze Gürsoy,
Noémie Elhadad,
Karthik Natarajan
Abstract:
Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data, which enables applications such as disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
Submitted 5 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
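The abstract says only that CEHR-GPT uses a patient representation derived from CEHR-BERT. The sketch below illustrates the general idea of serializing a patient history chronologically, with visit-boundary tokens and tokens that encode the time gap between visits, in the spirit of the artificial time tokens associated with CEHR-BERT. The token names (`VS`, `VE`, `D*`, `W*`, `M*`, `LT`) and the bucket thresholds are hypothetical choices, not the paper's vocabulary.

```python
from datetime import date

def interval_token(days):
    """Map the gap between consecutive visits to a coarse time token (hypothetical buckets)."""
    if days < 7:
        return f"D{days}"        # days
    if days < 28:
        return f"W{days // 7}"   # weeks
    if days < 360:
        return f"M{days // 30}"  # months
    return "LT"                  # long-term gap

def patient_sequence(visits):
    """visits: list of (visit_date, [concept codes]) sorted by date -> flat token list."""
    tokens = []
    prev = None
    for visit_date, concepts in visits:
        if prev is not None:
            tokens.append(interval_token((visit_date - prev).days))
        tokens.append("VS")      # visit start
        tokens.extend(concepts)
        tokens.append("VE")      # visit end
        prev = visit_date
    return tokens
```

A GPT trained on such sequences sees time explicitly as tokens, so a generated sequence can be decoded back into dated visits, which is what makes conversion to a relational format such as OMOP possible.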
-
Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $\mathcal{X}$ be a Banach space. Let $\{τ_j\}_{j=1}^n, \{ω_k\}_{k=1}^m\subseteq \mathcal{X}$ and $\{f_j\}_{j=1}^n$, $\{g_k\}_{k=1}^m\subseteq \mathcal{X}^*$ satisfy $|f_j(τ_j)|\geq 1$ for all $1\leq j \leq n$, $|g_k(ω_k)|\geq 1$ for all $1\leq k \leq m$. If $x \in \mathcal{X}\setminus \{0\}$ is such that $x=θ_τθ_f x=θ_ωθ_g x$, then we show that \begin{equation}\label{FKDB}\tag{1} \|θ_fx\|_0\|θ_gx\|_0\geq \frac{\bigg[1-(\|θ_fx\|_0-1)\max\limits_{1\leq j,r \leq n,j\neq r}|f_j(τ_r)|\bigg]^+\bigg[1-(\|θ_g x\|_0-1)\max\limits_{1\leq k,s \leq m,k\neq s}|g_k(ω_s)|\bigg]^+}{\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|f_j(ω_k)|\right)\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|g_k(τ_j)|\right)}. \end{equation}
We call Inequality (1) the \textbf{Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Kuppinger, Durisi and Bölcskei \textit{[IEEE Trans. Inform. Theory (2012)]} (which improved the Donoho-Stark-Elad-Bruckstein uncertainty principle \textit{[SIAM J. Appl. Math. (1989), IEEE Trans. Inform. Theory (2002)]}). We also derive a functional form of the uncertainty principle obtained by Studer, Kuppinger, Pope and Bölcskei \textit{[IEEE Trans. Inform. Theory (2012)]}.
Submitted 1 January, 2024;
originally announced February 2024.
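As a sanity check on Inequality (1), the derivation below specializes it to two orthonormal bases of $\mathbb{C}^d$. It assumes that $θ_f x = (f_j(x))_{j=1}^n$ and $θ_τ a = \sum_j a_j τ_j$ are the usual analysis and synthesis operators, which the abstract does not define; under that assumption $x = θ_τθ_f x$ holds for every $x$ when $\{τ_j\}$ is orthonormal and $f_j = \langle \cdot, τ_j \rangle$. The result is the familiar mutual-coherence bound, and for the identity and unitary DFT bases, where $μ = 1/\sqrt{d}$, it gives the Donoho-Stark bound $d$.

```latex
% Assumed special case: \mathcal{X} = \mathbb{C}^d, n = m = d,
% \{τ_j\}, \{ω_k\} orthonormal bases, f_j = \langle \cdot, τ_j \rangle, g_k = \langle \cdot, ω_k \rangle.
\begin{align*}
  &f_j(τ_r) = \langle τ_r, τ_j \rangle = δ_{jr}, \qquad g_k(ω_s) = \langle ω_s, ω_k \rangle = δ_{ks}
   \quad\Longrightarrow\quad \text{both numerator factors in (1) equal } [1 - 0]^+ = 1, \\
  &μ := \max_{j,k} |\langle τ_j, ω_k \rangle| = \max_{j,k} |f_j(ω_k)| = \max_{j,k} |g_k(τ_j)|, \\
  &\text{so (1) becomes} \quad \|θ_f x\|_0 \, \|θ_g x\|_0 \geq \frac{1}{μ^2}.
\end{align*}
```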
-
ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor
Authors:
Yi-Chien Lin,
Yuyang Chen,
Sameh Gobriel,
Nilesh Jain,
Gopi Krishna Jha,
Viktor Prasanna
Abstract:
As Graph Neural Networks (GNNs) have become popular, libraries such as PyTorch-Geometric (PyG) and Deep Graph Library (DGL) have been proposed; these libraries have emerged as the de facto standard for implementing GNNs because they provide graph-oriented APIs and are purposefully designed to manage the inherent sparsity and irregularity in graph structures. However, these libraries show poor scalability on multi-core processors, which underutilizes the available platform resources and limits performance. This is because GNN training is a resource-intensive workload with a high volume of irregular data accesses, and existing libraries fail to utilize the memory bandwidth efficiently. To address this challenge, we propose ARGO, a novel runtime system for GNN training that offers scalable performance. ARGO exploits multi-processing and core-binding techniques to improve platform resource utilization. We further develop an auto-tuner that searches for the optimal configuration for multi-processing and core-binding. The auto-tuner works automatically, making it completely transparent to the user. Furthermore, the auto-tuner allows ARGO to adapt to various platforms, GNN models, datasets, etc. We evaluate ARGO on two representative GNN models and four widely-used datasets on two platforms. With the proposed auto-tuner, ARGO is able to select a near-optimal configuration by exploring only 5% of the design space. ARGO speeds up state-of-the-art GNN libraries by up to 5.06x and 4.54x on a four-socket Ice Lake machine with 112 cores and a two-socket Sapphire Rapids machine with 64 cores, respectively. Finally, ARGO can seamlessly integrate into widely-used GNN libraries (e.g., DGL, PyG) with a few lines of code and speed up GNN training.
Submitted 27 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
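The abstract describes an auto-tuner that searches multi-processing and core-binding configurations while exploring only 5% of the design space. The sketch below is a hypothetical illustration of such a design space: it enumerates (process count, cores per process) pairs that fit on the machine, assigns each process a disjoint block of core IDs, and picks the best of a random 5% sample under a user-supplied cost function such as a measured epoch time. Random sampling is a simple stand-in, not ARGO's actual search strategy, and all function names are assumptions.

```python
import random
from itertools import product

def binding_plans(total_cores):
    """All (num_processes, cores_per_process) pairs that do not oversubscribe the machine."""
    return [(p, c) for p, c in product(range(1, total_cores + 1), repeat=2) if p * c <= total_cores]

def core_sets(num_processes, cores_per_process, total_cores):
    """Assign each process a disjoint, contiguous block of core IDs."""
    if num_processes * cores_per_process > total_cores:
        raise ValueError("configuration oversubscribes the machine")
    return [list(range(i * cores_per_process, (i + 1) * cores_per_process))
            for i in range(num_processes)]

def tune(total_cores, measure_epoch_time, budget_fraction=0.05, seed=0):
    """Evaluate a random budget_fraction of the plans and return the cheapest one."""
    plans = binding_plans(total_cores)
    k = max(1, int(len(plans) * budget_fraction))
    sampled = random.Random(seed).sample(plans, k)
    return min(sampled, key=lambda plan: measure_epoch_time(*plan))
```

On Linux, each worker process could then pin itself to its block with `os.sched_setaffinity(0, cores)`; the sketch leaves the pinning out so that it runs on any platform.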
-
Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
Authors:
Sanjana Ramprasad,
Kundan Krishna,
Zachary C Lipton,
Byron C Wallace
Abstract:
Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (potentially more specialized) domains? In this work we evaluate zero-shot generated summaries across specialized domains including biomedical articles and legal bills (in addition to standard news benchmarks for reference). We focus especially on the factuality of outputs. We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors. We analyze whether the prevalence of a given domain in the pretraining corpus affects extractiveness and faithfulness of generated summaries of articles in this domain. We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles. The dataset can be downloaded from https://github.com/sanjanaramprasad/zero_shot_faceval_domains
Submitted 5 February, 2024;
originally announced February 2024.
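The abstract analyzes the extractiveness of generated summaries without naming a measure. One widely used option is extractive fragment coverage and density from the Newsroom dataset paper (Grusky et al., 2018), computed from greedily matched shared token spans; the sketch below implements that idea with whitespace tokenization, which is an assumption. Whether the authors used these exact statistics is not stated in the abstract.

```python
def extractive_fragments(article_tokens, summary_tokens):
    """Greedy shared-fragment matching: longest article span starting at each summary position."""
    a, s = article_tokens, summary_tokens
    fragments = []
    i = 0
    while i < len(s):
        best = 0
        j = 0
        while j < len(a):
            if s[i] == a[j]:
                ii, jj = i, j
                while ii < len(s) and jj < len(a) and s[ii] == a[jj]:
                    ii += 1
                    jj += 1
                best = max(best, ii - i)
                j = jj  # continue scanning the article after this match
            else:
                j += 1
        if best > 0:
            fragments.append(s[i:i + best])
        i += max(best, 1)  # skip past the matched fragment, or one unmatched token
    return fragments

def coverage_and_density(article, summary):
    """Coverage: share of summary tokens copied; density: mean squared fragment length per token."""
    a, s = article.split(), summary.split()
    if not s:
        return 0.0, 0.0
    frags = extractive_fragments(a, s)
    coverage = sum(len(f) for f in frags) / len(s)
    density = sum(len(f) ** 2 for f in frags) / len(s)
    return coverage, density
```

Coverage near 1 with high density indicates long copied spans (highly extractive output), while low values indicate more abstractive summaries.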
-
On additive complement with special structures
Authors:
Mohan,
Bhuwanesh Rao Patil,
Ram Krishna Pandey
Abstract:
Let $A$ be a set of natural numbers. A set $B$ of natural numbers is said to be an additive complement of $A$ if every sufficiently large natural number can be represented in the form $x+y$, where $x\in A$ and $y\in B$. This article describes various types of additive complements of $A$, such as additive complements of $A$ that do not intersect $A$, additive complements that are unions of disjoint infinite arithmetic progressions, and additive complements of various densities. As an application of this study, we also focus on the structure of the sumset of an arithmetic progression and a geometric progression. Apart from this, for a given positive real number $α\leq 1$ and a finite set $A$, we investigate sets $B$ that can be written as a union of disjoint infinite arithmetic progressions and for which $A+B$ has density $α$.
Submitted 5 February, 2024;
originally announced February 2024.
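The definitions above are asymptotic, so no finite computation can verify them, but a finite-window check can illustrate them. The sketch below uses hypothetical example sets: with A = {1, 2}, the set B = {3, 5, 7, ...} is disjoint from A and A + B contains every n >= 4, so B is an additive complement of A; the single infinite arithmetic progression B' = {3, 7, 11, ...} gives A + B' = {4, 5, 8, 9, ...}, which has density 1/2.

```python
def sumset(a, b, limit):
    """Elements of A + B that are <= limit."""
    return {x + y for x in a for y in b if x + y <= limit}

def covers_tail(a, b, start, limit):
    """True if every n in [start, limit] can be written as x + y with x in A, y in B."""
    s = sumset(a, b, limit)
    return all(n in s for n in range(start, limit + 1))

def density_up_to(s, limit):
    """Fraction of 1..limit lying in s: a finite proxy for natural density."""
    return sum(1 for n in range(1, limit + 1) if n in s) / limit
```

Truncating B at the window limit is harmless here because any representation n = x + y with n <= limit uses y <= limit.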
-
A Comprehensive Study of the Current State-of-the-Art in Nepali Automatic Speech Recognition Systems
Authors:
Rupak Raj Ghimire,
Bal Krishna Bal,
Prakash Poudyal
Abstract:
In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). The primary objective of this survey is to conduct a comprehensive review of the works on Nepali ASR systems completed to date, explore the different datasets used, examine the technology utilized, and take account of the obstacles encountered in implementing Nepali ASR systems. In line with the global growth of speech recognition research, the number of Nepali ASR-related projects is also growing. Nevertheless, the investigation of language and acoustic models of the Nepali language has not received adequate attention compared to languages that possess ample resources. In this context, we provide a framework as well as directions for future investigations.
Submitted 5 February, 2024;
originally announced February 2024.
-
PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems?
Authors:
Chinmay Mittal,
Krishna Kartik,
Mausam,
Parag Singla
Abstract:
Recent works show that the largest of the large language models (LLMs) can solve many simple reasoning tasks expressed in natural language, with little or no supervision. But can they also solve challenging first-order combinatorial reasoning problems, such as graph coloring, knapsack, and cryptarithmetic? To answer this question, we present PuzzleBench, a dataset of 31 such challenging problems along with a few solved instances for each problem. These problems are all first-order, i.e., they can be instantiated with problem instances of varying sizes, and most of them are NP-hard, requiring several reasoning steps to reach the solution. We first observe that LLMs, even when aided by symbolic solvers, perform rather poorly on our dataset. In response, we propose a new approach, Puzzle-LM, which combines LLMs with both symbolic solvers and program interpreters, along with feedback from solved examples, to achieve large performance gains. Our extensive experimentation and analyses offer new insights into the reasoning abilities and limitations of present-day LLMs.
Submitted 22 February, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
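"First-order" here means that one problem specification covers instances of any size. A program interpreter exploits exactly this: a single solver program, once written, handles every instance. The sketch below shows such a program for 0/1 knapsack, one of the problems named in the abstract, using standard dynamic programming. It illustrates the kind of program an LLM could be asked to write; it is not taken from Puzzle-LM, whose prompts and generated programs the abstract does not describe.

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack by dynamic programming; works for instances of any size."""
    best = [0] * (capacity + 1)  # best[c] = max total value achievable with capacity c
    for w, v in zip(weights, values):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

For weights [1, 3, 4, 5], values [1, 4, 5, 7], and capacity 7, the optimum takes the items of weight 3 and 4 for a value of 9.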
-
Measuring Moral Inconsistencies in Large Language Models
Authors:
Vamshi Krishna Bonagiri,
Sreeram Vennam,
Manas Gaur,
Ponnurangam Kumaraguru
Abstract:
A Large Language Model (LLM) is considered consistent if semantically equivalent prompts produce semantically equivalent responses. Despite recent advancements showcasing the impressive capabilities of LLMs in conversational systems, we show that even state-of-the-art LLMs are highly inconsistent in their generations, which calls their reliability into question. Prior research has tried to measure this with task-specific accuracy. However, this approach is unsuitable for moral scenarios, such as the trolley problem, which have no "correct" answer. To address this issue, we propose a novel information-theoretic measure called Semantic Graph Entropy (SGE) to measure the consistency of an LLM in moral scenarios. We leverage "Rules of Thumb" (RoTs) to explain a model's decision-making strategies and further enhance our metric. Compared to existing consistency metrics, SGE correlates better with human judgments across five LLMs. In the future, we aim to investigate the root causes of LLM inconsistencies and propose improvements.
Submitted 1 March, 2024; v1 submitted 26 January, 2024;
originally announced February 2024.
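The abstract does not define Semantic Graph Entropy, so the sketch below is a much simpler stand-in that captures the underlying intuition: if responses to semantically equivalent prompts are grouped into meaning clusters, the Shannon entropy of the cluster distribution is 0 for a perfectly consistent model and grows as answers scatter. The clustering step is assumed to happen upstream, and this is not the SGE metric itself.

```python
import math
from collections import Counter

def response_entropy(cluster_labels):
    """Shannon entropy (bits) of how responses to equivalent prompts fall into meaning clusters."""
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    # Sum of p * log2(1 / p) over clusters; 0 when every response shares one meaning.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

Five paraphrased prompts that all yield the same stance give 0 bits; four prompts yielding four different stances give log2(4) = 2 bits, the maximum for four responses.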
-
Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
Authors:
Gabriel Ryan,
Siddhartha Jain,
Mingyue Shang,
Shiqi Wang,
Xiaofei Ma,
Murali Krishna Ramanathan,
Baishakhi Ray
Abstract:
Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent works using large language models (LLMs) for test generation have focused on improving generation quality through optimizing the test generation context and correcting errors in model outputs, but use fixed prompting strategies that prompt the model to generate tests without additional guidance. As a result, LLM-generated test suites still suffer from low coverage. In this paper, we present SymPrompt, a code-aware prompting strategy for LLMs in test generation. SymPrompt's approach is based on recent work that demonstrates LLMs can solve more complex logical problems when prompted to reason about the problem in a multi-step fashion. We apply this methodology to test generation by deconstructing the test suite generation process into a multi-stage sequence, each stage of which is driven by a specific prompt aligned with the execution paths of the method under test, and exposing relevant type and dependency focal context to the model. Our approach enables pretrained LLMs to generate more complete test cases without any additional training. We implement SymPrompt using the TreeSitter parsing framework and evaluate it on a benchmark of challenging methods from open source Python projects. SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
Submitted 2 April, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
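SymPrompt aligns each prompt with an execution path of the method under test. The sketch below illustrates that idea on a small scale: it parses a function, collects the path constraints of its first top-level `if`/`elif`/`else` chain, and emits one test-generation prompt per path. It deliberately swaps in Python's built-in `ast` module (Python 3.9+ for `ast.unparse`) for the TreeSitter framework the authors used, handles only a single branch chain, and the prompt wording is hypothetical.

```python
import ast

def branch_paths(source):
    """Path constraints for the first top-level if/elif/else chain of the first function."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    node = next((s for s in func.body if isinstance(s, ast.If)), None)
    if node is None:
        return [[]]  # straight-line code: one unconstrained path
    paths, negated = [], []
    while True:
        cond = ast.unparse(node.test)
        paths.append(negated + [cond])           # path taking this branch
        negated = negated + [f"not ({cond})"]    # later branches require this test to fail
        # An orelse holding exactly one If is treated as an elif.
        if len(node.orelse) == 1 and isinstance(node.orelse[0], ast.If):
            node = node.orelse[0]
        else:
            paths.append(negated)                # explicit else or implicit fall-through
            return paths

def path_prompts(source, name):
    """One hypothetical test-generation prompt per execution path."""
    return [f"Write a pytest test for `{name}` whose inputs satisfy: {' and '.join(p) or 'any input'}."
            for p in branch_paths(source)]
```

For a `classify(x)` function with branches `x < 0`, `x == 0`, and `else`, this yields three prompts, one per path, instead of a single generic "write tests" request.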