Search | arXiv e-print repository

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

Authors: Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox

Abstract: From rearranging objects on a table to putting groceries into shelves, robots must plan precise action points to perform tasks accurately and reliably. In spite of the recent adoption of vision language models (VLMs) to control robot behavior, VLMs struggle to precisely articulate robot actions using language. We introduce an automatic synthetic data generation pipeline that instruction-tunes VLMs to robotic domains and needs. Using the pipeline, we train RoboPoint, a VLM that predicts image keypoint affordances given language instructions. Compared to alternative approaches, our method requires no real-world data collection or human demonstration, making it much more scalable to diverse environments and viewpoints. In addition, RoboPoint is a general model that enables several downstream applications such as robot navigation, manipulation, and augmented reality (AR) assistance. Our experiments demonstrate that RoboPoint outperforms state-of-the-art VLMs (GPT-4o) and visual prompting techniques (PIVOT) by 21.8% in the accuracy of predicting spatial affordance and by 30.5% in the success rate of downstream tasks. Project website: https://robo-point.github.io.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.09617 [pdf, other]

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Authors: Shruti Palaskar, Oggi Rudovic, Sameer Dharur, Florian Pesce, Gautam Krishna, Aswin Sivaraman, Jack Berkowitz, Ahmed Hussen Abdelaziz, Saurabh Adya, Ahmed Tewfik

Abstract: Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves 22% relative reduction in equal error rate (EER) over the text-only approach and attains performance parity with its full fine-tuning (FFT) counterpart while needing to tune only a fraction of its parameters. Furthermore, with the newly introduced adapter dropout, FLoRA is robust to missing data, improving over FFT by 20% lower EER and 56% lower false accept rate. The proposed approach scales well for model sizes from 16M to 3B parameters.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

arXiv:2406.09403 [pdf, other]

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

Authors: Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Ranjay Krishna

Abstract: Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. The LM conducts planning and reasoning according to the visual artifacts it has drawn. Different from prior work, which uses text-to-image models to enable LMs to draw, Sketchpad enables LMs to draw with lines, boxes, marks, etc., which is closer to human sketching and better facilitates reasoning. Sketchpad can also use specialist vision models during the sketching process (e.g., draw bounding boxes with object detection models, draw masks with segmentation models), to further enhance visual perception and reasoning. We experiment with a wide range of math tasks (including geometry, functions, graphs, and chess) and complex visual reasoning tasks. Sketchpad substantially improves performance on all tasks over strong base models with no sketching, yielding an average gain of 12.7% on math tasks, and 8.6% on vision tasks. GPT-4o with Sketchpad sets a new state of the art on all tasks, including V*Bench (80.3%), BLINK spatial reasoning (83.9%), and visual correspondence (80.8%). All code and data are available at https://visualsketchpad.github.io/.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 26 pages

arXiv:2406.09264 [pdf, other]

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match humans) rather than an ongoing, mutual alignment problem [429]. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans that ensures AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions.

Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 56 pages

arXiv:2406.08803 [pdf, other]

Asymptotic Birkhoff-Violation in Operational Theories: Thermodynamic Implications and Information Processing

Authors: Ananya Chakraborty, Sahil Gopalkrishna Naik, Samrat Sen, Ram Krishna Patra, Pratik Ghosal, Mir Alimuddin, Manik Banik

Abstract: In accordance with the entropy principle of thermodynamics, under spontaneous evolutions, physical systems always evolve towards states with equal or greater randomness. But where does this randomness originate? The renowned Birkhoff-von Neumann theorem, often referred to as the Birkhoff theorem, identifies the source of this randomness as the stochastic application of reversible operations on the system under study, thereby ensuring its epistemic origin. The analogue of this theorem is known to fail in the quantum case. Here, we extend this investigation beyond quantum mechanics to a broader class of operational theories described within the framework of general probabilistic theories (GPTs). In this generalized framework, we establish Birkhoff-violation as the prevalent trait; in fact, the asymptotic variant of the theorem gets violated. We then demonstrate that Birkhoff-violation in GPTs can lead to consequences that are atypical of quantum theory. For instance, we report a manifestation of Birkhoff-violation in a communication task, which is otherwise not observed in the quantum world. We also show that, unlike the quantum case, in other operational theories the state transformation criteria can be distinct under mixtures of reversible transformations and doubly stochastic evolutions, leading to different resource theories of purity. Despite these exotic implications, we analyze how to define a coherent notion of entropy in this generalized framework, while upholding alignment with von Neumann's thought experiment.

Submitted 13 June, 2024; originally announced June 2024.

Comments: (4.25 + 7) Pages, 6 Figures, Comments are welcome

arXiv:2406.08714 [pdf, other]

Real-time Digital RF Emulation -- II: A Near Memory Custom Accelerator

Authors: Mandovi Mukherjee, Xiangyu Mao, Nael Rahman, Coleman DeLude, Joe Driscoll, Sudarshan Sharma, Payman Behnam, Uday Kamal, Jongseok Woo, Daehyun Kim, Sharjeel Khan, Jianming Tong, Jamin Seo, Prachi Sinha, Madhavan Swaminathan, Tushar Krishna, Santosh Pande, Justin Romberg, Saibal Mukhopadhyay

Abstract: A near memory hardware accelerator, based on a novel direct path computational model, for real-time emulation of radio frequency systems is demonstrated. Our evaluation of hardware performance uses both application-specific integrated circuit (ASIC) and field programmable gate array (FPGA) methodologies. (1) The ASIC testchip implementation, using TSMC 28nm CMOS, leverages distributed autonomous control to extract concurrency in compute as well as low latency. It achieves a $518$ MHz per channel bandwidth in a prototype $4$-node system. The maximum emulation range supported in this paradigm is $9.5$ km with $0.24$ $μ$s of per-sample emulation latency. (2) The FPGA-based implementation, evaluated on a Xilinx ZCU104 board, demonstrates a $9$-node test case (two transmitters, one receiver, and $6$ passive reflectors) with an emulation range of $1.13$ km to $27.3$ km at $215$ MHz bandwidth.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08504 [pdf, ps, other]

Noncommutative Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: Let $\{τ_n\}_{n=1}^\infty$ and $\{ω_m\}_{m=1}^\infty$ be two modular Parseval frames for a Hilbert C*-module $\mathcal{E}$. Then for every $x \in \mathcal{E}\setminus\{0\}$, we show that \begin{align} (1) \quad \quad \quad \quad \|θ_τx \|_0 \|θ_ωx \|_0 \geq \frac{1}{\sup_{n, m \in \mathbb{N}} \|\langle τ_n, ω_m\rangle \|^2}. \end{align} We call Inequality (1) the \textbf{Noncommutative Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Inequality (1) is the noncommutative analogue of the breakthrough Ricaud-Torrésani uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2013]}. In particular, Inequality (1) extends the Elad-Bruckstein uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2002]} and the Donoho-Stark uncertainty principle \textit{[SIAM J. Appl. Math., 1989]}.
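As a rough illustration only, the classical finite-dimensional special case of Inequality (1) can be checked numerically. The sketch below is not taken from the paper: it assumes the Hilbert C*-module is simply $\mathbb{C}^{16}$, takes the standard basis and the unitary DFT basis as the two Parseval frames, and uses a Dirac comb as the test vector. The helper name `sparsity` and the tolerance are hypothetical choices.

```python
import numpy as np

def sparsity(coeffs, tol=1e-9):
    """Count coefficients with magnitude above tol (numerical stand-in for the l0 count)."""
    return int(np.sum(np.abs(coeffs) > tol))

n = 16
# Assumption: two orthonormal bases of C^n play the role of the Parseval frames.
identity_basis = np.eye(n)
fourier_basis = np.fft.fft(np.eye(n)) / np.sqrt(n)  # unitary DFT matrix

# Dirac comb with spacing 4: sparse in both bases.
x = np.zeros(n)
x[::4] = 1.0

# Analysis coefficients of x with respect to each basis.
coeffs_identity = identity_basis.conj().T @ x
coeffs_fourier = fourier_basis.conj().T @ x

# Largest inner product magnitude between elements of the two bases.
coherence = np.max(np.abs(identity_basis.conj().T @ fourier_basis))

product = sparsity(coeffs_identity) * sparsity(coeffs_fourier)
bound = 1.0 / coherence**2

print(product, bound)
```

In this setting the comb has four nonzero coefficients in each basis, so the product of sparsities meets the lower bound up to floating-point error.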

Submitted 1 June, 2024; originally announced June 2024.

Comments: 5 pages, 0 figures

MSC Class: 42C15; 46L08

arXiv:2406.07892 [pdf, ps, other]

Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP

Authors: Tejaram Sangadi, L. A. Prashanth, Krishna Jagannathan

Abstract: Motivated by risk-sensitive reinforcement learning scenarios, we consider the problem of policy evaluation for variance in a discounted reward Markov decision process (MDP). For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature, though only asymptotic guarantees are available for this algorithm. We derive finite sample bounds that hold (i) in the mean-squared sense; and (ii) with high probability, when tail iterate averaging is employed with/without regularization. Our bounds exhibit exponential decay for the initial error, while the overall bound is $O(1/t)$, where $t$ is the number of update iterations of the TD algorithm. Further, the bound for the regularized TD variant is for a universal step size. Our bounds open avenues for analysis of actor-critic algorithms for mean-variance optimization in a discounted MDP.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07332 [pdf, other]

Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach

Authors: Challapalli Phanindra Revanth, Sumohana S. Channappayya, C Krishna Mohan

Abstract: Computing the loss gradient via backpropagation consumes considerable energy during deep learning (DL) model training. In this paper, we propose a novel approach to efficiently compute DL models' gradients to mitigate the substantial energy overhead associated with backpropagation. Exploiting the over-parameterized nature of DL models and the smoothness of their loss landscapes, we propose a method called GradSamp for sampling gradient updates from a Gaussian distribution. Specifically, we update model parameters at a given epoch (chosen periodically or randomly) by perturbing the parameters (element-wise) from the previous epoch with Gaussian "noise". The parameters of the Gaussian distribution are estimated using the error between the model parameter values from the two previous epochs. GradSamp not only streamlines gradient computation but also enables skipping entire epochs, thereby enhancing overall efficiency. We rigorously validate our hypothesis across a diverse set of standard and non-standard CNN and transformer-based models, spanning various computer vision tasks such as image classification, object detection, and image segmentation. Additionally, we explore its efficacy in out-of-distribution scenarios such as Domain Adaptation (DA), Domain Generalization (DG), and decentralized settings like Federated Learning (FL). Our experimental results affirm the effectiveness of GradSamp in achieving notable energy savings without compromising performance, underscoring its versatility and potential impact in practical DL applications.
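The update described in this abstract can be sketched roughly as follows. This is one possible reading, not the authors' implementation: it assumes the Gaussian's mean and standard deviation are taken from the element-wise difference between the parameters of the two previous epochs, and the function name `gaussian_sampled_update` is hypothetical.

```python
import numpy as np

def gaussian_sampled_update(theta_prev, theta_prev2, rng):
    """Perturb the previous epoch's parameters with Gaussian noise instead of backpropagating.

    Assumption: the noise distribution is fitted to the element-wise difference
    between the parameters of the two previous epochs.
    """
    delta = theta_prev - theta_prev2   # change observed over the last epoch
    mu = delta.mean()                  # estimated mean of the update
    sigma = delta.std()                # estimated spread of the update
    noise = rng.normal(loc=mu, scale=sigma, size=theta_prev.shape)
    return theta_prev + noise          # element-wise perturbation

rng = np.random.default_rng(0)
theta_t2 = np.array([0.50, 1.00, 2.50])   # parameters at epoch t-2
theta_t1 = np.array([0.45, 0.90, 2.40])   # parameters at epoch t-1
theta_t = gaussian_sampled_update(theta_t1, theta_t2, rng)
print(theta_t.shape)
```

In a training loop, such a step would replace the gradient computation only on the selected epochs, with ordinary backpropagation used on the others.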

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07246 [pdf, other]

Marginalization Consistent Mixture of Separable Flows for Probabilistic Irregular Time Series Forecasting

Authors: Vijaya Krishna Yalavarthi, Randolf Scholz, Kiran Madhusudhanan, Stefan Born, Lars Schmidt-Thieme

Abstract: Probabilistic forecasting models for joint distributions of targets in irregular time series are a heavily under-researched area in machine learning with, to the best of our knowledge, only three models researched so far: GPR, the Gaussian Process Regression model~\citep{Durichen2015.Multitask}, TACTiS, the Transformer-Attentional Copulas for Time Series~\cite{Drouin2022.Tactis, ashok2024tactis} and ProFITi \citep{Yalavarthi2024.Probabilistica}, a multivariate normalizing flow model based on invertible attention layers. While ProFITi, thanks to using multivariate normalizing flows, is the more expressive model with better predictive performance, we will show that it suffers from marginalization inconsistency: it does not guarantee that the marginal distributions of a subset of variables in its predictive distributions coincide with the directly predicted distributions of these variables. Also, TACTiS does not provide any guarantees for marginalization consistency. We develop a novel probabilistic irregular time series forecasting model, Marginalization Consistent Mixtures of Separable Flows (moses), that mixes several normalizing flows with (i) Gaussian Processes with full covariance matrix as source distributions and (ii) a separable invertible transformation, aiming to combine the expressivity of normalizing flows with the marginalization consistency of Gaussians. In experiments on four different datasets we show that moses outperforms other state-of-the-art marginalization consistent models and performs on par with ProFITi but, unlike ProFITi, guarantees marginalization consistency.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06712 [pdf, ps, other]

Classification of Non-Degenerate Symmetric Bilinear and Quadratic Forms in the Verlinde Category $\mathrm{Ver}_4^+$

Authors: Iz Chen, Arun S. Kannan, Krishna Pothapragada

Abstract: Although Deligne's theorem classifies all symmetric tensor categories (STCs) with moderate growth over algebraically closed fields of characteristic zero, the classification does not extend to positive characteristic. At the forefront of the study of STCs is the search for an analog to Deligne's theorem in positive characteristic, and it has become increasingly apparent that the Verlinde categories are to play a significant role. Moreover, these categories are largely unstudied, but have already shown very interesting phenomena as both a generalization of and a departure from superalgebra and supergeometry. In this paper, we study $\mathrm{Ver}_4^+$, the simplest non-trivial Verlinde category in characteristic $2$. In particular, we classify all isomorphism classes of non-degenerate symmetric bilinear forms and non-degenerate quadratic forms and study the associated Witt semi-ring that arises from the addition and multiplication operations on bilinear forms.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06361 [pdf, ps, other]

Challenges with Differentiable Quantum Dynamics

Authors: Sri Hari Krishna Narayanan, Michael Perlin, Robert Lewis-Swan, Jeffrey Larson, Matt Menickelly, Jan Hückelheim, Paul Hovland

Abstract: Differentiable quantum dynamics require automatic differentiation of a complex-valued initial value problem, which numerically integrates a system of ordinary differential equations from a specified initial condition, as well as the eigendecomposition of a matrix. We explored several automatic differentiation frameworks for these tasks, finding that no framework natively supports our application requirements. We therefore demonstrate a need for broader support of complex-valued, differentiable numerical integration in scientific computing libraries.

Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05342 [pdf]

Compensation for reactive power and harmonic current drawn by a non-linear load in a pv-micro hydro grid

Authors: Raj Krishna Nepal, Bibek Khanal, Sanket Khatiwada, Nirajan Bhandari, Bishal Rijal, Raisha Karmacharya, Ajay Thapa

Abstract: This paper presents a simulation approach to enhance the power quality of a PV-micro hydro grid supplying both a linear consumer load and a non-linear industrial load by integrating a Shunt Active Power Filter (SAPF) that uses instantaneous PQ theory and hysteresis current control band logic. The non-linear load draws reactive power and harmonic current from the source, thereby degrading the power quality. The integration of the SAPF at the point of common coupling (PCC) offers reactive power and harmonic current compensation, ensuring that the current supply to the grid remains nearly sinusoidal and proportional to the active power. By injecting equal and opposite harmonic components, the SAPF effectively reduces Total Harmonic Distortion (THD) from 7% to 2.96%, thereby enhancing the overall power quality of the PV-micro hydro grid system.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 5 pages, 21 figures, submitted to the IEEE powercon 2024 conference

arXiv:2406.05184 [pdf, other]

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

Authors: Scott Geng, Cheng-Yu Hsieh, Vivek Ramanujan, Matthew Wallingford, Chun-Liang Li, Pang Wei Koh, Ranjay Krishna

Abstract: Generative text-to-image models enable us to synthesize unlimited amounts of images in a controllable manner, spurring many recent efforts to train vision models with synthetic data. However, every synthetic image ultimately originates from the upstream data used to train the generator. What additional value does the intermediate generator provide over directly training on relevant parts of the upstream data? Grounding this question in the setting of image classification, we compare finetuning on task-relevant, targeted synthetic data generated by Stable Diffusion -- a generative model trained on the LAION-2B dataset -- against finetuning on targeted real images retrieved directly from LAION-2B. We show that while synthetic data can benefit some downstream tasks, it is universally matched or outperformed by real data from our simple retrieval baseline. Our analysis suggests that this underperformance is partially due to generator artifacts and inaccurate task-relevant visual details in the synthetic images. Overall, we argue that retrieval is a critical baseline to consider when training with synthetic data -- a baseline that current methods do not yet surpass. We release code, data, and models at https://github.com/scottgeng00/unmet-promise.

Submitted 3 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: Correspondence to sgeng at cs dot washington dot edu. RK and PWK equally advised the project

arXiv:2406.05112 [pdf]

Ohms law lost and regained: observation and impact of zeros and poles

Authors: Krishna Joshi, Israel Kurtz, Zhou Shi, Azriel Z. Genack

Abstract: The quantum conductance and its classical wave analogue, the transmittance, are given by the sum of the eigenvalues of the transmission matrix. The lowest transmission eigenvalue in diffusive media might be expected to play a negligible role in the conductance, and, in any case, to be too small to be observed. Here, we observe the lowest transmission eigenchannel in microwave waveguides, though it is orders of magnitude below the nominal noise level, and show that the transmittance is pulled down by global correlation among transmission eigenvalues and among zeros and poles of the transmission matrix. Transmission vanishes either when the energy density on the sample output vanishes at topological transmission zeros or when the longitudinal velocity vanishes precisely at the crossover to a new channel. This lowers the conductance by an amount proportional to the modulation of the density of states. In accord with the correspondence principle, the conductance approaches Ohms law as the number of channels increases with sample width. The exploration of the transmission matrix opens the door to a new understanding of mesoscopic transport and ultrasensitive detection techniques.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04548 [pdf, other]

GNNAnatomy: Systematic Generation and Evaluation of Multi-Level Explanations for Graph Neural Networks

Authors: Hsiao-Ying Lu, Yiran Li, Ujwal Pratap Krishna Kaluvakolanu Thyagarajan, Kwan-Liu Ma

Abstract: Graph Neural Networks (GNNs) have proven highly effective in various machine learning (ML) tasks involving graphs, such as node/graph classification and link prediction. However, explaining the decisions made by GNNs poses challenges because of the aggregated relational information based on graph structure, leading to complex data transformations. Existing methods for explaining GNNs often face limitations in systematically exploring diverse substructures and evaluating results in the absence of ground truths. To address this gap, we introduce GNNAnatomy, a model- and dataset-agnostic visual analytics system designed to facilitate the generation and evaluation of multi-level explanations for GNNs. In GNNAnatomy, we employ graphlets to elucidate GNN behavior in graph-level classification tasks. By analyzing the associations between GNN classifications and graphlet frequencies, we formulate hypothesized factual and counterfactual explanations. To validate a hypothesized graphlet explanation, we introduce two metrics: (1) the correlation between its frequency and the classification confidence, and (2) the change in classification confidence after removing this substructure from the original graph. To demonstrate the effectiveness of GNNAnatomy, we conduct case studies on both real-world and synthetic graph datasets from various domains. Additionally, we qualitatively compare GNNAnatomy with a state-of-the-art GNN explainer, demonstrating the utility and versatility of our design.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03093 [pdf, other]

Modelling the propagation of slow magneto-acoustic waves in a multi-stranded coronal loop

Authors: S. Krishna Prasad, T. Van Doorsselaere

Abstract: We study the propagation properties of slow magneto-acoustic waves in a multi-thermal coronal loop using a 3D MHD model, for the first time. A bundle of 33 vertical cylinders, each of 100 km radius, randomly distributed over a circular region of radius 1 Mm is considered to represent the coronal loop. The slow waves are driven by perturbing the vertical velocity ($v_z$) at the base of the loop. We apply forward modelling to the simulation results to generate synthetic images in the coronal channels of SDO/AIA. Furthermore, we add appropriate data noise to enable direct comparison with the real observations. It is found that the synthetic images at the instrument resolution show non-cospatial features in different temperature channels in agreement with previous observations. Time-distance maps are constructed from the synthetic data to study the propagation properties. The results indicate that the oscillations are only visible in specific channels depending on the temperature range of plasma existing within the loop. Additionally, the propagation speed of slow waves is also found to be sensitive to the available temperature range. Overall, we propose that the cross-field thermal properties of coronal structures can be inferred using a combination of numerical simulations and observations of slow magneto-acoustic waves.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted for publication in ApJ

arXiv:2406.01859 [pdf, other]

Variational quantum state preparation for quantum-enhanced metrology in noisy systems

Authors: Juan C. Zuñiga Castro, Jeffrey Larson, Sri Hari Krishna Narayanan, Victor E. Colussi, Michael A. Perlin, Robert J. Lewis-Swan

Abstract: We investigate optimized quantum state preparation for quantum metrology applications in noisy environments. We simulate a low-depth variational quantum circuit (VQC) composed of a sequence of global rotations and entangling operations applied to a chain of qubits that are subject to dephasing noise. The parameters controlling the VQC are numerically optimized to maximize the quantum Fisher information, which characterizes the ultimate metrological sensitivity of a quantum state, with respect to a global rotation. We find that regardless of the details of the entangling operation implemented in the VQC, the optimal quantum states can be broadly classified into a trio of qualitative regimes -- cat-like, squeezed-like, and product states -- associated with different dephasing rates. Our findings are relevant for designing optimal state-preparation strategies for next-generation quantum sensors exploiting entanglement, such as time and frequency standards and magnetometers, aimed at achieving state-of-the-art performance in the presence of noise and decoherence.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures

arXiv:2406.01698 [pdf, other]

Demystifying Platform Requirements for Diverse LLM Inference Use Cases

Authors: Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna

Abstract: Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these parameter-heavy models efficiently for diverse inference use cases requires carefully designed hardware platforms with ample computing, memory, and network resources. With LLM deployment scenarios and models evolving at breakneck speed, the hardware requirements to meet SLOs remain an open research question. In this work, we present an analytical tool, GenZ, to study the relationship between LLM inference performance and various platform design parameters. Our analysis provides insights into configuring platforms for different LLM workloads and use cases. We quantify the platform requirements to support SOTA LLMs like LLaMA and GPT-4 under diverse serving settings. Furthermore, we project the hardware capabilities needed to enable future LLMs potentially exceeding hundreds of trillions of parameters. The trends and insights derived from GenZ can guide AI engineers deploying LLMs as well as computer architects designing next-generation hardware accelerators and platforms. Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of large language models across a spectrum of applications. The source code is available at https://github.com/abhibambhaniya/GenZ-LLM-Analyzer.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 12 Pages, https://github.com/abhibambhaniya/GenZ-LLM-Analyzer

arXiv:2406.01000 [pdf, other]

doi 10.1016/j.asr.2024.06.040

Seasonal variation in nighttime NO radiative cooling as observed by TIMED/SABER in lower thermosphere during solar maximum and solar minimum

Authors: Alok Kumar Ranjan, MV Sunil Krishna, Akash Kumar, Dayakrishna Nailwal, Sumanta Sarkhel

Abstract: Both composition and temperature play a crucial role in determining the NO radiative cooling in the lower thermosphere as observed by TIMED/SABER. In this work, we present a detailed investigation of seasonal variation in thermospheric NO radiative cooling. We have carried forward the investigation of \cite{li2018} regarding the variations in local nighttime peak NO radiative cooling and its altitude during solar maximum and solar minimum conditions. By analyzing latitudinal changes over quiet times for each month in year 2018, it is evident that both the investigative parameters exhibit summer-winter variability. The qualitative contribution of different species (i.e., NO and O) and temperatures in determining the vertical profile of NO radiative cooling for different latitudes is investigated by utilizing the NRLMSISE-00 estimated parameters and SNOE observed NO density. The temperature, NO density, meridional wind, and associated compositional variations due to asymmetrical solar heating in both the hemispheres during solar minimum conditions seem to be the dominating factors in controlling the NO radiative cooling during different seasons. The altitudes at which maximum cooling by NO occurs exhibit an inverse correlation with the amount of radiative cooling.
The regions of enhanced NO densities (polar and summer hemispheric low-mid latitude regions) have larger NO radiative cooling with lower peak altitudes in comparison to other regions (equatorial to winter hemispheric low-mid latitude regions), where NO radiative cooling is low with higher peak altitude values.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 19 pages, 10 figures

arXiv:2406.00060 [pdf, other]

Cascade-Aware Training of Language Models

Authors: Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

Abstract: Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employs smaller models for simpler queries. Cascaded systems are typically built with independently trained models, neglecting the advantages of considering inference-time interactions of the cascaded LMs during training. In this paper, we present cascade-aware training (CAT), an approach to optimizing the overall quality-cost performance tradeoff of a cascade of LMs. We achieve inference-time benefits by training the small LM with awareness of its place in a cascade and downstream capabilities. We demonstrate the value of the proposed method with over 60 LM tasks of the SuperGLUE, WMT22, and FLAN2021 datasets.

Submitted 29 May, 2024; originally announced June 2024.

Comments: 22 pages, 13 figures

arXiv:2405.20933 [pdf, ps, other]

Concentration Bounds for Optimized Certainty Equivalent Risk Estimation

Authors: Ayon Ghosh, L. A. Prashanth, Krishna Jagannathan

Abstract: We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as well as concentration bounds (assuming sub-Gaussianity). Further, we analyze an efficient stochastic approximation-based OCE estimator, and derive finite sample bounds for the same. To show the applicability of our bounds, we consider a risk-aware bandit problem, with OCE as the risk. For this problem, we derive a bound on the probability of mis-identification. Finally, we conduct numerical experiments to validate the theoretical findings.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20654 [pdf, other]

Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

Authors: Xuyang Wu, Zhiyuan Peng, Krishna Sravanthi Rajanala Sai, Hsin-Tai Wu, Yi Fang

Abstract: Effective passage retrieval and reranking methods have been widely utilized to identify suitable candidates in open-domain question answering tasks. Recent studies have resorted to LLMs for reranking the retrieved passages by the log-likelihood of the question conditioned on each passage. Although these methods have demonstrated promising results, the performance is notably sensitive to the human-written prompt (or hard prompt), and fine-tuning LLMs can be computationally intensive and time-consuming. Furthermore, this approach limits the leverage of question-passage relevance pairs and passage-specific knowledge to enhance the ranking capabilities of LLMs. In this paper, we propose passage-specific prompt tuning for reranking in open-domain question answering (PSPT): a parameter-efficient method that fine-tunes learnable passage-specific soft prompts, incorporating passage-specific knowledge from a limited set of question-passage relevance pairs. The method involves ranking retrieved passages based on the log-likelihood of the model generating the question conditioned on each passage and the learned soft prompt. We conducted extensive experiments utilizing the Llama-2-chat-7B model across three publicly available open-domain question answering datasets and the results demonstrate the effectiveness of the proposed approach.

Submitted 20 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: Accepted at Gen-IR@SIGIR24

arXiv:2405.20617 [pdf, other]

Large-scale Outdoor Cell-free mMIMO Channel Measurement in an Urban Scenario at 3.5 GHz

Authors: Yuning Zhang, Thomas Choi, Zihang Cheng, Issei Kanno, Masaaki Ito, Jorge Gomez-Ponce, Hussein Hammoud, Bowei Wu, Ashwani Pradhan, Kelvin Arana, Pramod Krishna, Tianyi Yang, Tyler Chen, Ishita Vasishtha, Haoyu Xie, Linyu Sun, Andreas F. Molisch

Abstract: The design of cell-free massive MIMO (CF-mMIMO) systems requires accurate, measurement-based channel models. This paper provides the first results from by far the most extensive outdoor measurement campaign for CF-mMIMO channels in an urban environment. We measured impulse responses between over 20,000 potential access point (AP) locations and 80 user equipments (UEs) at 3.5 GHz with 350 MHz bandwidth (BW). Measurements use a "virtual array" approach at the AP and a hybrid switched/virtual approach at the UE. This paper describes the sounder design, measurement environment, data processing, and sample results, particularly the evolution of the power-delay profiles (PDPs) as a function of the AP locations, and its relation to the propagation environment.

Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: Submitted to: VTC 2024-Fall

arXiv:2405.20457 [pdf, other]

Online network topology shapes personal narratives and hashtag generation

Authors: J. Hunter Priniski, Bryce Linford, Sai Krishna, Fred Morstatter, Jeff Brantingham, Hongjing Lu

Abstract: While narratives have shaped cognition and cultures for centuries, digital media and online social networks have introduced new narrative phenomena. With increased narrative agency, networked groups of individuals can directly contribute and steer narratives that center our collective discussions of politics, science, and morality. We report the results of an online network experiment on narrative and hashtag generation, in which networked groups of participants interpreted a text-based narrative of a disaster event, and were incentivized to produce matching hashtags with their network neighbors. We found that network structure not only influences the emergence of dominant beliefs through coordination with network neighbors, but also impacts participants' use of causal language in their personal narratives.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Will be published in the 2024 Proceedings of the Cognitive Science Society

arXiv:2405.19801 [pdf, other]

Modeling of Nitric Oxide Infrared radiative flux in lower thermosphere: a machine learning perspective

Authors: Dayakrishna Nailwal, MV Sunil Krishna, Alok Kumar Ranjan, Jia Yue

Abstract: Nitric Oxide (NO) significantly impacts energy distribution and chemical processes in the mesosphere and lower thermosphere (MLT). During geomagnetic storms, a substantial influx of energy in the thermosphere leads to an increase in NO infrared emissions. Accurately predicting the radiative flux of Nitric Oxide is crucial for understanding the thermospheric energy budget, particularly during extreme space weather events. With advancements in computational techniques, machine learning (ML) has become a highly effective tool for space weather forecasting. This effort becomes even more worthwhile considering the availability of two decades of continuous NO infrared emissions measurement by TIMED/SABER along with several other key thermospheric variables. We present the scheme of development of an ML-based predictive model for Nitric Oxide Infrared Radiative Flux (NOIRF). Various ML algorithms have been tested for better predictive ability, and an optimized model (NOEMLM) has been developed for the study of NOIRF. This model is able to extract the underlying relationships between the input features and effectively predict the NOIRF. The NOEMLM predictions agree very well with SABER observations during quiet times as well as geomagnetic storms. In comparison with the existing TIEGCM model, NOEMLM performs very well, especially during extreme space weather conditions.
The results of this study suggest that utilizing geomagnetic and space weather indices with ML/AI can serve as superior parameters for studying the upper atmosphere, as compared to focusing on specific species having complex chemical processes and associated uncertainties in constituents. ML techniques can effectively carry out the analysis with greater ease than traditional chemical studies.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 18 pages, 7 figures

Journal ref: Under review in Advances in Space Research 2024

arXiv:2405.19597 [pdf, other]

SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

Authors: Vijay Lingam, Atula Tejaswi, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Alex Dimakis, Eunsol Choi, Aleksandar Bojchevski, Sujay Sanghavi

Abstract: Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights $W$ and inject learnable matrices $ΔW$. These $ΔW$ matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on $ΔW$ depends on the specific weight matrix $W$. Specifically, SVFT updates $W$ as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that only recover up to 85% performance using 0.03 to 0.8% of the trainable parameter budget.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 17 pages, 5 figures, 14 tables

arXiv:2405.19261 [pdf, other]

Faster Cascades via Speculative Decoding

Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

Abstract: Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in parallel verification mode. These mechanisms offer different benefits: empirically, cascades are often capable of yielding better quality than even the larger model, while theoretically, speculative decoding offers a guarantee of quality-neutrality. In this paper, we leverage the best of both these approaches by designing new speculative cascading techniques that implement their deferral rule through speculative execution. We characterize the optimal deferral rule for our speculative cascades, and employ a plug-in approximation to the optimal rule. Through experiments with T5 models on benchmark language tasks, we show that the proposed approach yields better cost-quality trade-offs than cascading and speculative decoding baselines.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18400 [pdf, other]

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

Authors: Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

Abstract: Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To alleviate the computation cost of running $k$ inference passes, we propose Superposed Decoding, a new decoding algorithm that generates $k$ drafts at the computation cost of one autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model. At every inference step we combine the $k$ drafts with the top-$k$ tokens to get $k^2$ new drafts and cache the $k$ most likely options, using an n-gram interpolation with minimal compute overhead to filter out incoherent generations. Our experiments show that $k$ drafts from Superposed Decoding are at least as coherent and factual as Nucleus Sampling and Greedy Decoding respectively, while being at least $2.44\times$ faster for $k\ge3$. In a compute-normalized setting, user evaluations demonstrably favor text generated by Superposed Decoding over Nucleus Sampling. Code and more examples open-sourced at https://github.com/RAIVNLab/SuperposedDecoding.

Submitted 24 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 22 pages, 15 figures

arXiv:2405.17731 [pdf, other]

Evaluating NoSQL Databases for OLAP Workloads: A Benchmarking Study of MongoDB, Redis, Kudu and ArangoDB

Authors: Rishi Kesav Mohan, Risheek Rakshit Sukumar Kanmani, Krishna Anandan Ganesan, Nisha Ramasubramanian

Abstract: In the era of big data, conventional RDBMS models have become impractical for handling colossal workloads. Consequently, NoSQL databases have emerged as the preferred storage solutions for executing processing-intensive Online Analytical Processing (OLAP) tasks. Within the realm of NoSQL databases, various classifications exist based on their data storage mechanisms, making it challenging to select the most suitable one for a given OLAP workload. While each NoSQL database boasts distinct advantages, inherent scalability, adaptability to diverse data formats, and high data availability are universally recognized benefits crucial for managing OLAP workloads effectively. Existing research predominantly evaluates individual databases within custom data pipeline setups, lacking a standardized approach for comparative analysis across different databases to identify the optimal data pipeline for OLAP workloads. In this paper, we present our experimental insights into how various NoSQL databases handle OLAP workloads within a standardized data processing pipeline. Our experimental pipeline comprises Apache Spark for large-scale transformations, data cleansing, and schema normalization, diverse NoSQL databases as data stores, and a Business Intelligence tool for data analysis and visualization.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17545 [pdf, other]

Adiabatic Hydrodynamization and the Emergence of Attractors: a Unified Description of Hydrodynamization in Kinetic Theory

Authors: Krishna Rajagopal, Bruno Scheihing-Hitschfeld, Rachel Steinhorst

Abstract: "Attractor" solutions for the pre-hydrodynamic, far-from-equilibrium, evolution of the matter produced in relativistic heavy ion collisions have emerged as crucial descriptors of the rapid hydrodynamization of quark-gluon plasma (QGP). Adiabatic Hydrodynamization (AH) has been proposed as a framework with which to describe, explain, and predict attractor behavior that draws upon an analogy to the adiabatic approximation in quantum mechanics. In this work, we systematize the description of pre-hydrodynamic attractors in kinetic theory by showing how to use the AH framework to identify these long-lived solutions to which varied initial conditions rapidly evolve, demonstrating the robustness of this framework. In a simplified QCD kinetic theory in the small-angle scattering limit, we use AH to explain both the early- and late-time scaling behavior of a longitudinally expanding gluon gas in a unified framework. In this context, we show that AH provides a unified description of, and intuition for, all the stages of what in QCD would be bottom-up thermalization, starting from a pre-hydrodynamic attractor and ending with hydrodynamization. We additionally discuss the connection between the notions of scaling behavior and adiabaticity and the crucial role of time-dependent coordinate redefinitions in identifying the degrees of freedom of kinetic theories that give rise to attractor solutions.
The tools we present open a path to the intuitive explanation of how attractor behavior arises and how the attractor evolves in all stages of the hydrodynamization of QGP in heavy ion collisions.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 63 pages, 20 figures

Report number: MIT-CTP/5724

arXiv:2405.17309 [pdf, other]

Survey of Graph Neural Network for Internet of Things and NextG Networks

Authors: Sabarish Krishna Moorthy, Jithin Jagannath

Abstract: The exponential increase in Internet of Things (IoT) devices coupled with 6G pushing towards higher data rates and connected devices has sparked a surge in data. Consequently, harnessing the full potential of data-driven machine learning has become one of the important thrusts. In addition to the advancement in wireless technology, it is important to efficiently use the resources available and meet the users' requirements. Graph Neural Networks (GNNs) have emerged as a promising paradigm for effectively modeling and extracting insights from data that inherently exhibit complex network structures, owing to their high performance and accuracy, scalability, adaptability, and resource efficiency. There is a lack of a comprehensive survey that focuses on the applications and advances GNNs have made in the context of IoT and Next Generation (NextG) networks. To bridge that gap, this survey starts by providing a detailed description of GNN terminology, architecture, and the different types of GNNs. Then we provide a comprehensive survey of the advancements in applying GNNs for IoT from the perspective of data fusion and intrusion detection. Thereafter, we survey the impact GNNs have made in improving spectrum awareness. Next, we provide a detailed account of how GNNs have been leveraged for networking and tactical systems. Through this survey, we aim to provide a comprehensive resource for researchers to learn more about GNNs in the context of wireless networks, and understand their state-of-the-art use cases while contrasting them with other machine learning approaches.
Finally, we also discuss the challenges and wide range of future research directions to further motivate the use of GNNs for IoT and NextG Networks.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16915 [pdf, other]

Multilingual Diversity Improves Vision-Language Representations

Authors: Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna

Abstract: Massive web-crawled image-text datasets lay the foundation for recent progress in multimodal learning. These datasets are designed with the goal of training a model to do well on standard computer vision benchmarks, many of which, however, have been shown to be English-centric (e.g., ImageNet). Consequently, existing data curation techniques gravitate towards using predominantly English image-text pairs and discard many potentially useful non-English samples. Our work questions this practice. Multilingual data is inherently enriching not only because it provides a gateway to learn about culturally salient concepts, but also because it depicts common concepts differently from monolingual data. We thus conduct a systematic study to explore the performance benefits of using more samples of non-English origins with respect to English vision tasks. By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set. Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet, ImageNet distribution shifts, image-English-text retrieval and on average across 38 tasks from the DataComp benchmark. On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa. In addition, we quantitatively show that English and non-English data are significantly different in both image and (translated) text space.
We hope that our findings motivate future work to be more intentional about including multicultural and multilingual data, not just when non-English or geographically diverse tasks are involved, but to enhance model capabilities at large.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16500 [pdf, other]

Optimal Intervention Strategies and Cost-effectiveness Analysis study of Tuberculosis with reference to TPT, Malnutrition and Diabetes Management

Authors: Sushil Chhetri, Krishna Kiran Vamsi Dasu, K N Kavya, Sharath B N, Uma Shankar S, Somashekar N, Vineet Kumar Chadda

Abstract: Tuberculosis remains a significant global health challenge, with millions of new cases reported annually. Recent studies suggest that expanding the accessibility of TB intervention programs can lead to a substantial decrease in both TB incidence and prevalence. This paper begins by examining a deterministic mathematical model for TB transmission, aiming to analyze the underlying dynamics. Subsequently, an optimal control problem is formulated to enhance TB control measures, encompassing Tuberculosis Preventive Treatment (TPT) and other initiatives targeting malnutrition and diabetes. Through simulation studies, the effectiveness of the control program is assessed. The model dynamics allow us to identify the pseudo-prevalence and incidence. To determine the potential long-term trajectory of TB and to acquire future projections, a cost-effectiveness analysis is performed using ACER, AIR, ICER, and four quadrants to compare competing interventions. In conclusion, this work provides valuable insights into TB and strategies for its control and cost effectiveness.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16389 [pdf, ps, other]

Decorrelation in Local Statistics for random operators

Authors: M. Krishna

Abstract: In this paper we study the local spectral statistics in the localised region of various random operator models, including the $d$-dimensional Anderson model and random Schrödinger operators. It is already established, in the above models, that at an energy $E$, in the localised energy region of the spectrum, where the density of states $n(E) > 0$, the local eigenvalue statistics $X_E$ is a Poisson process with intensity $n(E) \mathcal{L}$, $\mathcal{L}$ being the Lebesgue measure on $\mathbb{R}$. The question of independence of $X_E, X_{E^\prime}$ for distinct energies was partially solved in the literature. We solve it completely for all the models for which the Minami technique works.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15839 [pdf, ps, other]

Balancing and Lucas-balancing numbers as difference of two repdigits

Authors: Monalisa Mohapatra, Pritam Kumar Bhoi, Gopal Krishna Panda

Abstract: Positive integers with all digits equal are called repdigits. In this paper, we find all balancing and Lucas-balancing numbers, which can be expressed as the difference of two repdigits. The method of proof involves the application of Baker's theory for linear forms in logarithms of algebraic numbers and the Baker-Davenport reduction procedure.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 13 pages. arXiv admin note: text overlap with arXiv:2405.04801

MSC Class: Primary 11B39; Secondary 11J86; 11D61

arXiv:2405.15590 [pdf, ps, other]

Profiling checkpointing schedules in adjoint ST-AD

Authors: Laurent Hascoët, Jean-Luc Bouchot, Shreyas Sunil Gaikwad, Sri Hari Krishna Narayanan, Jan Hückelheim

Abstract: Checkpointing is a cornerstone of data-flow reversal in adjoint algorithmic differentiation. Checkpointing is a storage/recomputation trade-off that can be applied at different levels, one of which is the call tree. We are looking for good placements of checkpoints onto the call tree of a given application, to reduce run time and memory footprint of its adjoint. There is no known optimal solution to this problem other than a combinatorial search on all placements. We propose a heuristic based on run-time profiling of the adjoint code. We describe the implementation of this profiling tool in an existing source-transformation AD tool. We demonstrate the interest of this approach on test cases taken from the MITgcm ocean and atmospheric global circulation model. We discuss the limitations of our approach and propose directions to lift them.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.13762 [pdf, other]

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

Abstract: Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task, which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process. Instead of the standard fixed diffusion timestep, we propose applying variable diffusion timesteps across the temporal dimension and across modalities of the inputs. This formulation offers flexibility to introduce variable noise levels for various portions of the input, hence the term mixture of noise levels. We propose a transformer-based audiovisual latent diffusion model and show that it can be trained in a task-agnostic fashion using our approach to enable a variety of audiovisual generation tasks at inference time. Experiments demonstrate the versatility of our method in tackling cross-modal and multimodal interpolation tasks in the audiovisual space. Notably, our proposed approach surpasses baselines in generating temporally and perceptually consistent samples conditioned on the input. Project page: avdit2024.github.io

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13181 [pdf, other]

Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting

Authors: Krishna Prasad Varadarajan Srinivasan, Prasanth Gumpena, Madhusudhana Yattapu, Vishal H. Brahmbhatt

Abstract: In the domain of large language models (LLMs), arXiv:2305.16938 showed that few-shot full-model fine-tuning -- namely Vanilla Fine Tuning (FT) and Pattern-Based Fine Tuning (PBFT) -- and In-Context Learning (ICL) generalize similarly on Out-Of-Domain (OOD) datasets, but vary in terms of task adaptation. However, they both pose challenges, especially in terms of memory requirements. In this paper, we further try to push the understanding of different fine-tuning strategies for LLMs and aim to bring a myriad of these on the same pedestal for an elaborate comparison with full-model fine-tuning on two diverse datasets. To that end, we conducted a series of experiments, beginning with state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, COLA and MNLI. We then investigate adaptive fine-tuning and the efficiency of LoRA adapters in a few-shot setting. Finally, we also compare an alternative approach that has gained recent popularity -- context distillation -- with the vanilla FT and PBFT with and without a few-shot setup. Our findings suggest that these alternative strategies that we explored can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT. PBFT under-performs Vanilla FT on out-of-domain (OOD) data, emphasizing the need for effective prompts. Further, our adaptive fine-tuning and LoRA experiments perform comparably to or slightly worse than the standard fine-tunings as anticipated, since standard fine-tunings involve tuning the entire model. Finally, our context distillation experiments out-perform the standard fine-tuning methods. These findings underscore that eventually the choice of an appropriate fine-tuning method depends on the available resources (memory, compute, data) and task adaptability.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 9 pages of main paper, 1 page of references, 6 appendix pages, 11 figures, 18 tables

arXiv:2405.13170 [pdf, other]

FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching

Authors: Jianming Tong, Anirudh Itagi, Prasanth Chatarasi, Tushar Krishna

Abstract: The inference of ML models composed of diverse structures, types, and sizes boils down to the execution of different dataflows (i.e. different tiling, ordering, parallelism, and shapes). Using the optimal dataflow for every layer of a workload can reduce latency by up to two orders of magnitude over a suboptimal dataflow. Unfortunately, reconfiguring hardware for different dataflows involves on-chip data layout reordering and datapath reconfigurations, leading to non-trivial overhead that hinders ML accelerators from exploiting different dataflows, resulting in suboptimal performance. To address this challenge, we propose FEATHER, an innovative accelerator that leverages a novel spatial array termed Nest and a novel multi-stage reduction network called BIRRD for performing flexible data reduction with layout reordering under the hood, enabling seamless switching between optimal dataflows with negligible latency and resource overhead. For systematically evaluating the performance interaction between dataflows and layouts, we enhance Timeloop, a state-of-the-art dataflow cost modeling and search framework, with layout assessment capabilities, and term it Layoutloop. We model FEATHER into Layoutloop and also deploy FEATHER end-to-end on the edge ZCU104 FPGA. FEATHER delivers 1.27~2.89x inference latency speedup and 1.3~6.43x energy efficiency improvement compared to various SoTAs like NVDLA, SIGMA and Eyeriss under ResNet-50 and MobileNet-V3 in Layoutloop. On practical FPGA devices, FEATHER achieves 2.65/3.91x higher throughput than Xilinx DPU/Gemmini. Remarkably, such performance and energy efficiency enhancements come at only 6% area over a fixed-dataflow Eyeriss-like accelerator. Our code is released at https://github.com/maeri-project/FEATHER.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 17 pages, 14 figures. International Symposium on Computer Architecture (ISCA), Jun 2024

arXiv:2405.12983 [pdf, other]

Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer

Authors: Maxime Burchi, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg, Radu Timofte

Abstract: Humans are adept at leveraging visual cues from lip movements for recognizing speech in adverse listening conditions. Audio-Visual Speech Recognition (AVSR) models follow a similar approach to achieve robust speech recognition in noisy conditions. In this work, we present a multilingual AVSR model incorporating several enhancements to improve performance and audio noise robustness. Notably, we adapt the recently proposed Fast Conformer model to process both audio and visual modalities using a novel hybrid CTC/RNN-T architecture. We increase the amount of audio-visual training data for six distinct languages, generating automatic transcriptions of unlabelled multilingual datasets (VoxCeleb2 and AVSpeech). Our proposed model achieves new state-of-the-art performance on the LRS3 dataset, reaching a WER of 0.8%. On the recently introduced MuAViC benchmark, our model yields an absolute average-WER reduction of 11.9% in comparison to the original baseline. Finally, we demonstrate the ability of the proposed model to perform audio-only, visual-only, and audio-visual speech recognition at test time.

Submitted 13 March, 2024; originally announced May 2024.

arXiv:2405.12011 [pdf, ps, other]

Higher weight spectra of ternary codes associated to the quadratic Veronese $3$-fold

Authors: Krishna Kaipa, Puspendu Pradhan

Abstract: The problem studied in this work is to determine the higher weight spectra of the Projective Reed-Muller codes associated to the Veronese $3$-fold $\mathcal V$ in $PG(9,q)$, which is the image of the quadratic Veronese embedding of $PG(3,q)$ in $PG(9,q)$. We reduce the problem to the following combinatorial problem in finite geometry: For each subset $S$ of $\mathcal V$, determine the dimension of the linear subspace of $PG(9,q)$ generated by $S$. We develop a systematic method to solve the latter problem. We implement the method for $q=3$, and use it to obtain the higher weight spectra of the associated code. The case of a general finite field $\mathbb F_q$ will be treated in a future work.

Submitted 20 May, 2024; originally announced May 2024.

MSC Class: 94B27; 51E20; 05B25

arXiv:2405.09792 [pdf]

CMOS-compatible Strain Engineering for High-Performance Monolayer Semiconductor Transistors

Authors: Marc Jaikissoon, Çağıl Köroğlu, Jerry A. Yang, Kathryn M. Neilson, Krishna C. Saraswat, Eric Pop

Abstract: Strain engineering has played a key role in modern silicon electronics, having been introduced as a mobility booster in the 1990s and commercialized in the early 2000s. Achieving similar advances with two-dimensional (2D) semiconductors in a CMOS (complementary metal oxide semiconductor) compatible manner would radically improve the industrial viability of 2D transistors. Here, we show silicon nitride capping layers can impart strain to monolayer MoS2 transistors on conventional silicon substrates, enhancing their electrical performance with a low thermal budget (350 °C), CMOS-compatible approach. Strained back-gated and dual-gated MoS2 transistors demonstrate median increases up to 60% and 45% in on-state current, respectively. The greatest improvements are found when both transistor channels and contacts are reduced to ~200 nm, reaching saturation currents of 488 uA/um, higher than any previous reports at such short contact pitch. Simulations reveal that most benefits arise from tensile strain lowering the contact Schottky barriers, and that further reducing device dimensions (including contacts) will continue to offer increased strain and performance improvements.

Submitted 29 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.09109 [pdf]

doi 10.1109/ACCESS.2024.3400604

Motion Prediction with Gaussian Processes for Safe Human-Robot Interaction in Virtual Environments

Authors: Stanley Mugisha, Vamsi Krishna Guda, Christine Chevallereau, Damien Chablat, Matteo Zoppi

Abstract: Humans use collaborative robots as tools for accomplishing various tasks. The interaction between humans and robots happens in tight shared workspaces. However, these machines must be safe to operate alongside humans to minimize the risk of accidental collisions. Ensuring safety imposes many constraints, such as reduced torque and velocity limits during operation, thus increasing the time to accomplish many tasks. However, for applications such as using collaborative robots as haptic interfaces with intermittent contacts for virtual reality applications, speed limitations result in poor user experiences. This research aims to improve the efficiency of a collaborative robot while improving the safety of the human user. We used Gaussian process models to predict human hand motion and developed strategies for human intention detection based on hand motion and gaze to improve the time for the robot and human security in a virtual environment. We then studied the effect of prediction. Results from comparisons show that the prediction models improved the robot time by 3% and safety by 17%. When used alongside gaze, prediction with Gaussian process models resulted in an improvement of the robot time by 2% and the safety by 13%.

Submitted 18 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: 16 pages

ACM Class: I.2.6; I.2.9; I.3.2; H.5.2

arXiv:2405.08003 [pdf, ps, other]

doi 10.1142/S0219025724400022

Continuous Krishna-Parthasarathy Entropic Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: In 2002, Krishna and Parthasarathy [\textit{Sankhyā Ser. A}] derived a discrete quantum version of the Maassen-Uffink [\textit{Phys. Rev. Lett., 1988}] entropic uncertainty principle. In this paper, using the notion of continuous operator-valued frames, we derive an entropic uncertainty principle for an arbitrary family of operators indexed by measure spaces having finite measure. We give an application to the special case of compact groups.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 7 pages, 0 Figures

MSC Class: 81P15; 94A17; 42C15

Journal ref: Special issue of Infinite Dimensional Analysis, Quantum Probability and Related Topics in honour of Prof. K. R. Parthasarathy, 18 March 2024

arXiv:2405.05572 [pdf, other]

From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences

Authors: Prashant Kodali, Anmol Goel, Likhith Asapu, Vamshi Krishna Bonagiri, Anirudh Govil, Monojit Choudhury, Manish Shrivastava, Ponnurangam Kumaraguru

Abstract: Current computational approaches for analysing or generating code-mixed sentences do not explicitly model "naturalness" or "acceptability" of code-mixed sentences, but rely on training corpora to reflect distribution of acceptable code-mixed sentences. Modelling human judgement for the acceptability of code-mixed text can help in distinguishing natural code-mixed text and enable quality-controlled generation of code-mixed text. To this end, we construct Cline - a dataset containing human acceptability judgements for English-Hindi (en-hi) code-mixed text. Cline is the largest of its kind with 16,642 sentences, consisting of samples sourced from two sources: synthetically generated code-mixed text and samples collected from online social media. Our analysis establishes that popular code-mixing metrics such as CMI, Number of Switch Points, and Burstiness, which are used to filter/curate/compare code-mixed corpora, have low correlation with human acceptability judgements, underlining the necessity of our dataset. Experiments using Cline demonstrate that simple Multilayer Perceptron (MLP) models trained solely on code-mixing metrics are outperformed by fine-tuned pre-trained Multilingual Large Language Models (MLLMs). Specifically, XLM-Roberta and Bernice outperform IndicBERT across different configurations in challenging data settings. Comparison with ChatGPT's zero-shot and few-shot capabilities shows that MLLMs fine-tuned on larger data outperform ChatGPT, providing scope for improvement in code-mixed tasks. Zero-shot transfer from English-Hindi to English-Telugu acceptability judgments using our model checkpoints proves superior to random baselines, enabling application to other code-mixed language pairs and providing further avenues of research. We publicly release our human-annotated dataset, trained checkpoints, code-mix corpus, and code for data generation and model training.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04801 [pdf, ps, other]

Repdigits as difference of two balancing or Lucas-balancing numbers

Authors: Monalisa Mohapatra, Pritam Kumar Bhoi, Gopal Krishna Panda

Abstract: Repdigits are natural numbers formed by the repetition of a single digit. In this paper, we study the problem of writing repdigits as the difference of two balancing or Lucas-balancing numbers. The method of proof involves the application of Baker's theory for linear forms in logarithms of algebraic numbers and the Baker-Davenport reduction procedure. Computations are done with the help of a simple computer program in {\it Mathematica}.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 12

MSC Class: Primary 11B39; Secondary 11J86; 11D61

arXiv:2405.04619 [pdf, other]

Non-anomalous non-invertible symmetries in 1+1D from gapped boundaries of SymTFTs

Authors: Pavel Putrov, Rajath Radhakrishnan

Abstract: We study the anomalies of non-invertible symmetries in 1+1D QFTs using gapped boundaries of its SymTFT. We establish the explicit relation between Lagrangian algebras which determine gapped boundaries of the SymTFT, and algebras which determine non-anomalous/gaugeable topological line operators in the 1+1D QFT. If the Lagrangian algebras in the SymTFT are known, this provides a method to compute algebras in all fusion categories that share the same SymTFT. We find necessary conditions that a line operator in the SymTFT must satisfy for the corresponding line operator in the 1+1D QFT to be non-anomalous. We use this constraint to show that a non-invertible symmetry admits a 1+1D trivially gapped phase if and only if the SymTFT admits a magnetic Lagrangian algebra. We define a process of transporting non-anomalous line operators between fusion categories which share the same SymTFT and apply this method to the three Haagerup fusion categories.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 72 pages, 24 figures

arXiv:2405.03582 [pdf, other]

Functional Latent Dynamics for Irregularly Sampled Time Series Forecasting

Authors: Christian Klötergens, Vijaya Krishna Yalavarthi, Maximilian Stubbemann, Lars Schmidt-Thieme

Abstract: Irregularly sampled time series with missing values are often observed in multiple real-world applications such as healthcare, climate and astronomy. They pose a significant challenge to standard deep learning models that operate only on fully observed and regularly sampled time series. In order to capture the continuous dynamics of the irregular time series, many models rely on solving an Ordinary Differential Equation (ODE) in the hidden state. These ODE-based models tend to be slow and require large memory due to sequential operations and a complex ODE solver. As an alternative to complex ODE-based models, we propose a family of models called Functional Latent Dynamics (FLD). Instead of solving the ODE, we use simple curves which exist at all time points to specify the continuous latent state in the model. The coefficients of these curves are learned only from the observed values in the time series, ignoring the missing values. Through extensive experiments, we demonstrate that FLD achieves better performance compared to the best ODE-based model while reducing the runtime and memory overhead. Specifically, FLD requires an order of magnitude less time to infer the forecasts compared to the best performing forecasting model.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03348 [pdf, other]

Evolution of the 5G New Radio Two-Step Random Access towards 6G Unsourced MAC

Authors: Patrick Agostini, Jean-Francois Chamberland, Federico Clazzer, Johannes Dommel, Gianluigi Liva, Andrea Munari, Krishna Narayanan, Yury Polyanskiy, Slawomir Stanczak, Zoran Utkovski

Abstract: This report summarizes some considerations on possible evolutions of grant-free random access in the next generation of the 3GPP wireless cellular standard. The analysis is carried out by mapping the problem to the recently-introduced unsourced multiple access channel (UMAC) setup. By doing so, the performance of existing solutions can be benchmarked with information-theoretic bounds, assessing the potential gains that can be achieved over legacy 3GPP schemes. The study focuses on the two-step random access (2SRA) protocol introduced by Release 16 of the 5G New Radio standard, investigating its applicability to support large MTC / IoT terminal populations in a grant-free fashion. The analysis shows that the existing 2SRA scheme may not succeed in providing energy-efficient support to large user populations. Modifications to the protocol are proposed that enable remarkable gains in both energy and spectral efficiency while retaining a strong resemblance to the legacy protocol.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Version 1.0 of the report

Showing 51–100 of 3,387 results for author: Krishna