Precision Guided Approach to Mitigate Data Poisoning Attacks in Federated Learning

Authors: K Naveen Kumar, C Krishna Mohan, Aravind Machiry

Abstract: Federated Learning (FL) is a collaborative learning paradigm enabling participants to collectively train a shared machine learning model while preserving the privacy of their sensitive data. Nevertheless, the inherently decentralized and data-opaque characteristics of FL render it susceptible to data poisoning attacks. These attacks introduce malformed or malicious inputs during local model training, subsequently influencing the global model and resulting in erroneous predictions. Current FL defense strategies against data poisoning attacks either involve a trade-off between accuracy and robustness or necessitate the presence of a uniformly distributed root dataset at the server. To overcome these limitations, we present FedZZ, which harnesses a zone-based deviating update (ZBDU) mechanism to effectively counter data poisoning attacks in FL. Further, we introduce a precision-guided methodology that actively characterizes these client clusters (zones), which in turn aids in recognizing and discarding malicious updates at the server. Our evaluation of FedZZ across two widely recognized datasets, CIFAR10 and EMNIST, demonstrates its efficacy in mitigating data poisoning attacks, surpassing the performance of prevailing state-of-the-art methodologies in both single and multi-client attack scenarios and varying attack volumes. Notably, FedZZ also functions as a robust client selection strategy, even in highly non-IID and attack-free scenarios. Moreover, in the face of escalating poisoning rates, the model accuracy attained by FedZZ displays superior resilience compared to existing techniques. For instance, when confronted with a 50% presence of malicious clients, FedZZ sustains an accuracy of 67.43%, while the accuracy of the second-best solution, FL-Defender, diminishes to 43.36%.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 14 pages, 11 figures, 5 tables, Accepted in ACM CODASPY 2024

arXiv:2404.04075 [pdf, other]

Crosstalk-mitigated microelectronic control for optically-active spins

Authors: Hao-Cheng Weng, John G. Rarity, Krishna C. Balram, Joe A. Smith

Abstract: To exploit the sub-nanometre dimensions of qubits for large-scale quantum information processing, corresponding control architectures require both energy and space efficiency, with the on-chip footprint of unit-cell electronics ideally micron-scale. However, the spin coherence of qubits in close packing is severely deteriorated by microwave crosstalk from neighbouring control sites. Here, we present a crosstalk-mitigation scheme using foundry microelectronics, to address solid-state spins at sub-100 μm spacing without the need for qubit-detuning. Using nitrogen-vacancy centres in nanodiamonds as qubit prototypes, we first demonstrate 10 MHz Rabi oscillation at milliwatts of microwave power. Implementing the active cancellation, we then prove that the crosstalk field from neighbouring lattice sites can be reduced to undetectable levels. We finally extend the scheme to show increased qubit control, tripling the spin coherence under crosstalk mitigation. Compatible with integrated optics, our results present a step towards scalable control across quantum platforms using silicon microelectronics.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 11 pages, 5 figures

arXiv:2404.03675 [pdf, other]

Deciphering Accretion-Driven Starquakes in Recycled Millisecond Pulsars using Gravitational Waves

Authors: Sagnik Chatterjee, Kamal Krishna Nath, Ritam Mallick

Abstract: Recycled millisecond pulsars are susceptible to starquakes as they are continuously accreting matter from their binary companion. A starquake happens when the rotational frequency of the star crosses its breaking frequency. In this study, we perform a model analysis of an accreting neutron star suffering a starquake. We analyze two models: a spherical star with accreting mountains and a deformed star with accreting mountains. We find that as the star crosses the breaking frequency and suffers a starquake, there is a sudden change in the continuous gravitational wave signal arriving from it. It is interesting to note that the amplitude of the gravitational wave signal increases suddenly for the spherical star. In contrast, for the deformed star, the amplitude of the continuous gravitational wave signal decreases suddenly. This sudden change in the continuous gravitational wave signal in recycled millisecond pulsars can be a unique signature for such pulsars undergoing a starquake.

Submitted 27 March, 2024; originally announced April 2024.

Comments: 9 pages, 9 figures

arXiv:2404.03671 [pdf, other]

doi 10.3390/galaxies12020010

Central Engine and Spectral Energy Distribution Properties of High Redshift Gamma Ray Blazars

Authors: A. Tolamatti, K. K. Singh, K. K. Yadav

Abstract: We report on the properties of central engines in the $γ$-ray blazars located at high redshifts beyond $z \gtrsim 0.4$, where the extra-galactic background light (EBL) starts affecting their $γ$-ray spectra. The physical engine that provides power to the blazars of very high bolometric luminosity is assumed to be a highly collimated jet of matter moving relativistically away from the supermassive black hole (SMBH), located in the central region of the host galaxy, in a direction aligned toward the Earth. Due to their peculiar geometry and special physical conditions, blazars at redshifts beyond $z \gtrsim 0.4$ are bright enough to be detected in the $γ$-ray energy band. In this work, we investigate the physical properties of high-$z$ $γ$-ray blazars detected by the Large Area Telescope (LAT) on board the Fermi satellite. We also study the properties of their emission regions and the central engines and discuss cosmological and astrophysical implications.

Submitted 11 March, 2024; originally announced April 2024.

Comments: 20 Pages, 8 Figures, Published in Galaxies journal

Journal ref: Galaxies 2024, 12, 10

arXiv:2404.03665 [pdf]

Serial Parallel Reliability Redundancy Allocation Optimization for Energy Efficient and Fault Tolerant Cloud Computing

Authors: Gutha Jaya Krishna

Abstract: Serial-parallel redundancy is a reliable way to ensure that services and systems remain available in cloud computing. The method involves making copies of the same system or program, with only one remaining active. When an error occurs, the inactive copy can step in as a backup right away, which provides continuous performance and uninterrupted operation. This approach is called parallel redundancy, otherwise known as active-active redundancy, and it is exceptional as a strategy. It creates duplicates of a system or service that are all running at once. Doing so increases fault tolerance, since if one copy fails, the workload can be distributed across any replica that is functioning properly. Reliability allocation depends on the features of a system and on the availability and fault tolerance required of it. Serial redundancy or parallel redundancy can be applied to increase the dependability of systems and services. To demonstrate how well this concept works, we examined fixed serial-parallel reliability redundancy allocation problems and then used an innovative hybrid optimization technique to find the best possible allocation for peak dependability. We then measured our findings against other research.

Submitted 16 February, 2024; originally announced April 2024.

Comments: 5 Pages, 1 Figure, 2 Tables

MSC Class: 68W50 ACM Class: I.2.11
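
For readers unfamiliar with the redundancy allocation problem named in the entry above, the following is a minimal sketch of the standard series-parallel reliability formula, R = prod_i (1 - (1 - r_i)^(n_i)), where subsystem i has n_i identical replicas of reliability r_i. The function names, the component budget, and the exhaustive search are illustrative assumptions; the abstract does not specify the hybrid optimizer used in the paper, and this sketch does not attempt to reproduce it.

```python
from itertools import product
from math import prod


def series_parallel_reliability(reliabilities, copies):
    """Reliability of subsystems in series, each built from parallel replicas.

    reliabilities[i] is the reliability of one replica of subsystem i;
    copies[i] is how many identical replicas run in parallel.
    """
    if len(reliabilities) != len(copies):
        raise ValueError("one replica count is needed per subsystem")
    # A subsystem fails only if all of its replicas fail.
    return prod(1.0 - (1.0 - r) ** n for r, n in zip(reliabilities, copies))


def best_allocation(reliabilities, budget, max_copies=4):
    """Exhaustive search (a stand-in for a real optimizer) over replica counts.

    Returns (allocation, reliability) for the most reliable allocation whose
    total replica count does not exceed `budget`, or None if none fits.
    """
    best = None
    for alloc in product(range(1, max_copies + 1), repeat=len(reliabilities)):
        if sum(alloc) > budget:
            continue
        score = series_parallel_reliability(reliabilities, alloc)
        if best is None or score > best[1]:
            best = (alloc, score)
    return best
```

With two subsystems of reliability 0.9 and 0.8 and a budget of three replicas, the search places the spare replica on the weaker subsystem. Exhaustive search only scales to toy instances, which is why heuristic or hybrid optimizers are typically used for this problem.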

arXiv:2404.03587 [pdf, other]

Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration

Authors: Shivam Singh, Karthik Swaminathan, Raghav Arora, Ramandeep Singh, Ahana Datta, Dipanjan Das, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna

Abstract: An agent assisting humans in daily living activities can collaborate more effectively by anticipating upcoming tasks. Data-driven methods represent the state of the art in task anticipation, planning, and related problems, but these methods are resource-hungry and opaque. Our prior work introduced a proof-of-concept framework that used an LLM to anticipate 3 high-level tasks that served as goals for a classical planning system that computed a sequence of low-level actions for the agent to achieve these goals. This paper describes DaTAPlan, our framework that significantly extends our prior work toward human-robot collaboration. Specifically, the DaTAPlan planner computes actions for an agent and a human to collaboratively and jointly achieve the tasks anticipated by the LLM, and the agent automatically adapts to unexpected changes in human action outcomes and preferences. We evaluate DaTAPlan's capabilities in a realistic simulation environment, demonstrating accurate task anticipation, effective human-robot collaboration, and the ability to adapt to unexpected changes. Project website: https://dataplan-hrc.github.io

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.03307 [pdf, other]

Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model

Authors: Amith Manoharan, Aditya Sharma, Himani Belsare, Kaustab Pal, K. Madhava Krishna, Arun Kumar Singh

Abstract: Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that just requires the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6dof pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state-of-the-art in the following respects. First, we show that our NLS-based pose prediction closely matches the output from a high-fidelity physics engine. This result, coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments and a comparison with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories.

Submitted 11 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: 8 pages, 7 figures, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2404.03216 [pdf, other]

Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption

Authors: Jianming Tong, Jingtian Dang, Anupam Golder, Callie Hao, Arijit Raychowdhury, Tushar Krishna

Abstract: As machine learning (ML) permeates fields like healthcare, facial recognition, and blockchain, the need to protect sensitive data intensifies. Fully Homomorphic Encryption (FHE) allows inference on encrypted data, preserving the privacy of both data and the ML model. However, it slows down non-secure inference by up to five orders of magnitude, with a root cause of replacing non-polynomial operators (ReLU and MaxPooling) with a high-degree Polynomial Approximated Function (PAF). We propose SmartPAF, a framework to replace non-polynomial operators with low-degree PAFs and then recover the accuracy of the PAF-approximated model through four techniques: (1) Coefficient Tuning (CT) -- adjust PAF coefficients based on the input distributions before training, (2) Progressive Approximation (PA) -- progressively replace one non-polynomial operator at a time followed by fine-tuning, (3) Alternate Training (AT) -- alternate the training between PAFs and other linear operators in a decoupled manner, and (4) Dynamic Scale (DS) / Static Scale (SS) -- dynamically scale the PAF input value within (-1, 1) in training, and fix the scale as the running max value in FHE deployment. The synergistic effect of CT, PA, AT, and DS/SS enables SmartPAF to enhance the accuracy of the various models approximated by PAFs with various low degrees under multiple datasets. For ResNet-18 under ImageNet-1k, the Pareto frontier found by SmartPAF in the latency-accuracy tradeoff space achieves 1.42x ~ 13.64x accuracy improvement and 6.79x ~ 14.9x speedup over prior works. Further, SmartPAF enables a 14-degree PAF (f_1^2 g_1^2) to achieve 7.81x speedup compared to the 27-degree PAF obtained by minimax approximation with the same 69.4% post-replacement accuracy. Our code is available at https://github.com/EfficientFHE/SmartPAF.

Submitted 7 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.
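
To illustrate what replacing ReLU with a polynomial means in the entry above, here is a minimal sketch based on the identity ReLU(x) = x (1 + sign(x)) / 2, with sign(x) approximated on [-1, 1] by repeatedly applying a cubic. The cubic f1(x) = (3x - x^3) / 2 and the iteration count are assumptions chosen for illustration; they are not taken from the abstract and are not necessarily the PAFs used by SmartPAF. The function names are likewise hypothetical.

```python
def f1(x):
    """Degree-3 polynomial that pushes values in [-1, 1] toward -1 or +1.

    Assumed for illustration; not necessarily the polynomial used in the paper.
    """
    return 1.5 * x - 0.5 * x ** 3


def approx_sign(x, iterations=6):
    """Approximate sign(x) for x in [-1, 1] using only additions and multiplications."""
    for _ in range(iterations):
        x = f1(x)
    return x


def approx_relu(x, iterations=6):
    """Approximate ReLU(x) via ReLU(x) = x * (1 + sign(x)) / 2."""
    return 0.5 * x * (1.0 + approx_sign(x, iterations))
```

Inputs must be scaled into [-1, 1] first, which is consistent with the input scaling mentioned in the abstract. The sign approximation converges slowly for inputs near zero, but the ReLU error stays small there because the output is multiplied by x.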

arXiv:2404.02911 [pdf, ps, other]

Machine Learning Driven Global Optimisation Framework for Analog Circuit Design

Authors: Ria Rashid, Komala Krishna, Clint Pazhayidam George, Nandakumar Nambath

Abstract: We propose a machine learning-driven optimisation framework for analog circuit design in this paper. The primary objective is to determine the device sizes for the optimal performance of analog circuits for a given set of specifications. Our methodology entails employing machine learning models and SPICE simulations to direct the optimisation algorithm towards achieving the optimal design for analog circuits. Machine learning-based global offline surrogate models, with the circuit design parameters as the input, are built in the design space for the analog circuits under study and are used to guide the optimisation algorithm, resulting in faster convergence and a reduced number of SPICE simulations. Multi-layer perceptron and random forest regressors are employed to predict the required design specifications of the analog circuit. Since the saturation condition of transistors is vital to the proper working of analog circuits, multi-layer perceptron classifiers are used to predict the saturation condition of each transistor in the circuit. The feasibility of the candidate solutions is verified using machine learning models before invoking SPICE simulations. We validate the proposed framework using three circuit topologies: a bandgap reference, a folded cascode operational amplifier, and a two-stage operational amplifier. The simulation results show better optimum values and lower standard deviations for fitness functions after convergence. Incorporating the proposed machine learning-based predictions in the optimisation method has resulted in the reduction of SPICE calls by 56%, 59%, and 83% when compared with standard approaches in the three test cases considered in the study.

Submitted 26 February, 2024; originally announced April 2024.

arXiv:2404.02872 [pdf, other]

Integrating Explanations in Learning LTL Specifications from Demonstrations

Authors: Ashutosh Gupta, John Komp, Abhay Singh Rajput, Krishna Shankaranarayanan, Ashutosh Trivedi, Namrita Varshney

Abstract: This paper investigates whether recent advances in Large Language Models (LLMs) can assist in translating human explanations into a format that can robustly support learning Linear Temporal Logic (LTL) from demonstrations. Both LLMs and optimization-based methods can extract LTL specifications from demonstrations; however, they have distinct limitations. LLMs can quickly generate solutions and incorporate human explanations, but their lack of consistency and reliability hampers their applicability in safety-critical domains. On the other hand, optimization-based methods do provide formal guarantees but cannot process natural language explanations and face scalability challenges. We present a principled approach to combining LLMs and optimization-based methods to faithfully translate human explanations and demonstrations into LTL specifications. We have implemented a tool called Janaka based on our approach. Our experiments demonstrate the effectiveness of combining explanations with demonstrations in learning LTL specifications through several case studies.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 21 Pages, 13 Page Appendix

ACM Class: I.2.8

arXiv:2404.02603 [pdf]

A rare simultaneous detection of a mid-latitude plasma depleted structure in O($^1$D) 630.0 nm and O($^1$S) 557.7 nm all-sky airglow images on a geomagnetically quiet night

Authors: D. Patgiri, R. Rathi, V. Yadav, D. Chakrabarty, M. V. Sunil Krishna, S. Kannaujiya, P. Pavan Chaitanya, A. K. Patra, Jann-Yenq Liu, S. Sarkhel

Abstract: In general, nighttime thermospheric 557.7 nm emission over mid-latitudes is predominantly masked by a significantly larger mesospheric component, and hence, F-region plasma structures are rarely observed in this emission. This paper reports the first rare simultaneous detection of an F-region plasma depleted structure in O($^1$D) 630.0 nm and O($^1$S) 557.7 nm airglow images from Hanle, India, a mid-latitude station (32.7°N, 78.9°E; Mlat. ~24.1°N) on a geomagnetically quiet night (Ap=3) of 26 June 2021. This indicates significant enhancement of thermospheric 557.7 nm emission. Interestingly, thermospheric 557.7 nm emission was not significant on the following geomagnetically quiet night, as MSTID bands were only observed in 630.0 nm images. We show that enhanced dissociative recombination caused by descent of the F-layer peak over the observation region, coupled with the significant increase of the electron density at the thermospheric 557.7 nm emission altitude, enabled the detection of the plasma depleted structure on 26 June 2021.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.02145 [pdf, other]

Iterated Learning Improves Compositionality in Large Vision-Language Models

Authors: Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna

Abstract: A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, recent investigations find that most, if not all, of our state-of-the-art vision-language models struggle at compositionality. They are unable to distinguish between images of "a girl in white facing a man in black" and "a girl in black facing a man in white". Moreover, prior work suggests that compositionality doesn't arise with scale: larger model sizes or training data don't help. This paper develops a new iterated training algorithm that incentivizes compositionality. We draw on decades of cognitive science research that identifies cultural transmission (the need to teach a new generation) as a necessary inductive prior that incentivizes humans to develop compositional languages. Specifically, we reframe vision-language contrastive learning as the Lewis Signaling Game between a vision agent and a language agent, and operationalize cultural transmission by iteratively resetting one of the agents' weights during training. After every iteration, this training paradigm induces representations that become "easier to learn", a property of compositional languages: e.g. our model trained on CC3M and CC12M improves standard CLIP by 4.7% and 4.0%, respectively, in the SugarCrepe benchmark.

Submitted 16 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.00910 [pdf, ps, other]

Unexpected Uncertainty Principle for Disc Banach Spaces

Authors: K. Mahesh Krishna

Abstract: Let $(\{f_n\}_{n=1}^\infty, \{τ_n\}_{n=1}^\infty)$ and $(\{g_n\}_{n=1}^\infty, \{ω_n\}_{n=1}^\infty)$ be unbounded continuous p-Schauder frames ($0<p<1$) for a disc Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align} \|θ_f x\|_0\|θ_g x\|_0 \geq \frac{1}{\left(\displaystyle\sup_{n,m \in \mathbb{N} }|f_n(ω_m)|\right)^p\left(\displaystyle\sup_{n, m \in \mathbb{N}}|g_m(τ_n)|\right)^p}, \tag{1} \end{align} where \begin{align*} & θ_f: \mathcal{D}(θ_f) \ni x \mapsto θ_fx := \{f_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}), \quad θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx := \{g_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}). \end{align*} Inequality (1) is unexpectedly different from both bounded uncertainty principle arXiv:2308.00312v1 and unbounded uncertainty principle arXiv:2312.00366v1 for Banach spaces.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 6 Pages, 0 Figures

MSC Class: 42C15

arXiv:2404.00773 [pdf, other]

Half-sided Translations and the Information Recovery from Radiation

Authors: Krishna Jalan, Roji Pius, Manish Ramchander

Abstract: The island paradigm asserts that after the Page time the operators in the interior of an AdS2 eternal black hole in equilibrium with a finite temperature non-gravitating bath cannot be reconstructed using the operators in the black hole region outside the horizon. In a recent paper, we demonstrated this using the black hole interior reconstruction proposal due to Leutheusser and Liu, based on the half-sided translations. This was done by introducing a notion of the reduced half-sided translations associated with the algebra of operators restricted to the black hole region outside the horizon, and by showing that although the reduced half-sided translations translate operators in the black hole region outside the horizon to the black hole interior before the Page time, they fail to do so after the Page time. In this paper, we demonstrate the second assertion of the island paradigm, which states that after the Page time the operators in the black hole interior can be reconstructed using the operators in the bath. We show that even though before the Page time the reduced half-sided translations associated with the algebra of operators restricted to the bath do not translate operators in the bath to the black hole interior, after the Page time they take them to the black hole interior.

Submitted 31 March, 2024; originally announced April 2024.

Comments: 24 pages, 9 figures. arXiv admin note: text overlap with arXiv:2312.11085

arXiv:2403.20116 [pdf, other]

LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving

Authors: Pranjal Paul, Anant Garg, Tushar Choudhary, Arun Kumar Singh, K. Madhava Krishna

Abstract: Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subject to their "world understanding", which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, which aims to address this issue by estimating a goal location based on the given language command as an intermediate representation in an end-to-end setting. The estimated goal might fall in a non-desirable region, like on top of a car for a parking-like command, leading to inadequate planning. Hence, we propose to train the architecture in an end-to-end manner, resulting in iterative refinement of both the goal and the trajectory collectively. We validate the effectiveness of our method through comprehensive experiments conducted in diverse simulated environments. We report significant improvements in standard autonomous driving metrics, with a goal-reaching Success Rate of 81%. We further showcase the versatility of LeGo-Drive across different driving scenarios and linguistic inputs, underscoring its potential for practical deployment in autonomous vehicles and intelligent transportation systems.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.19797 [pdf, other]

Efficient 3D Instance Mapping and Localization with Neural Fields

Authors: George Tang, Krishna Murthy Jatavallabhula, Antonio Torralba

Abstract: We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images. Towards this, we introduce 3DIML, a novel framework that efficiently learns a label field that may be rendered from novel viewpoints to produce view-consistent instance segmentation masks. 3DIML significantly improves upon training and inference runtimes of existing implicit scene representation based methods. As opposed to prior art that optimizes a neural field in a self-supervised manner, requiring complicated training procedures and loss function design, 3DIML leverages a two-phase process. The first phase, InstanceMap, takes as input 2D segmentation masks of the image sequence generated by a frontend instance segmentation model, and associates corresponding masks across images to 3D labels. These almost view-consistent pseudolabel masks are then used in the second phase, InstanceLift, to supervise the training of a neural label field, which interpolates regions missed by InstanceMap and resolves ambiguities. Additionally, we introduce InstanceLoc, which enables near realtime localization of instance masks given a trained label field and an off-the-shelf image segmentation model by fusing outputs from both. We evaluate 3DIML on sequences from the Replica and ScanNet datasets and demonstrate 3DIML's effectiveness under mild assumptions for the image sequences. We achieve a large practical speedup over existing implicit scene representation methods with comparable quality, showcasing its potential to facilitate faster and more effective 3D scene understanding.

Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19477 [pdf, other]

Real-time Geoinformation Systems to Improve the Quality, Scalability, and Cost of Internet of Things for Agri-environment Research

Authors: Bryan C. Runck, Bobby Schulz, Jeff Bishop, Nathan Carlson, Bryan Chantigian, Gary Deters, Jesse Erdmann, Patrick M. Ewing, Michael Felzan, Xiao Fu, Jan Greyling, Christopher J. Hogan, Andrew Hollman, Ali Joglekar, Kris Junker, Michael Kantar, Lumbani Kaunda, Mohana Krishna, Benjamin Lynch, Peter Marchetto, Megan Marsolek, Troy McKay, Brad Morris, Ali Rashid Niaghi, Keerthi Pamulaparthy, et al. (19 additional authors not shown)

Abstract: With the increasing emphasis on machine learning and artificial intelligence to drive knowledge discovery in the agricultural sciences, spatial internet of things (IoT) technologies have become increasingly important for collecting real-time, high resolution data for these models. However, managing large fleets of devices while maintaining high data quality remains an ongoing challenge as scientists iterate from prototype to mature end-to-end applications. Here, we provide a set of case studies using the framework of technology readiness levels for an open source spatial IoT system. The spatial IoT systems underwent 3 major and 14 minor system versions, had over 2,727 devices manufactured both in academic and commercial contexts, and are either in active or planned deployment across four continents. Our results show the evolution of a generalizable, open source spatial IoT system designed for agricultural scientists, and provide a model for academic researchers to overcome the challenges that exist in going from one-off prototypes to thousands of internet-connected devices.

Submitted 2 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures, 1 table

arXiv:2403.19377 [pdf, ps, other]

de Branges-Rovnyak spaces which are complete Nevanlinna-Pick spaces

Authors: Hamidul Ahmed, B. Krishna Das, Samir Panja

Abstract: We consider de Branges-Rovnyak spaces of a considerably large class of reproducing kernel Hilbert spaces and find a characterization for them to be complete Nevanlinna-Pick spaces. This extends as well as recovers earlier characterizations obtained for the Hardy space over the unit disc (\cite{Chu}) as well as for the Drury-Arveson space over the unit ball (\cite{Jesse}). Our characterization takes a complete form for the particular cases of the Hardy space over the polydisc and the Bergman space over the disc. We show that a non-trivial de Branges-Rovnyak space, associated to a contractive multiplier, of the Hardy space over the bidisc or the Bergman space over the unit disc is a complete Nevanlinna-Pick space if and only if it is isometrically isomorphic to the Hardy space over the unit disc. On the contrary, it is shown that non-trivial de Branges-Rovnyak spaces of the Hardy space over the $n$-disc with $n\ge 3$ are never complete Nevanlinna-Pick spaces.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 21 pages, Comments are Welcome

arXiv:2403.18623 [pdf, other]

Antitrust, Amazon, and Algorithmic Auditing

Authors: Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Jens Frankenreiter, Stefan Bechtold, Krishna P. Gummadi

Abstract: In digital markets, antitrust law and special regulations aim to ensure that markets remain competitive despite the dominating role that digital platforms play today in everyone's life. Unlike traditional markets, market participant behavior is easily observable in these markets. We present a series of empirical investigations into the extent to which Amazon engages in practices that are typically described as self-preferencing. We discuss how the computer science tools used in this paper can be used in a regulatory environment that is based on algorithmic auditing and requires regulating digital markets at scale.

Submitted 25 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: The paper has been accepted to appear at Journal of Institutional and Theoretical Economics (JITE) 2024

arXiv:2403.18475 [pdf]

Record cryogenic cooling in ferroelectric hafnia proximity induced via Mott transition

Authors: Jalaja M A, Shubham Kumar Parate, Binoy Krishna De, Sai Dutt K, Pavan Nukala

Abstract: On-chip refrigeration at cryogenic temperatures is becoming an important requirement in the context of quantum technologies and nanoelectronics. Ferroic materials with enhanced electrocaloric effects at phase transitions are good material candidates for the same. By exploiting the Mott metal-insulator transition (MIT) of TiOx(Ny), the bottom electrode, we engineer a depolarization field controlled reversible polar to non-polar phase transition in thick La-doped hafnia (40 nm). This transition occurs between ~125 and 140 K and produces giant negative pyroelectric and electrocaloric effects. Refrigeration metrics were estimated between 120 to 200 K, with a peak refrigerant capacity of 25 kJ Kg-1 (2 kJ Kg-1), peak isothermal entropy ΔS ~ 8 kJ Kg-1 K-1 (0.5 kJ Kg-1 K-1) and adiabatic ΔTcooling ~ 106 K (11 K) at ~140 K and 5 MV cm-1 (0.5 MV cm-1), and these are the largest reported in any electrocaloric system. Our work fundamentally proposes design guidelines to induce significant solid-state refrigeration through proximity effects, even at cryogenic temperatures relevant to quantum technologies.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18441 [pdf, other]

Physics and data driven model for prediction of residual stresses in machining

Authors: Rachit Dhar, Ankur Krishna, Bilal Muhammed

Abstract: Predicting residual stresses has always been a topic of significance due to its implications in the development of enhanced materials and better processing conditions. In this work, an analytical model for prediction of residual stresses is developed for orthogonal machining. It consists of three component models for force, temperature and stress computation. The Oxley force model and Waldorf's slip-line model are employed for obtaining cutting force, thrust force, and temperatures at the shear zone and tool-chip interface for the given parameters. The Komanduri-Hou two heat source model is used for obtaining the temperature distribution in the workpiece. The effect of coolant with differing mass flow rates has also been incorporated. The residual stresses are obtained by combining the mechanical and thermal components, followed by the loading and relaxation of the stresses. Optimal values for unknown parameters are predicted by leveraging a cost function. The residual stress distributions obtained give a tensile region near the surface for Inconel 718, and a compressive region for Ti6Al4V, which are in line with experimental results found in literature.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17946 [pdf, ps, other]

Nonlinear Heisenberg-Robertson-Schrodinger Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: We derive an uncertainty principle for Lipschitz maps acting on subsets of Banach spaces. We show that this nonlinear uncertainty principle reduces to the Heisenberg-Robertson-Schrodinger uncertainty principle for linear operators acting on Hilbert spaces.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 4 Pages, 0 Figures

MSC Class: 26A16; 46B99

arXiv:2403.16016 [pdf, other]

Fill in the ____ (a Diffusion-based Image Inpainting Pipeline)

Authors: Eyoel Gebre, Krishna Saxena, Timothy Tran

Abstract: Image inpainting is the process of taking an image and generating lost or intentionally occluded portions. Inpainting has countless applications including restoring previously damaged pictures, restoring the quality of images that have been degraded due to compression, and removing unwanted objects/text. Modern inpainting techniques have shown remarkable ability in generating sensible completions for images with mask occlusions. In our paper, an overview of the progress of inpainting techniques will be provided, along with identifying current leading approaches, focusing on their strengths and weaknesses. A critical gap in these existing models will be addressed, focusing on the ability to prompt and control what exactly is generated. We will additionally justify why we think this is the natural next progressive step that inpainting models must take, and provide multiple approaches to implementing this functionality. Finally, we will evaluate the results of our approaches by qualitatively checking whether they generate high-quality images that correctly inpaint regions with the objects that they are instructed to produce.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.14820 [pdf, other]

Competition for binding targets results in paradoxical effects for simultaneous activator and repressor action -- Extended Version

Authors: M. Ali Al-Radhawi, Krishna Manoj, Dhruv D. Jatkar, Alon Duvall, Domitilla Del Vecchio, Eduardo D. Sontag

Abstract: In the context of epigenetic transformations in cancer metastasis, a puzzling effect was recently discovered, in which the elimination (knock-out) of an activating regulatory element leads to increased (rather than decreased) activity of the element being regulated. It has been postulated that this paradoxical behavior can be explained by activating and repressing transcription factors competing for binding to other possible targets. It is very difficult to prove this hypothesis in mammalian cells, due to the large number of potential players and the complexity of endogenous intracellular regulatory networks. Instead, this paper analyzes this issue through an analogous synthetic biology construct which aims to reproduce the paradoxical behavior using standard bacterial gene expression networks. The paper first reviews the motivating cancer biology work, and then describes a proposed synthetic construct. A mathematical model is formulated, and basic properties of uniqueness of steady states and convergence to equilibria are established, as well as an identification of parameter regimes which should lead to observing such paradoxical phenomena (more activator leads to less activity at steady state). A proof is also given to show that this is a steady-state property, and for initial transients the phenomenon will not be observed. This work adds to the general line of work of resource competition in synthetic circuits.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures

arXiv:2403.14669 [pdf]

Large-Scale Evaluation of Mobility, Technology and Demand Scenarios in the Chicago Region Using POLARIS

Authors: Joshua Auld, Jamie Cook, Krishna Murthy Gurumurthy, Nazmul Khan, Charbel Mansour, Aymeric Rousseau, Olcay Sahin, Felipe de Souza, Omer Verbas, Natalia Zuniga-Garcia

Abstract: Rapid technological progress and innovation in the areas of vehicle connectivity, automation and electrification, new modes of shared and alternative mobility, and advanced transportation system demand and supply management strategies, have motivated numerous questions and studies regarding the potential impact on key performance and equity metrics. Several of these areas of development may or may not have a synergistic outcome on the overall benefits such as reduction in congestion and travel times. In this study, the use of an end-to-end modeling workflow centered around an activity-based agent-based travel demand forecasting tool called POLARIS is explored to provide insights on the effects of several different technology deployments and operational policies in combination for the Chicago region. The objective of the research was to explore the direct impacts and observe any interactions between the various policy and technology scenarios to help better characterize and evaluate their potential future benefits. We analyze system outcome metrics on mobility, energy and emissions, equity and environmental justice and overall efficiency for a scenario design of experiments that looks at combinations of supply interventions (congestion pricing, transit expansion, tnc policy, off-hours freight policy, connected signal optimization) for different potential demand scenarios defined by e-commerce and on-demand delivery engagement, and market penetration of electric vehicles. We found different combinations of strategies that can reduce overall travel times up to 7% and increase system efficiency up to 53% depending on how various metrics are prioritized. The results demonstrate the importance of considering various interventions jointly.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.14617 [pdf, other]

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Authors: Xiang Fan, Anand Bhattad, Ranjay Krishna

Abstract: We introduce Videoshop, a training-free video editing algorithm for localized semantic edits. Videoshop allows users to use any editing software, including Photoshop and generative inpainting, to modify the first frame; it automatically propagates those changes, with semantic, spatial, and temporally consistent motion, to the remaining frames. Unlike existing methods that enable edits only through imprecise textual instructions, Videoshop allows users to add or remove objects, semantically change objects, insert stock photos into videos, etc. with fine-grained control over locations and appearance. We achieve this through image-based video editing by inverting latents with noise extrapolation, from which we generate videos conditioned on the edited image. Videoshop produces higher quality edits against 6 baselines on 2 editing benchmarks using 10 evaluation metrics.

Submitted 22 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: Project page at https://videoshop-editing.github.io/

arXiv:2403.14038 [pdf, other]

PureConnect: A Localized Social Media System to Increase Awareness and Connectedness in Environmental Justice Communities

Authors: Omar Hammad, Md Rezwanur Rahman, Gopala Krishna Vasanth Kanugo, Nicholas Clements, Shelly Miller, Shivakant Mishra, Esther Sullivan

Abstract: Frequent disruptions like highway constructions are common now-a-days, often impacting environmental justice communities (communities with low socio-economic status with disproportionately high and adverse human health and environmental effects) that live nearby. Based on our interactions via focus groups with the members of four environmental justice communities impacted by a major highway construction, a common concern is a sense of uncertainty about project activities and loss of social connectedness, leading to increased stress, depression, anxiety and diminished well-being. This paper addresses this concern by developing a localized social media system called PureConnect with a goal to raise the level of awareness about the project and increase social connectedness among the community members. PureConnect has been designed using active engagement with four environmental justice communities affected by a major highway construction. It has been deployed in the real world among the members of the four environmental justice communities, and a detailed analysis of the data collected from this deployment as well as surveys show that PureConnect is potentially useful in improving community members' well-being and the members appreciate the functionalities it provides.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Submitted in COMPSAC 2024

arXiv:2403.14026 [pdf, other]

Modal reduction principles: a parametric shift to graphs

Authors: Willem Conradie, Krishna Manoorkar, Alessandra Palmigiano, Mattia Panettiere

Abstract: Graph-based frames have been introduced as a logical framework which internalizes an inherent boundary to knowability. They also support the interpretation of lattice-based (modal) logics as hyper-constructive logics of evidential reasoning. Conceptually, the present paper proposes graph-based frames as a formal framework suitable for generalizing Pawlak's rough set theory to a setting in which inherent limits to knowability need to be considered. Technically, the present paper establishes systematic connections between the first-order correspondents of Sahlqvist modal reduction principles on Kripke frames, and on the more general relational environments of graph-based and polarity-based frames. This work is part of a research line aiming at: (a) comparing and inter-relating the various (first-order) conditions corresponding to a given (modal) axiom in different relational semantics (b) recognizing when first-order sentences in the frame-correspondence languages of different relational structures encode the same modal content (c) meaningfully transferring relational properties across different semantic contexts. The present paper develops these results for the graph-based semantics, polarity-based semantics, and all Sahlqvist modal reduction principles. As an application, we study well known modal axioms in rough set theory on graph-based frames and show that, although these axioms correspond to different first-order conditions on graph-based frames, their intuitive meaning is retained. This allows us to introduce the notion of hyperconstructivist approximation spaces as the subclass of graph-based frames defined by the first-order conditions corresponding to the same modal axioms defining classical generalized approximation spaces, and to transfer the properties and the intuitive understanding of different approximation spaces to graph-based frames.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13695 [pdf, other]

doi 10.1109/ICEFEET59656.2023.10452217

Loss Regularizing Robotic Terrain Classification

Authors: Shakti Deo Kumar, Sudhanshu Tripathi, Krishna Ujjwal, Sarvada Sakshi Jha, Suddhasil De

Abstract: Locomotion mechanics of legged robots are suitable when pacing through difficult terrains. Recognising terrains for such robots are important to fully yoke the versatility of their movements. Consequently, robotic terrain classification becomes significant to classify terrains in real time with high accuracy. The conventional classifiers suffer from overfitting problem, low accuracy problem, high variance problem, and not suitable for live dataset. On the other hand, classifying a growing dataset is difficult for convolution based terrain classification. Supervised recurrent models are also not practical for this classification. Further, the existing recurrent architectures are still evolving to improve accuracy of terrain classification based on live variable-length sensory data collected from legged robots. This paper proposes a new semi-supervised method for terrain classification of legged robots, avoiding preprocessing of long variable-length dataset. The proposed method has a stacked Long Short-Term Memory architecture, including a new loss regularization. The proposed method solves the existing problems and improves accuracy. Comparison with the existing architectures show the improvements.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Preliminary draft of the work published in IEEE conference 2023

arXiv:2403.12410 [pdf]

TikTok and the Art of Personalization: Investigating Exploration and Exploitation on Social Media Feeds

Authors: Karan Vombatkere, Sepehr Mousavi, Savvas Zannettou, Franziska Roesner, Krishna P. Gummadi

Abstract: Recommendation algorithms for social media feeds often function as black boxes from the perspective of users. We aim to detect whether social media feed recommendations are personalized to users, and to characterize the factors contributing to personalization in these feeds. We introduce a general framework to examine a set of social media feed recommendations for a user as a timeline. We label items in the timeline as the result of exploration vs. exploitation of the user's interests on the part of the recommendation algorithm and introduce a set of metrics to capture the extent of personalization across user timelines. We apply our framework to a real TikTok dataset and validate our results using a baseline generated from automated TikTok bots, as well as a randomized baseline. We also investigate the extent to which factors such as video viewing duration, liking, and following drive the personalization of content on TikTok. Our results demonstrate that our framework produces intuitive and explainable results, and can be used to audit and understand personalization in social media feeds.

Submitted 18 March, 2024; originally announced March 2024.

Comments: ACM Web Conference 2024

arXiv:2403.12331 [pdf, other]

Deep Few-view High-resolution Photon-counting Extremity CT at Halved Dose for a Clinical Trial

Authors: Mengzhou Li, Chuang Niu, Ge Wang, Maya R Amma, Krishna M Chapagain, Stefan Gabrielson, Andrew Li, Kevin Jonker, Niels de Ruiter, Jennifer A Clark, Phil Butler, Anthony Butler, Hengyong Yu

Abstract: The latest X-ray photon-counting computed tomography (PCCT) for extremity allows multi-energy high-resolution (HR) imaging for tissue characterization and material decomposition. However, both radiation dose and imaging speed need improvement for contrast-enhanced and other studies. Despite the success of deep learning methods for 2D few-view reconstruction, applying them to HR volumetric reconstruction of extremity scans for clinical diagnosis has been limited due to GPU memory constraints, training data scarcity, and domain gap issues. In this paper, we propose a deep learning-based approach for PCCT image reconstruction at halved dose and doubled speed in a New Zealand clinical trial. Particularly, we present a patch-based volumetric refinement network to alleviate the GPU memory limitation, train network with synthetic data, and use model-based iterative refinement to bridge the gap between synthetic and real-world data. The simulation and phantom experiments demonstrate consistently improved results under different acquisition conditions on both in- and off-domain structures using a fixed network. The image quality of 8 patients from the clinical trial are evaluated by three radiologists in comparison with the standard image reconstruction with a full-view dataset. It is shown that our proposed approach is essentially identical to or better than the clinical benchmark in terms of diagnostic image quality scores. Our approach has a great potential to improve the safety and efficiency of PCCT without compromising image quality.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 9 figures, 5 tables

arXiv:2403.11778 [pdf, other]

Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms

Authors: Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Abstract: Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. An executable software is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on Resnet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performances compared to ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11580 [pdf, other]

Investigation of magnetic order influenced phonon and electron dynamics in MnBi$_{2}$Te$_{4}$ and Sb doped MnBi$_{2}$Te$_{4}$ through terahertz time-domain spectroscopy

Authors: Soumya Mukherjee, Anjan Kumar NM, Subhadip Manna, Sambhu G Nath, Radha Krishna Gopal, Chiranjib Mitra, N. Kamaraju

Abstract: MnBi$_{2}$Te$_{4}$, the first topological insulator with inherent magnetic ordering, has attracted significant attention recently for providing a platform to realize several exotic quantum phenomena at relatively higher temperatures. In this work, we have carried out an exhaustive investigation of MnBi$_{2}$Te$_{4}$ and Sb doped MnBi$_{2}$Te$_{4}$ thin films using THz time-domain spectroscopy. The extracted real THz conductivity displays a strong IR active E$_u$ phonon absorption peak (at $\sim$1.5 THz) merged on top of the Drude-like contributions from bulk and surface electrons. The extracted parameters from the THz conductivity data fitted to the Drude-Fano-Lorentz model, show significant changes in their temperature dependence around the magnetic ordering Néel temperature of $\sim$ 25K, which is suggestive of the coupling between magnetic ordering and electronic band structure. The frequency of the E$_u$ phonon displays an anomalous blue-shift with increasing temperatures by $\sim$ 0.1 THz ($\sim$7 %) for MnBi$_{2}$Te$_{4}$ and $\sim$0.2 THz ($\sim$13 %) for Sb doped MnBi$_{2}$Te$_{4}$ between 7K and 250K. The line-shape of the E$_u$ phonon mode in Sb doped MnBi$_{2}$Te$_{4}$ shows significant Fano asymmetry compared to that of MnBi$_{2}$Te$_{4}$, indicating that Sb doping plays an important role in the Fano interference between the phonons and the electrons, in this system. These results indicate that the anomalous phonon behaviour seen in MBT arise mainly from positive cubic anharmonicity induced self energy parameter, whereas both anharmonicity and the electron phonon coupling are at play in making the relatively higher anomalous blue shift of phonons in MBST. Our studies provide the first comprehensive understanding of the phonon and electron dynamics of MnBi$_{2}$Te$_{4}$ and Sb doped MnBi$_{2}$Te$_{4}$ in the THz range using time-domain THz spectroscopy.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11085 [pdf, other]

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

Authors: Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

Abstract: Real-world multi-modal problems are rarely solved by a single machine learning model, and often require multi-step computational plans that involve stitching several models. Tool-augmented LLMs hold tremendous promise for automating the generation of such computational plans. However, the lack of standardized benchmarks for evaluating LLMs as planners for multi-step multi-modal tasks has prevented a systematic study of planner design decisions. Should LLMs generate a full plan in a single shot or step-by-step? Should they invoke tools directly with Python code or through structured data formats like JSON? Does feedback improve planning? To answer these questions and more, we introduce m&m's: a benchmark containing 4K+ multi-step multi-modal tasks involving 33 tools that include multi-modal models, (free) public APIs, and image processing modules. For each of these task queries, we provide automatically generated plans using this realistic toolset. We further provide a high-quality subset of 1,565 task plans that are human-verified and correctly executable. With m&m's, we evaluate 6 popular LLMs with 2 planning strategies (multi-step vs. step-by-step planning), 2 plan formats (JSON vs. code), and 3 types of feedback (parsing/verification/execution). Finally, we summarize takeaways from our extensive experiments. Our dataset and code are available on HuggingFace (https://huggingface.co/datasets/zixianma/mnms) and Github (https://github.com/RAIVNLab/mnms).

Submitted 21 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11041 [pdf, other]

FAGH: Accelerating Federated Learning with Approximated Global Hessian

Authors: Mrinmay Sen, A. K. Qin, Krishna Mohan C

Abstract: In federated learning (FL), the significant communication overhead due to the slow convergence speed of training the global model poses a great challenge. Specifically, a large number of communication rounds are required to achieve the convergence in FL. One potential solution is to employ the Newton-based optimization method for training, known for its quadratic convergence rate. However, the existing Newton-based FL training methods suffer from either memory inefficiency or high computational costs for local clients or the server. To address this issue, we propose an FL with approximated global Hessian (FAGH) method to accelerate FL training. FAGH leverages the first moment of the approximated global Hessian and the first moment of the global gradient to train the global model. By harnessing the approximated global Hessian curvature, FAGH accelerates the convergence of global model training, leading to the reduced number of communication rounds and thus the shortened training time. Experimental results verify FAGH's effectiveness in decreasing the number of communication rounds and the time required to achieve the pre-specified objectives of the global model performance in terms of training and test losses as well as test accuracy. Notably, FAGH outperforms several state-of-the-art FL training methods.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10751 [pdf, other]

LIGHTCODE: Light Analytical and Neural Codes for Channels with Feedback

Authors: Sravan Kumar Ankireddy, Krishna Narayanan, Hyeji Kim

Abstract: The design of reliable and efficient codes for channels with feedback remains a longstanding challenge in communication theory. While significant improvements have been achieved by leveraging deep learning techniques, neural codes often suffer from high computational costs, a lack of interpretability, and limited practicality in resource-constrained settings. We focus on designing low-complexity coding schemes that are interpretable and more suitable for communication systems. We advance both analytical and neural codes. First, we demonstrate that POWERBLAST, an analytical coding scheme inspired by Schalkwijk-Kailath (SK) and Gallager-Nakiboglu (GN) schemes, achieves notable reliability improvements over both SK and GN schemes, outperforming neural codes in high signal-to-noise ratio (SNR) regions. Next, to enhance reliability in low-SNR regions, we propose LIGHTCODE, a lightweight neural code that achieves state-of-the-art reliability while using a fraction of memory and compute compared to existing deep-learning-based codes. Finally, we systematically analyze the learned codes, establishing connections between LIGHTCODE and POWERBLAST, identifying components crucial for performance, and providing interpretation aided by linear regression analysis.

Submitted 13 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 13 pages, 11 figures

arXiv:2403.10687 [pdf]

Computational Study on the Impact of Gasoline-Ethanol Blending on Autoignition and Soot/NOx Emissions under Gasoline Compression Ignition Conditions

Authors: Krishna C. Kalvakala, Harsimran Singh, Pinaki Pal, Jorge P. Gonzalez, Christopher P. Kolodziej, Suresh K. Aggarwal

Abstract: Computational fluid dynamics (CFD) simulations of a single-cylinder gasoline compression ignition engine are performed to investigate the impact of gasoline-ethanol blending on autoignition, nitrogen oxide (NOx), and soot emissions under low-load conditions. A four-component toluene primary reference fuel (TPRF) + ethanol (ETPRF) surrogate (with 10% ethanol by volume; E10) is employed to represent the test gasoline (RD5-87). A 3D engine CFD model employing finite-rate chemistry with a skeletal kinetic mechanism, adaptive mesh refinement (AMR), and hybrid method of moments (HMOM) is adopted to capture in-cylinder combustion and soot/NOx emissions. The engine CFD model is validated against experimental data for three gasoline-ethanol blends: E10, E30 and E100, with varying ethanol content by volume. Model validation is carried out for multiple start-of-injection (SOI) timings (-21, -27, -36, and -45 crank angle degrees after top-dead-center (aTDC)) with respect to in-cylinder pressure, heat release rate, combustion phasing, NOx and soot emissions. For late injection timings (-21 and -27° aTDC), E30 yields higher soot than E10, while the trend reverses for early injection cases (-36 and -45° aTDC). E100 yields the lowest amount of soot among all fuels irrespective of SOI timing. Further, E10 shows a non-monotonic trend in soot emissions with SOI timing: SOI-36>SOI-45>SOI-21>SOI-27, while soot emissions from E30 exhibit monotonic decrease with advancing SOI timing. NOx emissions from various fuels follow a trend of E10>E30>E100. NOx emissions increase as SOI timing is advanced for all fuels, with an anomaly for E10 and E100 where NOx decreases when SOI is advanced beyond -36° aTDC. Detailed analysis of the numerical results is performed to investigate the emission trends and elucidate the impact of chemical composition and physical properties on autoignition and emissions characteristics.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.10473 [pdf, ps, other]

Exploring Valence Electron Dynamics of Xenon through Laser-Induced Electron Diffraction

Authors: Fang Liu, Slawomir Skruszewicz, Julian Späthe, Yinyu Zhang, Sebastian Hell, Bo Ying, Gerhard G. Paulus, Bálint Kiss, Krishna Murari, Malin Khalil, Eric Cormier, Li Guang Jiao, Stephan Fritzsche, Matthias Kübel

Abstract: Strong-field ionization can induce electron motion in both the continuum and the valence shell of the parent ion. Here, we explore their interplay by studying laser-induced electron diffraction (LIED) patterns arising from interaction with the potentials of two-hole states of the xenon cation. The quantitative rescattering theory is used to calculate the corresponding photoelectron momentum distributions, providing evidence that the spin-orbit dynamics could be detected by LIED. We identify the contribution of these time-evolving hole states to the angular distribution of the rescattered electrons, particularly noting a distinct change along the backward scattering angles. We benchmark numerical results with experiments using ultrabroad and femtosecond laser pulses centered at 3100 nm.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.10059 [pdf, other]

Repoformer: Selective Retrieval for Repository-Level Code Completion

Authors: Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma

Abstract: Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a large proportion of the retrieved contexts proving unhelpful or harmful to code language models (code LMs). In this paper, we propose a selective RAG framework to avoid retrieval when unnecessary. To power this framework, we design a self-supervised learning approach to enable a code LM to accurately self-evaluate whether retrieval can improve its output quality and robustly leverage the potentially noisy retrieved contexts. Using this LM as both the selective RAG policy and the generation model, our framework achieves state-of-the-art repository-level code completion performance on diverse benchmarks including RepoEval, CrossCodeEval, and CrossCodeLongEval, a new long-form code completion benchmark. Meanwhile, our analyses show that selectively retrieving brings as much as 70% inference speedup in the online serving setting without harming the performance. We further demonstrate that our framework is able to accommodate different generation models, retrievers, and programming languages. These advancements position our framework as an important step towards more accurate and efficient repository-level code completion.

Submitted 4 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: ICML 2024

arXiv:2403.08845 [pdf, other]

Bifurcated Attention for Single-Context Large-Batch Sampling

Authors: Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang

Abstract: In our study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling contexts. This approach aims to reduce redundant memory IO costs, a significant factor in latency for high batch sizes and long context lengths. Bifurcated attention achieves this by dividing the attention mechanism during incremental decoding into two distinct GEMM operations, focusing on the KV cache from prefill and the decoding process. This method ensures precise computation and maintains the usual computational load (FLOPs) of standard attention mechanisms, but with reduced memory IO. Bifurcated attention is also compatible with multi-query attention mechanism known for reduced memory IO for KV cache, further enabling higher batch size and context length. The resulting efficiency leads to lower latency, improving suitability for real-time applications, e.g., enabling massively-parallel answer generation without substantially increasing latency, enhancing performance when integrated with postprocessing techniques such as reranking.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08834 [pdf, other]

Predictive Analysis of Tuberculosis Treatment Outcomes Using Machine Learning: A Karnataka TB Data Study at a Scale

Authors: SeshaSai Nath Chinagudaba, Darshan Gera, Krishna Kiran Vamsi Dasu, Uma Shankar S, Kiran K, Anil Singarajpure, Shivayogappa. U, Somashekar N, Vineet Kumar Chadda, Sharath B N

Abstract: Tuberculosis (TB) remains a global health threat, ranking among the leading causes of mortality worldwide. In this context, machine learning (ML) has emerged as a transformative force, providing innovative solutions to the complexities associated with TB treatment. This study explores how machine learning, especially with tabular data, can be used to predict Tuberculosis (TB) treatment outcomes more accurately. It transforms this prediction task into a binary classification problem, generating risk scores from patient data sourced from NIKSHAY, India's national TB control program, which includes over 500,000 patient records. Data preprocessing is a critical component of the study, and the model achieved a recall of 98% and an AUC-ROC score of 0.95 on the validation set, which includes 20,000 patient records. We also explore the use of Natural Language Processing (NLP) for improved model learning. Our results, corroborated by various metrics and ablation studies, validate the effectiveness of our approach. The study concludes by discussing the potential ramifications of our research on TB eradication efforts and proposing potential avenues for future work. This study marks a significant stride in the battle against TB, showcasing the potential of machine learning in healthcare.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08688 [pdf, other]

Token Alignment via Character Matching for Subword Completion

Authors: Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Rob Kwiatowski, Ramesh Nallapati, Bing Xiang

Abstract: Generative models, widely utilized in various applications, can often struggle with prompts corresponding to partial tokens. This struggle stems from tokenization, where partial tokens fall out of distribution during inference, leading to incorrect or nonsensical outputs. This paper examines a technique to alleviate the tokenization artifact on text completion in generative models, maintaining performance even in regular non-subword cases. The method, termed token alignment, involves backtracking to the last complete tokens and ensuring the model's generation aligns with the prompt. This approach showcases marked improvement across many partial token scenarios, including nuanced cases like space-prefix and partial indentation, with only a minor time increase. The technique and analysis detailed in this paper contribute to the continuous advancement of generative models in handling partial inputs, bearing relevance for applications like code completion and text autocompletion.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08345 [pdf, other]

From human experts to machines: An LLM supported approach to ontology and knowledge graph construction

Authors: Vamsi Krishna Kommineni, Birgitta König-Ries, Sheeba Samuel

Abstract: The conventional process of building Ontologies and Knowledge Graphs (KGs) heavily relies on human domain experts to define entities and relationship types, establish hierarchies, maintain relevance to the domain, fill the ABox (or populate with instances), and ensure data quality (including amongst others accuracy and completeness). On the other hand, Large Language Models (LLMs) have recently gained popularity for their ability to understand and generate human-like natural language, offering promising ways to automate aspects of this process. This work explores the (semi-)automatic construction of KGs facilitated by open-source LLMs. Our pipeline involves formulating competency questions (CQs), developing an ontology (TBox) based on these CQs, constructing KGs using the developed ontology, and evaluating the resultant KG with minimal to no involvement of human experts. We showcase the feasibility of our semi-automated pipeline by creating a KG on deep learning methodologies by exploiting scholarly publications. To evaluate the answers generated via Retrieval-Augmented-Generation (RAG) as well as the KG concepts automatically extracted using LLMs, we design a judge LLM, which rates the generated content based on ground truth. Our findings suggest that employing LLMs could potentially reduce the human effort involved in the construction of KGs, although a human-in-the-loop approach is recommended to evaluate automatically generated KGs.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.07953 [pdf, other]

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

Authors: Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

Abstract: Exploiting sparsity in deep neural networks (DNNs) has been a promising area to meet the growing computation need of modern DNNs. However, in practice, sparse DNN acceleration still faces a key challenge. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparse hardware support recently, which provides limited flexibility and requires extra model fine-tuning. Moreover, any sparse model fine-tuned for certain structured sparse hardware cannot be accelerated by other structured hardware. To bridge the gap between sparse DNN models and hardware, this paper proposes tensor approximation via structured decomposition (TASD), which leverages the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. Next, we develop a software framework, TASDER, to accelerate DNNs by searching layer-wise, high-quality structured decomposition for both weight and activation tensors so that they can be accelerated by any systems with structured sparse hardware support. Evaluation results show that, by exploiting prior structured sparse hardware baselines, our method can accelerate off-the-shelf dense and sparse DNNs without fine-tuning and improves energy-delay-product by up to 83% and 74% on average.

Submitted 31 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07786 [pdf, other]

Generative deep learning-enabled ultra-large field-of-view lens-free imaging

Authors: Ronald B. Liu, Zhe Liu, Max G. A. Wolf, Krishna P. Purohit, Gregor Fritz, Yi Feng, Carsten G. Hansen, Pierre O. Bagnaninchi, Xavier Casadevall i Solvas, Yunjie Yang

Abstract: Advancements in high-throughput biomedical applications necessitate real-time, large field-of-view (FOV) imaging capabilities. Conventional lens-free imaging (LFI) systems, while addressing the limitations of physical lenses, have been constrained by dynamic, hard-to-model optical fields, resulting in a limited one-shot FOV of approximately 20 $mm^2$. This restriction has been a major bottleneck in applications like live-cell imaging and automation of microfluidic systems for biomedical research. Here, we present a deep-learning(DL)-based imaging framework - GenLFI - leveraging generative artificial intelligence (AI) for holographic image reconstruction. We demonstrate that GenLFI can achieve a real-time FOV over 550 $mm^2$, surpassing the current LFI system by more than 20-fold, and even larger than the world's largest confocal microscope by 1.76 times. The resolution is at the sub-pixel level of 5.52 $μm$, without the need for a shifting light source. The unsupervised learning-based reconstruction does not require optical field modeling, making imaging dynamic 3D samples (e.g., droplet-based microfluidics and 3D cell models) in complex optical fields possible. This GenLFI framework unlocks the potential of LFI systems, offering a robust tool to tackle new frontiers in high-throughput biomedical applications such as drug discovery.

Submitted 22 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07299 [pdf]

Modelling response time contrasts in superconducting nanowire single photon detectors

Authors: Souvik Haldar, Arun Sehrawat, Krishna B. Balasubramanian

Abstract: Superconducting Nanowire Single Photon Detector (SNSPD) emerges as a potential candidate in multiple fields requiring sensitive and fast photodetection. While nanowires of low temperature superconducting detectors are mature with commercial solutions, other material options with higher transition temperature and faster responses are currently being explored. Towards this goal, we develop a generalized numerical model that incorporates the thermodynamic properties of the superconducting material and identifies the minimum resolvable photon count for a given bias and device parameters. A phase diagram of detection and latching phases with the minimum number of photons as a function of biasing current and biasing temperature for each material system is presented. We show using the developed model that while low temperature superconducting (LTS) nanowires are more sensitive to the incident photon at different wavelengths, the ultimate limit of a single photon can be achieved using high temperature superconducting (HTS) material such as YBa2Cu3O7-δ, albeit at stringent biasing conditions. On the contrary, ultrafast response time with three orders of magnitude smaller response times can be achieved in select HTS materials, making them appealing for several practical applications.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05982 [pdf]

Enhanced Auto Language Prediction with Dictionary Capsule -- A Novel Approach

Authors: Pinni Venkata Abhiram, Ananya Rathore, Abhir Mirikar, Hari Krishna S, Sheena Christabel Pravin, Vishwanath Kamath Pethri, Manjunath Lokanath Belgod, Reetika Gupta, K Muthukumaran

Abstract: The paper presents a novel Auto Language Prediction Dictionary Capsule (ALPDC) framework for language prediction and machine translation. The model uses a combination of neural networks and symbolic representations to predict the language of a given input text and then translate it to a target language using pre-built dictionaries. This research work also aims to translate the text of various languages to its literal meaning in English. The proposed model achieves state-of-the-art results on several benchmark datasets and significantly improves translation accuracy compared to existing methods. The results show the potential of the proposed method for practical use in multilingual communication and natural language processing tasks.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 21 Pages

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05527 [pdf, other]

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

Authors: Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao

Abstract: Key-value (KV) caching has become the de-facto to accelerate generation speed for large language models (LLMs) inference. However, the growing cache demand with increasing sequence length has transformed LLM inference to be a memory bound problem, significantly constraining the system throughput. Existing methods rely on dropping unimportant tokens or quantizing all entries uniformly. Such methods, however, often incur high approximation errors to represent the compressed matrices. The autoregressive decoding process further compounds the error of each step, resulting in critical deviation in model generation and deterioration of performance. To tackle this challenge, we propose GEAR, an efficient KV cache compression framework that achieves near-lossless high-ratio compression. GEAR first applies quantization to majority of entries of similar magnitudes to ultra-low precision. It then employs a low rank matrix to approximate the quantization error, and a sparse matrix to remedy individual errors from outlier entries. By adeptly integrating three techniques, GEAR is able to fully exploit their synergistic potentials. Our experiments demonstrate that compared to alternatives, GEAR achieves near-lossless 4-bit KV cache compression with up to 2.38x throughput improvement, while reducing peak-memory size up to 2.29x. Our code is publicly available at https://github.com/HaoKang-Timmy/GEAR.

Submitted 11 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05465 [pdf, other]

Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

Authors: Akshat Ramachandran, Zishen Wan, Geonhwa Jeong, John Gustafson, Tushar Krishna

Abstract: Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamically adapts to DNN weight/activation distributions by parameterizing LP bit fields. We also develop a novel genetic-algorithm-based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters while reducing representational divergence between quantized and full-precision models through a novel global-local contrastive objective. Additionally, we design a unified mixed-precision LP accelerator (LPA) architecture comprising processing elements (PEs) that incorporate LP in the computational datapath. Our algorithm-hardware co-design demonstrates on average a <1% drop in top-1 accuracy across various CNN and ViT models. It also achieves ~2x improvements in performance per unit area and 2.2x gains in energy efficiency compared to state-of-the-art quantization accelerators using different data types.

Submitted 26 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 2024 61st IEEE/ACM Design Automation Conference (DAC)

Showing 151–200 of 3,387 results for author: Krishna