Search | arXiv e-print repository

Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recorded demonstration video. Recent advances in Vision Language Models (VLMs) have shown a promising path in achieving this goal as they demonstrate capabilities in perceiving and reasoning about multimodal inputs. However, VLMs are typically trained to predict textual output and it is an open research question how to best utilize them in navigation. To solve MINT, we present Mobility VLA, a hierarchical Vision-Language-Action (VLA) navigation policy that combines the environment understanding and common sense reasoning power of long-context VLMs and a robust low-level navigation policy based on topological graphs. The high-level policy consists of a long-context VLM that takes the demonstration tour video and the multimodal user instruction as input to find the goal frame in the tour video. Next, a low-level policy uses the goal frame and an offline constructed topological graph to generate robot actions at every timestep. We evaluated Mobility VLA in an 836 m^2 real world environment and show that Mobility VLA has high end-to-end success rates on previously unsolved multimodal instructions such as "Where should I return this?" while holding a plastic bin.
A video demonstrating Mobility VLA can be found here: https://youtu.be/-Tof__Q8_5s

Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.05902 [pdf, other]

Towards Photon-Number-Encoded High-dimensional Entanglement from a Sequentially Excited Quantum Three-Level System

Authors: Daniel A. Vajner, Nils D. Kewitz, Martin von Helversen, Stephen C. Wein, Yusuf Karli, Florian Kappe, Vikas Remesh, Saimon F. Covre da Silva, Armando Rastelli, Gregor Weihs, Carlos Anton-Solanas, Tobias Heindel

Abstract: The sequential resonant excitation of a 2-level quantum system results in the emission of a state of light showing time-entanglement encoded in the photon-number-basis - notions that can be extended to 3-level quantum systems as discussed in a recent proposal. Here, we report the experimental implementation of a sequential two-photon resonant excitation process of a solid-state 3-level system, constituted by the biexciton-, exciton-, and ground-state of a semiconductor quantum dot. The resulting light state exhibits entanglement in time and energy, encoded in the photon-number basis, which could be used in quantum information applications, e.g., dense information encoding or quantum communication protocols. Performing energy- and time-resolved correlation experiments in combination with extensive theoretical modelling, we are able to partially retrieve the entanglement structure of the generated state.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 14 pages (including 5 figures, 56 citations)

arXiv:2407.03906 [pdf]

Color-map recommendation for MR relaxometry maps

Authors: Miha Fuderer, Barbara Wichtmann, Fabio Crameri, Nandita M. deSouza, Bettina Baeßler, Vikas Gulani, Meiyun Wang, Dirk Poot, Ruud de Boer, Matt Cashmore, Wolter de Graaf, Kathryn E. Keenan, Dan Ma, Carolin Pirkl, Nico Sollmann, Sebastian Weingärtner, Stefano Mandija, Xavier Golay

Abstract: Purpose: To harmonize the use of color for MR relaxometry maps and therefore recommend the use of specific color-maps for representing T1 and T2 maps. Methods: Perceptually linearized color-maps were chosen to have similar color settings as those proposed by Griswold et al. in 2018. A Delphi process, polling the opinion of a panel of 81 experts, was used to generate consensus on the suitability of these maps. Results: Consensus was reached on the suitability of the logarithm-processed Lipari color-map for T1 and the logarithm-processed Navia color-map for T2. There was consensus on color bars being mandatory and on the use of a specific value indicating invalidity. There was no consensus on whether the ranges should be fixed per anatomy. Conclusion: The authors recommend the use of the logarithm-processed Lipari color map for displaying quantitative T1 maps and R1 maps; likewise, the authors recommend the logarithm-processed Navia color-map for displaying T2, T2*, R2 and R2* maps.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 22 pages; embedded are 5 figures and 5 tables; contact the first author for supplementary material. Submitted to Magnetic Resonance in Medicine

arXiv:2407.03648 [pdf, other]

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching

Authors: Gael Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra

Abstract: We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates the information loss drawback of discrete representations. Based on a diffusion transformer architecture trained on a flow-matching objective, the model can generate and edit diverse high quality stereo samples of variable duration, with simple text descriptions. We also explore a new regularized latent inversion method for zero-shot test-time text-guided editing and demonstrate its superior performance over naive denoising diffusion implicit model (DDIM) inversion for a variety of music editing prompts. Evaluations are conducted on both objective and subjective metrics and demonstrate that the proposed model is not only competitive with the evaluated baselines on a standard text-to-music benchmark - quality and efficiency-wise - but also outperforms previous state of the art for music editing when combined with our proposed latent inversion. Samples are available at https://melodyflow.github.io.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02921 [pdf, other]

In-Memory Mirroring: Cloning Without Reading

Authors: Simranjeet Singh, Ankit Bende, Chandan Kumar Jha, Vikas Rana, Rolf Drechsler, Sachin Patkar, Farhad Merchant

Abstract: In-memory computing (IMC) has gained significant attention recently as it attempts to reduce the impact of memory bottlenecks. Numerous schemes for digital IMC are presented in the literature, focusing on logic operations. Often, an application's description has data dependencies that must be resolved. Contemporary IMC architectures perform read followed by write operations for this purpose, which results in performance and energy penalties. To solve this fundamental problem, this paper presents in-memory mirroring (IMM). IMM eliminates the need for read and write-back steps, thus avoiding energy and performance penalties. Instead, we perform data movement within memory, involving row-wise and column-wise data transfers. Additionally, the IMM scheme enables parallel cloning of an entire row (word) with a complexity of $\mathcal{O}(1)$. Moreover, we analyze the energy consumption of the proposed technique using a resistive random-access memory crossbar and the experimentally validated JART VCM v1b model. The IMM increases energy efficiency and shows 2$\times$ performance improvement compared to conventional data movement methods.

Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted in IFIP/IEEE VLSI-SoC 2024

arXiv:2407.01865 [pdf, other]

Geometric Static Modeling Framework for Piecewise-Continuous Curved-Link Multi Point-of-Contact Tensegrity Robots

Authors: Lauren Ervin, Vishesh Vikas

Abstract: Tensegrities synergistically combine tensile (cable) and rigid (link) elements to achieve structural integrity, making them lightweight, packable, and impact resistant. Consequently, they have high potential for locomotion in unstructured environments. This research presents geometric modeling of a Tensegrity eXploratory Robot (TeXploR) comprised of two semi-circular, curved links held together by 12 prestressed cables and actuated with an internal mass shifting along each link. This design allows for efficient rolling with stability (e.g., tip-over on an incline). However, the unique design poses static and dynamic modeling challenges given the discontinuous nature of the semi-circular, curved links, two changing points of contact with the surface plane, and instantaneous movement of the masses along the links. The robot is modeled using a geometric approach where the holonomic constraints confirm the experimentally observed four-state hybrid system, proving TeXploR rolls along one link while pivoting about the end of the other. It also identifies the quasi-static state transition boundaries that enable a continuous change in the robot states via internal mass shifting. This is the first time in literature a non-spherical two-point contact system is kinematically and geometrically modeled. Furthermore, the static solutions are closed-form and do not require numerical exploration of the solution. The MATLAB simulations are experimentally validated on a tetherless prototype with mean absolute error of 4.36°.

Submitted 1 July, 2024; originally announced July 2024.

Comments: This work has been submitted to the IEEE RA-L for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2407.01597 [pdf]

Machine learning driven high-resolution Raman spectral generation for accurate molecular feature recognition

Authors: Vikas Yadav, Abhay Kumar Tiwari, Soumik Siddhanta

Abstract: Through the probing of light-matter interactions, Raman spectroscopy provides invaluable insights into the composition, structure, and dynamics of materials, and obtaining such data from portable and cheap instruments is of immense practical relevance. Here, we propose the integration of a Generative Adversarial Network (GAN) with low-resolution Raman spectroscopy with a portable hand-held spectrometer to facilitate concurrent spectral analysis and compound classification. Portable spectrometers generally have a lower resolution, and the Raman signal is usually buried under the background noise. The GAN-based model not only generated high-resolution data but also reduced the spectral noise significantly. The generated data was used further to train an Artificial Neural Network (ANN)-based model for the classification of organic and pharmaceutical drug molecules. The high-resolution generated Raman data was subsequently used for spectral barcoding for identification of the pharmaceutical drugs. The GAN also demonstrated enhanced robustness in extracting weak signals compared to conventional noise removal methods. This integrated system holds the potential for achieving accurate and real-time monitoring of noisy inputs to obtain high throughput output, thereby opening new avenues for applications in different domains.
This synergy between spectroscopy and machine learning (ML) facilitates improved data processing, noise reduction, and feature extraction and opens avenues for predictive modeling and automated decision-making using cost-effective portable devices.

Submitted 25 June, 2024; originally announced July 2024.

Comments: 37 Pages

arXiv:2407.00616 [pdf, other]

DADEE: Well-calibrated uncertainty quantification in neural networks for barriers-based robot safety

Authors: Masoud Ataei, Vikas Dhiman

Abstract: Uncertainty-aware controllers that guarantee safety are critical for safety critical applications. Among such controllers, Control Barrier Functions (CBFs) based approaches are popular because they are fast, yet safe. However, most such works depend on Gaussian Processes (GPs) or MC-Dropout for learning and uncertainty estimation, and both approaches come with drawbacks: GPs are non-parametric methods that are slow, while MC-Dropout does not capture aleatoric uncertainty. On the other hand, modern Bayesian learning algorithms have shown promise in uncertainty quantification. The application of modern Bayesian learning methods to CBF-based controllers has not yet been studied. We aim to fill this gap by surveying uncertainty quantification algorithms and evaluating them on CBF-based safe controllers. We find that model variance-based algorithms (for example, Deep ensembles, MC-dropout, etc.) and direct estimation-based algorithms (such as DEUP) have complementary strengths. Algorithms in the former category can only estimate uncertainty accurately out-of-domain, while those in the latter category can only do so in-domain. We combine the two approaches to obtain more accurate uncertainty estimates both in- and out-of-domain. As measured by the failure rate of a simulated robot, this results in a safer CBF-based robot controller.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.19800 [pdf, other]

Modeling the Real World with High-Density Visual Particle Dynamics

Authors: William F. Whitney, Jacob Varley, Deepali Jain, Krzysztof Choromanski, Sumeet Singh, Vikas Sindhwani

Abstract: We present High-Density Visual Particle Dynamics (HD-VPD), a learned world model that can emulate the physical dynamics of real scenes by processing massive latent point clouds containing 100K+ particles. To enable efficiency at this scale, we introduce a novel family of Point Cloud Transformers (PCTs) called Interlacers leveraging intertwined linear-attention Performer layers and graph-based neighbour attention layers. We demonstrate the capabilities of HD-VPD by modeling the dynamics of high degree-of-freedom bi-manual robots with two RGB-D cameras. Compared to the previous graph neural network approach, our Interlacer dynamics is twice as fast with the same prediction quality, and can achieve higher quality using 4x as many particles. We illustrate how HD-VPD can evaluate motion plan quality with robotic box pushing and can grasping tasks. See videos and particle dynamics rendered by HD-VPD at https://sites.google.com/view/hd-vpd.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.17990 [pdf, other]

Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models

Authors: Vikas Yadav, Hyuk Joon Kwon, Vijay Srinivasan, Hongxia Jin

Abstract: Question Answer Generation (QAG) is an effective data augmentation technique to improve the accuracy of question answering systems, especially in low-resource domains. While recent pretrained and large language model-based QAG methods have made substantial progress, they face the critical issue of redundant QA pair generation, affecting downstream QA systems. Implicit diversity techniques such as sampling and diverse beam search are proven effective solutions but often yield lower diversity. We present explicit diversity conditions for QAG, focusing on spatial aspects, question types, and entities, substantially increasing diversity in QA generation. Our work emphasizes the need for explicit diversity conditions for generating diverse question-answer synthetic data by showing significant improvements in downstream QA task over existing widely adopted implicit diversity techniques. In particular, generated QA pairs from explicit diversity conditions, when used to train the downstream QA model, result in an average 4.1% exact match and 4.5% F1 improvement over QAG from implicit sampling techniques on SQuADDU. Our work emphasizes the need for explicit diversity conditions even more in low-resource datasets (SubjQA), where average downstream QA performance improvements are around 12% EM.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Published at COLING 2024

arXiv:2406.17740 [pdf, other]

Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning

Authors: Arijit Sehanobish, Avinava Dubey, Krzysztof Choromanski, Somnath Basu Roy Chowdhury, Deepali Jain, Vikas Sindhwani, Snigdha Chaturvedi

Abstract: Recent efforts to scale Transformer models have demonstrated rapid progress across a wide range of tasks (Wei et al., 2022). However, fine-tuning these models for downstream tasks is expensive due to their large parameter counts. Parameter-efficient fine-tuning (PEFT) approaches have emerged as a viable alternative by allowing us to fine-tune models by updating only a small number of parameters. In this work, we propose a general framework for parameter efficient fine-tuning (PEFT), based on structured unrestricted-rank matrices (SURM) which can serve as a drop-in replacement for popular approaches such as Adapters and LoRA. Unlike other methods like LoRA, SURMs provide more flexibility in finding the right balance between compactness and expressiveness. This is achieved by using low displacement rank matrices (LDRMs), which have not been used in this context before. SURMs remain competitive with baselines, often providing significant quality improvements while using a smaller parameter budget. SURMs achieve 5-7% accuracy gains on various image classification tasks while replacing low-rank matrices in LoRA. They also result in up to 12x reduction of the number of parameters in adapters (with virtually no loss in quality) on the GLUE benchmark.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.17415 [pdf, other]

Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels

Authors: Razvan-Gabriel Dumitru, Vikas Yadav, Rishabh Maheshwary, Paul-Ioan Clotan, Sathwik Tejaswi Madhusudhan, Mihai Surdeanu

Abstract: We present a simple variable quantization approach that quantizes different layers of a large language model (LLM) at different bit levels. Specifically, we quantize the most important layers to higher bit precision and less important layers to lower bits to achieve floating point quantization levels. We propose two effective strategies to measure the importance of layers within LLMs: the first measures the importance of a layer based on how different its output embeddings are from the input embeddings (the higher the better); the second estimates the importance of a layer using the number of layer weights that are much larger than average (the smaller the better). We show that quantizing different layers at varying bits according to our importance scores results in minimal performance drop with a far more compressed model size. Finally, we present several practical key takeaways from our variable layer-wise quantization experiments: (a) LLM performance under variable quantization remains close to the original model until 25-50% of layers are moved to lower quantization using our proposed ordering but only until 5-10% if moved using no specific ordering; (b) Quantizing LLMs to lower bits performs substantially better than pruning unless extreme quantization (2-bit) is used; and (c) Layer-wise quantization to lower bits works better in the case of larger LLMs with more layers compared to smaller LLMs with fewer layers. The code used to run the experiments is available at: https://github.com/RazvanDu/LayerwiseQuant.

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: submitted to EMNLP, 15 pages, 10 figures, 4 tables

ACM Class: I.2.7; I.2.0
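
The abstract above says that mixing bit levels across layers yields non-integer average bit levels. The following minimal sketch illustrates only that arithmetic and is not the authors' implementation. The function names, the importance scores, the 4-bit and 2-bit choices, and the assumption that all layers have the same number of weights are hypothetical.

```python
def assign_bits(importance, frac_low, high_bits=4, low_bits=2):
    """Give the least important fraction of layers low_bits, the rest high_bits.

    Assumption: a higher score means a more important layer.
    """
    n_low = round(len(importance) * frac_low)
    # Layer indices ordered from least to most important.
    order = sorted(range(len(importance)), key=lambda i: importance[i])
    bits = [high_bits] * len(importance)
    for i in order[:n_low]:
        bits[i] = low_bits
    return bits


def average_bits(bits):
    """Mean bit width, assuming every layer holds the same number of weights."""
    return sum(bits) / len(bits)


# Hypothetical importance scores for an 8-layer model.
importance = [0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6]
bits = assign_bits(importance, frac_low=0.25)
print(bits)                # [4, 2, 4, 4, 2, 4, 4, 4]
print(average_bits(bits))  # 3.5
```

With a quarter of the layers at 2 bits and the rest at 4 bits, the model averages 3.5 bits per weight under the equal-size assumption.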

arXiv:2406.17163 [pdf, other]

Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

Authors: Vikas Yadav, Zheng Tang, Vijay Srinivasan

Abstract: Large language models (LLMs) have achieved remarkable success in natural language generation but less focus has been given to their applicability in decision making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and, worse, generate out-of-vocabulary class labels. To address these critical issues, we introduce the Paraphrase and AGgregate (PAG)-LLM approach, wherein an LLM generates multiple paraphrases of the input query (parallel queries), performs multi-class classification for the original query and each paraphrase, and at the end aggregates all the classification labels based on their confidence scores. We evaluate PAG-LLM on two large multi-class classification datasets, CLINC and Banking, and show 22.7% and 15.1% error reduction. We show that PAG-LLM is especially effective for hard examples where the LLM is uncertain, and reduces the critical misclassification and hallucinated label generation errors.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted at SIGIR 2024
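
The aggregation step described in the abstract above can be sketched as follows. The abstract states only that labels are aggregated based on their confidence scores, so the rule used here (sum the confidence per label and take the largest total) and the filtering of out-of-vocabulary labels are assumptions for illustration, not the paper's method. The function name and the example labels are hypothetical.

```python
from collections import defaultdict


def aggregate_labels(predictions, valid_labels):
    """Pick one label from (label, confidence) pairs.

    predictions holds one pair for the original query and one per paraphrase.
    Assumed rule: sum confidences per label and return the highest total.
    """
    totals = defaultdict(float)
    for label, confidence in predictions:
        # Assumption: labels outside the known class set are ignored.
        if label in valid_labels:
            totals[label] += confidence
    if not totals:
        return None  # every prediction was out of vocabulary
    return max(totals, key=totals.get)


# Hypothetical predictions: original query first, then three paraphrases.
predictions = [
    ("transfer", 0.55),
    ("balance", 0.40),
    ("balance", 0.35),
    ("send_cash", 0.90),  # not a valid class label
]
print(aggregate_labels(predictions, {"transfer", "balance"}))  # balance
```

Here "balance" wins with a summed confidence of 0.75 against 0.55 for "transfer", and the invalid label is dropped despite its high confidence.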

arXiv:2406.16783 [pdf, other]

M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models

Authors: Rishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan

Abstract: Instruction finetuning (IFT) is critical for aligning Large Language Models (LLMs) to follow instructions. While many effective IFT datasets have been introduced recently, they predominantly focus on high-resource languages like English. To better align LLMs across a broad spectrum of languages and tasks, we propose a fully synthetic, novel taxonomy (Evol) guided Multilingual, Multi-turn instruction finetuning dataset, called M2Lingual. It is constructed by first selecting a diverse set of seed examples and then utilizing the proposed Evol taxonomy to convert these seeds into complex and challenging multi-turn instructions. We demonstrate the effectiveness of M2Lingual by training LLMs of varying sizes and showcasing the enhanced performance across a diverse set of languages. We contribute the 2 step Evol taxonomy with the guided generation code: https://github.com/ServiceNow/M2Lingual, as well as the first fully synthetic, general and task-oriented, multi-turn, multilingual dataset built with Evol - M2Lingual: https://huggingface.co/datasets/ServiceNow-AI/M2Lingual - containing 182K total IFT pairs, covering 70 languages and 17+ NLP tasks.

Submitted 28 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 39 pages

arXiv:2406.15475 [pdf]

Green pathway of Urea Synthesis through Plasma-Ice Interaction Optimization and Mechanistic Insights with N2 + CO2 and NH3 + CO2 Gas Mixtures

Authors: Vikas Rathore, Vyom Desai, Nirav I. Jamnapara, Sudhir Kumar Nema

Abstract: This study explores a green pathway for urea synthesis using plasma-ice interaction with gas mixtures of N2 + CO2 and NH3 + CO2. Electrical and optical emission spectroscopy were employed to characterize the plasmas, revealing that urea formation involves complex reactions driven by high-energy species, producing reactive nitrogen and carbon intermediates that further react to form urea. Physicochemical analyses of plasma-treated ice showed increased pH, electrical conductivity (EC), and reduced oxidation-reduction potential (ORP). Optimization of plasma process parameters (gas pressure, applied voltage, and treatment time) was performed to enhance urea formation. Among these parameters, plasma treatment time had the most substantial influence. Increasing treatment time from 20 to 60 minutes significantly impacted physicochemical properties: for N2 + CO2 plasma, pH increased by 21.05%, EC by 184.7%, and ORP decreased by 27.48%; for NH3 + CO2 plasma, pH increased by 27.37%, EC by 239.05%, and ORP decreased by 72.67%, respectively. The study shows that NH3 + CO2 plasma produces a significantly higher concentration of urea (7.7 mg L-1) compared to N2 + CO2 plasma (0.55 mg L-1). This is attributed to the direct availability and reactivity of ammonia, which simplifies reaction pathways and enhances intermediate formation. These findings highlight the potential of plasma-ice interaction as an energy-efficient and environmentally friendly method for urea synthesis, offering a sustainable alternative to conventional processes.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10824 [pdf, other]

Citation-Based Summarization of Landmark Judgments

Authors: Purnima Bindal, Vikas Kumar, Vasudha Bhatnagar, Parikshet Sirohi, Ashwini Siwal

Abstract: Landmark judgments are of prime importance in the Common Law System because of their exceptional jurisprudence and frequent references in other judgments. In this work, we leverage contextual references available in citing judgments to create an extractive summary of the target judgment. We evaluate the proposed algorithm on two datasets curated from the judgments of Indian Courts and find the results promising.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted for publication at ICON 2023

arXiv:2406.09722 [pdf, other]

Cross-view geo-localization: a survey

Authors: Abhilash Durgam, Sidike Paheding, Vikas Dhiman, Vijay Devabhaktuni

Abstract: Cross-view geo-localization has garnered notable attention in the realm of computer vision, spurred by the widespread availability of copious geotagged datasets and the advancements in machine learning techniques. This paper provides a thorough survey of cutting-edge methodologies, techniques, and associated challenges that are integral to this domain, with a focus on feature-based and deep learning strategies. Feature-based methods capitalize on unique features to establish correspondences across disparate viewpoints, whereas deep learning-based methodologies deploy convolutional neural networks to embed view-invariant attributes. This work also delineates the multifaceted challenges encountered in cross-view geo-localization, such as variations in viewpoints and illumination, the occurrence of occlusions, and it elucidates innovative solutions that have been formulated to tackle these issues. Furthermore, we delineate benchmark datasets and relevant evaluation metrics, and also perform a comparative analysis of state-of-the-art techniques. Finally, we conclude the paper with a discussion on prospective avenues for future research and the burgeoning applications of cross-view geo-localization in an intricately interconnected global landscape.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07097 [pdf, other]

High-purity and stable single-photon emission in bilayer WSe$_2$ via phonon-assisted excitation

Authors: Claudia Piccinini, Athanasios Paralikis, José Ferreira Neto, Abdulmalik Abdulkadir Madigawa, Paweł Wyborski, Vikas Remesh, Luca Vannucci, Niels Gregersen, Battulga Munkhbat

Abstract: The excitation scheme is essential for single-photon sources as it prepares the exciton state, defines the decay dynamics, and influences the spectral diffusion of the emitted single photons. Here, we investigate the impact of different optical excitation strategies on the single-photon emission characteristics of bilayer WSe$_2$ quantum emitters. Under phonon-assisted excitation, we achieve narrow and stable single-photon emission with an excellent purity reaching $0.94 \pm 0.02$. Furthermore, the decay time is reduced by more than an order of magnitude from $(16.65 \pm 2.39)\,$ns for above-band excitation to $(1.33 \pm 0.04)\,$ns for phonon-assisted excitation. Finally, we observe a suppressed spectral wandering along with a two-fold reduction of the spectral linewidth. Our comprehensive investigation highlights the critical role of the excitation method in optimizing the performance of WSe$_2$-based quantum emitters.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.03164 [pdf, other]

Topological Neural Networks go Persistent, Equivariant, and Continuous

Authors: Yogesh Verma, Amauri H Souza, Vikas Garg

Abstract: Topological Neural Networks (TNNs) incorporate higher-order relational information beyond pairwise interactions, enabling richer representations than Graph Neural Networks (GNNs). Concurrently, topological descriptors based on persistent homology (PH) are being increasingly employed to augment the GNNs. We investigate the benefits of integrating these two paradigms. Specifically, we introduce TopNets as a broad framework that subsumes and unifies various methods in the intersection of GNNs/TNNs and PH such as (generalizations of) RePHINE and TOGL. TopNets can also be readily adapted to handle (symmetries in) geometric complexes, extending the scope of TNNs and PH to spatial settings. Theoretically, we show that PH descriptors can provably enhance the expressivity of simplicial message-passing networks. Empirically, (continuous and E(n)-equivariant extensions of) TopNets achieve strong performance across diverse tasks, including antibody design, molecular dynamics simulation, and drug property prediction.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted to ICML 2024

arXiv:2405.20433 [pdf, other]

Efficient Industrial Refrigeration Scheduling with Peak Pricing

Authors: Rohit Konda, Jordan Prescott, Vikas Chandan, Jesse Crossno, Blake Pollard, Dan Walsh, Rick Bohonek, Jason R. Marden

Abstract: The widespread use of industrial refrigeration systems across various sectors contributes significantly to global energy consumption, highlighting substantial opportunities for energy conservation through intelligent control design. As such, this work focuses on control algorithm design in industrial refrigeration that minimizes operational costs and provides efficient heat extraction. By adopting tools from inventory control, we characterize the structure of these optimal control policies, exploring the impact of different energy cost-rate structures such as time-of-use (TOU) pricing and peak pricing. While classical threshold policies are optimal under TOU costs, introducing peak pricing challenges their optimality, emphasizing the need for carefully designed control strategies in the presence of significant peak costs. We provide theoretical findings and simulation studies on this phenomenon, offering insights for more efficient industrial refrigeration management.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17656 [pdf, other]

Alignment is Key for Applying Diffusion Models to Retrosynthesis

Authors: Najwa Laabid, Severi Rissanen, Markus Heinonen, Arno Solin, Vikas Garg

Abstract: Retrosynthesis, the task of identifying precursors for a given molecule, can be naturally framed as a conditional graph generation task. Diffusion models are a particularly promising modelling approach, enabling post-hoc conditioning and trading off quality for speed during generation. We show mathematically that permutation equivariant denoisers severely limit the expressiveness of graph diffusion models and thus their adaptation to retrosynthesis. To address this limitation, we relax the equivariance requirement such that it only applies to aligned permutations of the conditioning and the generated graphs obtained through atom mapping. Our new denoiser achieves the highest top-$1$ accuracy ($54.7$\%) across template-free and template-based methods on USPTO-50k. We also demonstrate the ability for flexible post-training conditioning and good sample quality with small diffusion step counts, highlighting the potential for interactive applications and additional controls for multi-step planning.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 9 figures

arXiv:2405.17448 [pdf]

Green Synthesis of Ammonium Nitrate (NH$_4$NO$_3$) Fertilizer Production: via Plasma Water Ice Interaction with Air and NH$_3$ Plasma

Authors: Vikas Rathore, Vyom Desai, Nirav I Jamnapara, Sudhir Kumar Nema

Abstract: This study presents a novel and ecofriendly method for synthesizing ammonium nitrate using plasma-activated water (PAW) prepared through air and ammonia plasma treatments. Initially, PAW containing nitrate ions is produced by treating water with air plasma. This PAW is then frozen and exposed to low-pressure NH$_3$ plasma, introducing ammonium ions to form NH$_4$NO$_3$. We systematically investigate the voltage-current characteristics of the air and NH$_3$ plasma, analyze the generated species and radicals to understand the mechanism of NH$_4$NO$_3$ formation, and evaluate the effects of process parameters such as NH$_3$ gas pressure, applied voltage, and treatment time on the properties of PAW. Our results indicate that all examined process parameters positively influence the properties of PAW. Among these parameters, the duration of NH$_3$ plasma treatment of PAW ice exerts the most significant effect. Specifically, the concentration of NH$_4^+$ ions increased by 134.2 percent when the NH$_3$ treatment time was extended from 0.5 h to 1 h, compared to 12.7 and 33.3 percent increases for NH$_3$ pressure, ranging from 0.25 to 0.55 mbar, and applied voltage, ranging from 500 to 700 V, respectively. Similarly, variations in pH, oxidation-reduction potential, and electrical conductivity were substantially higher with increased treatment time than with changes in gas pressure and applied voltage. The PAW exhibited a neutral to slightly basic pH, making it ideal for soil applications, thereby addressing the existing issue of the high acidity of PAW and its use in agriculture.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.17247 [pdf, other]

An Introduction to Vision-Language Modeling

Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie , et al. (16 additional authors not shown)

Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technology. However, there are many challenges that need to be addressed to improve the reliability of those models. While language is discrete, vision evolves in a much higher dimensional space in which concepts cannot always be easily discretized. To better understand the mechanics behind mapping vision to language, we present this introduction to VLMs which we hope will help anyone who would like to enter the field. First, we introduce what VLMs are, how they work, and how to train them. Then, we present and discuss approaches to evaluate VLMs. Although this work primarily focuses on mapping images to language, we also discuss extending VLMs to videos.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16406 [pdf, other]

SpinQuant: LLM quantization with learned rotations

Authors: Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort

Abstract: Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Recent findings suggest that rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures, and find that some random rotations lead to much better quantization than others, with an up to 13 points difference in downstream zero-shot reasoning performance. As a result, we propose SpinQuant that optimizes (or learns) the rotation matrices with Cayley optimization on a small validation set. With 4-bit quantization of weight, activation, and KV-cache, SpinQuant narrows the accuracy gap on zero-shot reasoning tasks with full precision to merely 2.9 points on the LLaMA-2 7B model, surpassing LLM-QAT by 19.1 points and SmoothQuant by 25.0 points. SpinQuant also outperforms concurrent work QuaRot, which applies random rotations to remove outliers. In particular, for LLaMA-2 7B/LLaMA-3 8B models that are hard to quantize, SpinQuant reduces the gap to full precision by 30.2%/34.1% relative to QuaRot.

Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15877 [pdf, other]

Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications

Authors: Yang Li, Changsheng Zhao, Hyungtak Lee, Ernie Chang, Yangyang Shi, Vikas Chandra

Abstract: Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as personal computers and mobile/wearable devices, and results in substantial inference costs in resource-rich environments like cloud servers. To extend the use of LLMs, we introduce a low-rank decomposition approach to effectively compress these models, tailored to the requirements of specific applications. We observe that LLMs pretrained on general datasets contain many redundant components not needed for particular applications. Our method focuses on identifying and removing these redundant parts, retaining only the necessary elements for the target applications. Specifically, we represent the weight matrices of LLMs as a linear combination of base components. We then prune the irrelevant bases and enhance the model with new bases beneficial for specific applications. Deep compression results on the Llama 2-7b and -13B models, conducted on target applications including mathematical reasoning and code generation, show that our method significantly reduces model size while maintaining comparable accuracy to state-of-the-art low-rank compression techniques.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15217 [pdf, other]

NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation

Authors: Vikas Thamizharasan, Difan Liu, Matthew Fisher, Nanxuan Zhao, Evangelos Kalogerakis, Michal Lukac

Abstract: The success of denoising diffusion models in representing rich data distributions over 2D raster images has prompted research on extending them to other data representations, such as vector graphics. Unfortunately, due to their variable structure and scarcity of vector training data, directly applying diffusion models on this domain remains a challenging problem. Using workarounds like optimization via Score Distillation Sampling (SDS) is also fraught with difficulty, as vector representations are nontrivial to directly optimize and tend to result in implausible geometries such as redundant or self-intersecting shapes. NIVeL addresses these challenges by reinterpreting the problem on an alternative, intermediate domain which preserves the desirable properties of vector graphics -- mainly sparsity of representation and resolution-independence. This alternative domain is based on neural implicit fields expressed in a set of decomposable, editable layers. Based on our experiments, NIVeL produces text-to-vector graphics results of significantly better quality than the state-of-the-art.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14657 [pdf, other]

Heteroscedastic Preferential Bayesian Optimization with Informative Noise Distributions

Authors: Marshal Arijona Sinaga, Julien Martinelli, Vikas Garg, Samuel Kaski

Abstract: Preferential Bayesian optimization (PBO) is a sample-efficient framework for learning human preferences between candidate designs. PBO classically relies on homoscedastic noise models to represent human aleatoric uncertainty. Yet, such noise fails to accurately capture the varying levels of human aleatoric uncertainty, particularly when the user possesses partial knowledge among different pairs of candidates. For instance, a chemist with solid expertise in glucose-related molecules may easily compare two compounds from that family while struggling to compare alcohol-related molecules. Currently, PBO overlooks this uncertainty during the search for a new candidate through the maximization of the acquisition function, consequently underestimating the risk associated with human uncertainty. To address this issue, we propose a heteroscedastic noise model to capture human aleatoric uncertainty. This model adaptively assigns noise levels based on the distance of a specific input to a predefined set of reliable inputs known as anchors provided by the human. Anchors encapsulate partial knowledge and offer insight into the comparative difficulty of evaluating different candidate pairs. Such a model can be seamlessly integrated into the acquisition function, thus leading to candidate design pairs that elegantly trade informativeness and ease of comparison for the human expert. We perform an extensive empirical evaluation of the proposed approach, demonstrating a consistent improvement over homoscedastic PBO.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.10434 [pdf, other]

Circuit-based leakage-to-erasure conversion in a neutral atom quantum processor

Authors: Matthew N. H. Chow, Vikas Buchemmavari, Sivaprasad Omanakuttan, Bethany J. Little, Saurabh Pandey, Ivan H. Deutsch, Yuan-Yu Jau

Abstract: Leakage out of the computational subspace is a major limitation of current state-of-the-art neutral-atom quantum computers and a significant challenge for scalable systems. In a quantum processor with cesium atoms, we demonstrate proof-of-principle circuit-based conversion of leakage errors to erasure errors via Leakage Detection Units (LDUs), which non-destructively map information about the presence or absence of the qubit onto the state of an ancilla. With a standard LDU circuit, we successfully convert leakage errors to erasure errors for all major leakage pathways while preserving the quantum information in the case that no leakage occurred. We benchmark the performance of the LDU using a three-outcome low-loss state detection method and also explore the advantages of three-outcome measurements for LDUs. We find that the LDU detects atom-loss errors with ~93.4% accuracy, limited by technical imperfections of our apparatus. We further compile and execute a SWAP LDU, wherein the roles of the original data atom and ancilla atom are exchanged under the action of the LDU, providing 'free refilling' of atoms in the case of leakage errors. This circuit-based leakage-to-erasure error conversion is a critical component of a neutral-atom quantum processor where the quantum information may significantly outlive the lifetime of any individual atom in the quantum register.

Submitted 9 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.05837 [pdf, other]

Role of coupled electrochemistry and stress on the Li-anode instability: A continuum approach

Authors: Shabnam Konica, Brian W. Sheldon, Vikas Srivastava

Abstract: We present a coupled mechanistic approach that elucidates the intricate interplay between stress and electrochemistry, enabling the prediction of the onset of instabilities in Li-metal anodes and the solid electrolyte interphase (SEI) in liquid-electrolyte Li-metal batteries. Our continuum theory considers a two-way coupling between stress and electrochemistry, includes Li and electron transport through SEI, incorporates effects of Li viscoplasticity, includes SEI and electrolyte interface surface energy and evaluates crucial roles of these mechanistic effects on the continuously evolving anode surface due to the viscoplastic deformation of lithium. In the model, spatial current density evolves with the stress-induced potential across the deformed anode/SEI interface. We assume SEI as a homogeneous, artificial layer on the Li-anode, which allows the investigation of the mechanical and electrochemical properties of the SEI systematically. Subsequently, we solve a set of coupled electrochemistry and displacement equations within the SEI and anode domains. The model is implemented numerically by writing a user element subroutine in Abaqus/Standard. We conduct numerical simulations under various galvanostatic conditions and SEI properties and predict conditions for anode instability. We find that Li viscoplasticity is one of the key attributes that drives instability in the Li-anode and show that applying a soft artificial SEI layer on the Li-anode to minimize viscoplastic deformation can be an effective method. We also report the role of artificial SEI elasticity and thickness on anode stability. Selected stability maps are provided as a design aid for artificial SEI.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03482 [pdf]

Managing Renewable Energy Resources Using Equity-Market Risk Tools - the Efficient Frontiers

Authors: Haim Grebel, Divya Vikas, Jim Shi

Abstract: The energy market, and specifically the renewable sector, carries volatility and risks, similar to the financial market. Here, we leverage a well-established return-risk approach, commonly used by equity portfolio managers, and apply it to energy resources. We visualize the relationship between the resources' costs and their risks in terms of efficient frontiers. We apply this analysis to publicly available data for various US regions: Central, Eastern and Western coasts. Since risk management is contingent on costs, this approach sheds useful light on assessing dynamic pricing in modern electrical grids. By integrating geographical and temporal dimensions into our research, we aim at providing more nuanced and context-specific recommendations for energy resource allocation. This approach may help decision-makers in the renewable energy sector to make informed choices that account for regional variations, climatic conditions, and long-term performance trends.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 9 pages, 3 figures, 10 ref

arXiv:2405.00389 [pdf, other]

Employing Federated Learning for Training Autonomous HVAC Systems

Authors: Fredrik Hagström, Vikas Garg, Fabricio Oliveira

Abstract: Buildings account for 40% of global energy consumption. A considerable portion of building energy consumption stems from heating, ventilation, and air conditioning (HVAC), and thus implementing smart, energy-efficient HVAC systems has the potential to significantly impact the course of climate change. In recent years, model-free reinforcement learning algorithms have been increasingly assessed for this purpose due to their ability to learn and adapt purely from experience. They have been shown to outperform classical controllers in terms of energy cost and consumption, as well as thermal comfort. However, their weakness lies in their relatively poor data efficiency, requiring long periods of training to reach acceptable policies, making them inapplicable to real-world controllers directly. Hence, common research goals are to improve the learning speed, as well as to improve their ability to generalize, in order to facilitate transfer learning to unseen building environments. In this paper, we take a federated learning approach to training the reinforcement learning controller of an HVAC system. A global control policy is learned by aggregating local policies trained on multiple data centers located in different climate zones. The goal of the policy is to simultaneously minimize energy consumption and maximize thermal comfort. The federated optimization strategy indirectly increases both the rate at which experience data is collected and the variation in the data. We demonstrate through experimental evaluation that these effects lead to a faster learning speed, as well as greater generalization capabilities in the federated policy compared to any individually trained policy.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.16910 [pdf]

A Nitrogen Alternative: Use of Plasma Activated Water as Nitrogen Source in Hydroponic Solution for Radish Growth

Authors: Vikas Rathore, Sudhir Kumar Nema

Abstract: The study investigates the potential of Plasma-Activated Water (PAW) as a nitrogen supplement in hydroponic cultivation (HS-N+PAW), specifically focusing on radish seed germination and plant growth. PAW, produced using a dielectric barrier discharge pencil plasma jet with air as the plasma-forming gas, is compared against conventional hydroponic solution (HS) and hydroponic solution without nitrogen (HS-N). PAW treatment completely eliminates microbial growth in seeds. Radish plants cultivated with HS-N+PAW display approximately 30% and 3% longer roots compared to those grown with HS-N and HS, respectively, with shoot length increasing by ~16.5% (HS-N) and <1% (HS). Root weight sees a substantial increase of ~51% with HS-N+PAW compared to HS-N, while the increase with HS is not significant. Similarly, shoot fresh weight sees a notable increase of 50% (HS-N) and 10% (HS). In terms of biochemical composition, radish roots show a significant increase of approximately 15.3% in soluble sugar concentration with HS-N+PAW compared to HS-N. Protein concentration in radish leaves increases by ~5.1% and ~19.0% with HS-N+PAW compared to HS-N and HS, respectively. Heightened soluble sugar and protein concentrations in HS-N+PAW-grown plants indicate enhanced metabolic activity and nutrient uptake. However, variations in chlorophyll and carotenoid concentrations in leaves among different growth media are statistically insignificant. H2O2 concentration in root and shoot remains consistent across growth media, whereas electrolytic and phenolic leakage, along with antioxidant enzyme activities, exhibit differential responses, underscoring the impact of growth conditions on plant stress responses.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.13521 [pdf, other]

doi 10.1145/3613904.3642822

Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces

Authors: Yue Jiang, Changkong Zhou, Vikas Garg, Antti Oulasvirta

Abstract: Present-day graphical user interfaces (GUIs) exhibit diverse arrangements of text, graphics, and interactive elements such as buttons and menus, but representations of GUIs have not kept up. They do not encapsulate both semantic and visuo-spatial relationships among elements. To seize machine learning's potential for GUIs more efficiently, Graph4GUI exploits graph neural networks to capture individual elements' properties and their semantic-visuo-spatial constraints in a layout. The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task, which involved predicting the positions of remaining unplaced elements in a partially completed GUI. The new model's suggestions showed alignment and visual appeal superior to the baseline method and received higher subjective ratings for preference. Furthermore, we demonstrate the practical benefits and efficiency advantages designers perceive when utilizing our model as an autocompletion plug-in.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 18 pages

arXiv:2404.12497 [pdf]

Novel indium phosphide charged particle detector characterization with a 120 GeV proton beam

Authors: Sungjoon Kim, Manoj B. Jadhav, Vikas Berry, Jessica E. Metcalfe, Anirudha V. Sumant

Abstract: Thin film detectors which incorporate semiconductor materials other than silicon have the potential to build upon their unique material properties and offer advantages such as faster response times, operation at room temperature, and radiation hardness. To explore the possibility, promising candidate materials were selected, and particle tracking detectors were fabricated. An indium phosphide detector with a metal-insulator-metal (MIM) structure has been fabricated for particle tracking. The detector was tested using radioactive sources and a high energy proton beam at Fermi National Accelerator Laboratory. In addition to its simple design and fabrication process, the indium phosphide particle detector showed a very fast response time of hundreds of picoseconds for the 120 GeV protons, which is comparable to that of ultra-fast silicon detectors. This fast-timing response is attributed to the high electron mobility of indium phosphide. Such material properties can be leveraged to build novel detectors with superlative performance.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 10 pages, 7 figures

arXiv:2404.10708 [pdf, other]

Keeping the photon in the dark: Enabling full quantum dot control by chirped pulses and magnetic fields

Authors: Florian Kappe, René Schwarz, Yusuf Karli, Thomas Bracht, Vollrath M. Axt, Armando Rastelli, Vikas Remesh, Doris E. Reiter, Gregor Weihs

Abstract: Because dark excitons in quantum dots are not directly optically accessible, so far they have not played a significant role in using quantum dots for photon generation. They possess significantly longer lifetimes than their brighter counterparts and hence offer enormous potential for photon storage or manipulation. In this work, we demonstrate an all-optical storage and retrieval of the spin-forbidden dark exciton in a quantum dot from the ground state employing chirped pulses and an in-plane magnetic field. Our experimental findings are in excellent agreement with theoretical predictions of the dynamics calculated using state-of-the-art product tensor methods. Our scheme enables an all-optical control of dark states without relying on any preceding decays. This opens up a new dimension for optimal quantum control and time-bin entangled photon pair generation from quantum dots.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10659 [pdf, other]

Cybersecurity in the Quantum Era: Assessing the Impact of Quantum Computing on Infrastructure

Authors: Yaser Baseri, Vikas Chouhan, Ali Ghorbani

Abstract: The emergence of quantum computing presents a double-edged sword for cybersecurity. While its immense power holds promise for advancements in various fields, it also threatens to crack the foundation of current encryption methods. This analysis explores the impact of quantum computing on critical infrastructure and cloud services, meticulously evaluating potential vulnerabilities across various layers, including applications, data, runtime, middleware, operating systems, virtualization, hardware, storage, and networks. We advocate for proactive security strategies and collaboration between sectors to develop and implement quantum-resistant cryptography. This crucial shift necessitates a comprehensive approach, and the paper introduces a tailored security blueprint encompassing nine critical infrastructure components. This blueprint strengthens each area's defenses against potential quantum-induced cyber threats. Our strategic vulnerability and risk assessment equips stakeholders with the knowledge to navigate the complex quantum threat landscape. This empowers them to make informed decisions about design, implementation, and policy formulation, ultimately bolstering the resilience of critical infrastructure. In essence, this analysis not only forecasts quantum threats but also offers a sophisticated, actionable framework for fortifying infrastructure and cloud environments against the multifaceted challenges of the quantum era. This proactive approach will ensure continued data security and a thriving digital landscape in the years to come.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10024 [pdf, other]

ClimODE: Climate and Weather Forecasting with Physics-informed Neural ODEs

Authors: Yogesh Verma, Markus Heinonen, Vikas Garg

Abstract: Climate and weather prediction traditionally relies on complex numerical simulations of atmospheric physics. Deep learning approaches, such as transformers, have recently challenged the simulation paradigm with complex network forecasts. However, they often act as data-driven black-box models that neglect the underlying physics and lack uncertainty quantification. We address these limitations with ClimODE, a spatiotemporal continuous-time process that implements a key principle of advection from statistical mechanics, namely, weather changes due to a spatial movement of quantities over time. ClimODE models precise weather evolution with value-conserving dynamics, learning global weather transport as a neural flow, which also enables estimating the uncertainty in predictions. Our approach outperforms existing data-driven methods in global and regional forecasting with an order of magnitude smaller parameterization, establishing a new state of the art.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted as ICLR 2024 Oral. Project website: https://yogeshverma1998.github.io/ClimODE/

arXiv:2404.09818 [pdf, other]

Error Detection and Correction Codes for Safe In-Memory Computations

Authors: Luca Parrini, Taha Soliman, Benjamin Hettwer, Jan Micha Borrmann, Simranjeet Singh, Ankit Bende, Vikas Rana, Farhad Merchant, Norbert Wehn

Abstract: In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. However, the non-idealities and defects of emerging technologies used in advanced IMC can severely degrade the accuracy of inferred Neural Networks (NN) and lead to malfunctions in safety-critical applications. In this paper, we investigate an architectural-level mitigation technique based on the coordinated action of multiple checksum codes, to detect and correct errors at run-time. This implementation demonstrates higher efficiency in recovering accuracy across different AI algorithms and technologies compared to more traditional methods such as Triple Modular Redundancy (TMR). The results show that several configurations of our implementation recover more than 91% of the original accuracy with less than half of the area required by TMR and less than 40% of latency overhead.

Submitted 15 April, 2024; originally announced April 2024.

Comments: This paper will be presented at the 29th IEEE European Test Symposium (ETS) 2024

arXiv:2404.08348 [pdf, other]

Theory of time-bin entangled photons from quantum emitters

Authors: Thomas K. Bracht, Florian Kappe, Moritz Cygorek, Tim Seidelmann, Yusuf Karli, Vikas Remesh, Gregor Weihs, Vollrath Martin Axt, Doris E. Reiter

Abstract: Entangled photon pairs form the foundation for many applications in the realm of quantum communication. For fiber-optic transfer of entangled photon pairs, time-bin encoding can potentially offer an improved stability compared to polarization encoded qubits. Here, we lay the theoretical foundations to describe the measurement of time-bin entangled photons. We derive multi-time correlation functions of the time-bin encoded photon pairs, corresponding to quantum state tomographic measurements. Our theory can be the starting point to extend the simulations to include all kinds of loss or decoherence effects that apply in a specific quantum system for realistic simulation for time-bin entanglement from quantum emitters.

Submitted 24 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures

arXiv:2404.08232 [pdf]

doi 10.1016/j.cose.2024.103883

Navigating Quantum Security Risks in Networked Environments: A Comprehensive Study of Quantum-Safe Network Protocols

Authors: Yaser Baseri, Vikas Chouhan, Abdelhakim Hafid

Abstract: The emergence of quantum computing poses a formidable security challenge to network protocols traditionally safeguarded by classical cryptographic algorithms. This paper provides an exhaustive analysis of vulnerabilities introduced by quantum computing in a diverse array of widely utilized security protocols across the layers of the TCP/IP model, including TLS, IPsec, SSH, PGP, and more. Our investigation focuses on precisely identifying vulnerabilities susceptible to exploitation by quantum adversaries at various migration stages for each protocol while also assessing the associated risks and consequences for secure communication. We delve deep into the impact of quantum computing on each protocol, emphasizing potential threats posed by quantum attacks and scrutinizing the effectiveness of post-quantum cryptographic solutions. Through carefully evaluating vulnerabilities and risks that network protocols face in the post-quantum era, this study provides invaluable insights to guide the development of appropriate countermeasures. Our findings contribute to a broader comprehension of quantum computing's influence on network security and offer practical guidance for protocol designers, implementers, and policymakers in addressing the challenges stemming from the advancement of quantum computing. This comprehensive study is a crucial step toward fortifying the security of networked environments in the quantum age.

Submitted 6 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Journal ref: Computers & Security, Volume 142, July 2024, 103883

arXiv:2404.08231 [pdf, other]

doi 10.2139/ssrn.4750609

Evaluation Framework for Quantum Security Risk Assessment: A Comprehensive Study for Quantum-Safe Migration

Authors: Yaser Baseri, Vikas Chouhan, Ali Ghorbani, Aaron Chow

Abstract: The rise of large-scale quantum computing poses a significant threat to traditional cryptographic security measures. Quantum attacks undermine current asymmetric cryptographic algorithms, rendering them ineffective. Even symmetric key cryptography is vulnerable, albeit to a lesser extent, suggesting longer keys or extended hash functions for security. Thus, current cryptographic solutions are inadequate against emerging quantum threats. Organizations must transition to quantum-safe environments with robust continuity plans and meticulous risk management. This study explores the challenges of migrating to quantum-safe cryptographic states, introducing a comprehensive security risk assessment framework. We propose a security risk assessment framework that examines vulnerabilities across algorithms, certificates, and protocols throughout the migration process (pre-migration, during migration, post-migration). We link these vulnerabilities to the STRIDE threat model to assess their impact and likelihood. Then, we discuss practical mitigation strategies for critical components like algorithms, public key infrastructures, and protocols. Our study not only identifies potential attacks and vulnerabilities at each layer and migration stage but also suggests possible countermeasures and alternatives to enhance system resilience, empowering organizations to construct a secure infrastructure for the quantum era. Through these efforts, we establish the foundation for enduring security in networked systems amid the challenges of the quantum era.

Submitted 22 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.08143 [pdf]

doi 10.4018/IJMDEM.341792

A-DisETrac Advanced Analytic Dashboard for Distributed Eye Tracking

Authors: Yasasi Abeysinghe, Bhanuka Mahanama, Gavindya Jayawardena, Yasith Jayawardana, Mohan Sunkara, Andrew T. Duchowski, Vikas Ashok, Sampath Jayarathna

Abstract: Understanding how individuals focus and perform visual searches during collaborative tasks can help improve user engagement. Eye tracking measures provide informative cues for such understanding. This article presents A-DisETrac, an advanced analytic dashboard for distributed eye tracking. It uses off-the-shelf eye trackers to monitor multiple users in parallel, compute both traditional and advanced gaze measures in real-time, and display them on an interactive dashboard. Using two pilot studies, the system was evaluated in terms of user experience and utility, and compared with existing work. Moreover, the system was used to study how advanced gaze measures such as ambient-focal coefficient K and real-time index of pupillary activity relate to collaborative behavior. It was observed that the time a group takes to complete a puzzle is related to the quantified ambient visual scanning behavior, and groups that spent more time exhibited more scanning behavior. User experience questionnaire results suggest that the dashboard provides a comparatively good user experience.

Submitted 11 April, 2024; originally announced April 2024.

Journal ref: International Journal of Multimedia Data Engineering and Management (IJMDEM) 15.1 (2024) 1-20

arXiv:2404.07395 [pdf, other]

Global versus Local: Evaluating AlexNet Architectures for Tropical Cyclone Intensity Estimation

Authors: Vikas Dwivedi

Abstract: Given the destructive impacts of tropical cyclones, it is critical to have a reliable system for cyclone intensity detection. Various techniques are available for this purpose, each with differing levels of accuracy. In this paper, we introduce two ensemble-based models based on AlexNet architecture to estimate tropical cyclone intensity using visible satellite images. The first model, trained on the entire dataset, is called the global AlexNet model. The second model is a distributed version of AlexNet in which multiple AlexNets are trained separately on subsets of the training data categorized according to the Saffir-Simpson wind speed scale prescribed by meteorologists. We evaluated the performance of both models against a deep learning benchmark model called Deepti using a publicly available cyclone image dataset. Results indicate that both the global model (with a root mean square error (RMSE) of 9.03 knots) and the distributed model (with an RMSE of 9.3 knots) outperform the benchmark model (with an RMSE of 13.62 knots). We provide a thorough discussion of our solution approach, including an explanation of the AlexNet's performance using gradient class activation maps (grad-CAM). Our proposed solution strategy allows future experimentation with various deep learning models in both single and multi-channel settings.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05985 [pdf]

Boosting Digital Safeguards: Blending Cryptography and Steganography

Authors: Anamitra Maiti, Subham Laha, Rishav Upadhaya, Soumyajit Biswas, Vikas Chaudhary, Biplab Kar, Nikhil Kumar, Jaydip Sen

Abstract: In today's digital age, the internet is essential for communication and the sharing of information, creating a critical need for sophisticated data security measures to prevent unauthorized access and exploitation. Cryptography encrypts messages into a cipher text that is incomprehensible to unauthorized readers, thus safeguarding data during its transmission. Steganography, on the other hand, originates from the Greek term for "covered writing" and involves the art of hiding data within another medium, thereby facilitating covert communication by making the message invisible. This proposed approach takes advantage of the latest advancements in Artificial Intelligence (AI) and Deep Learning (DL), especially through the application of Generative Adversarial Networks (GANs), to improve upon traditional steganographic methods. By embedding encrypted data within another medium, our method ensures that the communication remains hidden from prying eyes. The application of GANs enables a smart, secure system that utilizes the inherent sensitivity of neural networks to slight alterations in data, enhancing the protection against detection. By merging the encryption techniques of cryptography with the hiding capabilities of steganography, and augmenting these with the strengths of AI, we introduce a comprehensive security system designed to maintain both the privacy and integrity of information. This system is crafted not just to prevent unauthorized access or modification of data, but also to keep the existence of the data hidden. This fusion of technologies tackles the core challenges of data security in the current era of open digital communication, presenting an advanced solution with the potential to transform the landscape of information security.

Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: This report pertains to the Capstone Project done by Group 3 of the Fall batch of 2023 students at Praxis Tech School, Kolkata, India. The report consists of 36 pages and includes 11 figures and 5 tables

arXiv:2404.03570 [pdf, other]

Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity

Authors: Jake Varley, Sumeet Singh, Deepali Jain, Krzysztof Choromanski, Andy Zeng, Somnath Basu Roy Chowdhury, Avinava Dubey, Vikas Sindhwani

Abstract: We present an embodied AI system which receives open-ended natural language instructions from a human, and controls two arms to collaboratively accomplish potentially long-horizon tasks over a large workspace. Our system is modular: it deploys state of the art Large Language Models for task planning, Vision-Language models for semantic perception, and Point Cloud transformers for grasping. With semantic and physical safety in mind, these modules are interfaced with a real-time trajectory optimizer and a compliant tracking controller to enable human-robot proximity. We demonstrate performance for the following tasks: bi-arm sorting, bottle opening, and trash disposal tasks. These are done zero-shot where the models used have not been trained with any real world data from this bi-arm robot, scenes or workspace. Composing both learning- and non-learning-based components in a modular fashion with interpretable inputs and outputs allows the user to easily debug points of failures and fragilities. One may also in-place swap modules to improve the robustness of the overall platform, for instance with imitation-learned policies.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.19495 [pdf, other]

CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians

Authors: Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari

Abstract: The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.

Submitted 28 March, 2024; originally announced March 2024.

Comments: Project page: https://people.engr.tamu.edu/nimak/Papers/CoherentGS

arXiv:2403.18247 [pdf, other]

An Experimentally Validated Feasible Quantum Protocol for Identity-Based Signature with Application to Secure Email Communication

Authors: Tapaswini Mohanty, Vikas Srivastava, Sumit Kumar Debnath, Debasish Roy, Kouichi Sakurai, Sourav Mukhopadhyay

Abstract: Digital signatures are one of the simplest cryptographic building blocks that provide appealing security characteristics such as authenticity, unforgeability, and undeniability. In 1984, Shamir developed the first Identity-based signature (IBS) to simplify public key infrastructure and circumvent the need for certificates. It makes the process uncomplicated by enabling users to verify digital signatures using only the identifiers of signers, such as email, phone number, etc. Nearly all existing IBS protocols rely on several theoretical assumption-based hard problems. Unfortunately, these hard problems are unsafe and pose a hazard in the quantum realm. Thus, designing IBS algorithms that can withstand quantum attacks and ensure long-term security is an important direction for future research. Quantum cryptography (QC) is one such approach. In this paper, we propose an IBS based on QC. Our scheme's security is based on the laws of quantum mechanics. It thereby achieves long-term security and provides resistance against quantum attacks. We verify the proposed design's correctness and feasibility by simulating it in a prototype quantum device and the IBM Qiskit quantum simulator. The implementation code in Qiskit with Jupyter Notebook is provided in the Annexure. Moreover, we discuss the application of our design in secure email communication.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.11418 [pdf, other]

Variational Sampling of Temporal Trajectories

Authors: Jurijs Nazarovs, Zhichun Huang, Xingjian Zhen, Sourav Pal, Rudrasis Chakraborty, Vikas Singh

Abstract: A deterministic temporal process can be determined by its trajectory, an element in the product space of (a) initial condition $z_0 \in \mathcal{Z}$ and (b) transition function $f: (\mathcal{Z}, \mathcal{T}) \to \mathcal{Z}$ often influenced by the control of the underlying dynamical system. Existing methods often model the transition function as a differential equation or as a recurrent neural network. Despite their effectiveness in predicting future measurements, few results have successfully established a method for sampling and statistical inference of trajectories using neural networks, partially due to constraints in the parameterization. In this work, we introduce a mechanism to learn the distribution of trajectories by parameterizing the transition function $f$ explicitly as an element in a function space. Our framework allows efficient synthesis of novel trajectories, while also directly providing a convenient tool for inference, i.e., uncertainty estimation, likelihood evaluations and out of distribution detection for abnormal trajectories. These capabilities can have implications for various downstream tasks, e.g., simulation and evaluation for reinforcement learning.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11228 [pdf]

Routing Algorithms

Authors: Ujjwal Sinha, Vikas Kumar, Shubham Kumar Singh

Abstract: Routing algorithms play a crucial role in the efficient transmission of data within computer networks by determining the optimal paths for packet forwarding. This paper presents a comprehensive exploration of routing algorithms, focusing on their fundamental principles, classification, challenges, recent advancements, and practical applications. Beginning with an overview of the significance of routing in modern communication networks, the paper delves into the historical evolution of routing algorithms, tracing their development from early approaches to contemporary techniques. Key categories of routing algorithms, including distance vector, link-state, and path vector algorithms, are examined in detail, along with hybrid approaches that integrate multiple routing paradigms. Common challenges faced by routing algorithms, such as routing loops and scalability issues, are identified, and current research efforts aimed at addressing these challenges are discussed.

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.10944 [pdf, other]

Human Centered AI for Indian Legal Text Analytics

Authors: Sudipto Ghosh, Devanshu Verma, Balaji Ganesan, Purnima Bindal, Vikas Kumar, Vasudha Bhatnagar

Abstract: Legal research is a crucial task in the practice of law. It requires intense human effort and intellectual prudence to research a legal case and prepare arguments. The recent boom in generative AI has not translated to a proportionate rise in impactful legal applications, because of low trustworthiness and the scarcity of specialized datasets for training Large Language Models (LLMs). This position paper explores the potential of LLMs within Legal Text Analytics (LTA), highlighting specific areas where the integration of human expertise can significantly enhance their performance to match that of experts. We introduce a novel dataset and describe a human centered, compound AI system that principally incorporates human inputs for performing LTA tasks with LLMs.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 7 pages, 7 figures

Showing 1–50 of 705 results for author: Vikas