Search | arXiv e-print repository

Data-driven Model Reduction for Soft Robots via Lagrangian Operator Inference

Authors: Harsh Sharma, Iman Adibnazari, Jacobo Cervera-Torralba, Michael T. Tolley, Boris Kramer

Abstract: Data-driven model reduction methods provide a nonintrusive way of constructing computationally efficient surrogates of high-fidelity models for real-time control of soft robots. This work leverages the Lagrangian nature of the model equations to derive structure-preserving linear reduced-order models via Lagrangian Operator Inference and compares their performance with prominent linear model reduction techniques through an anguilliform swimming soft robot model example with 231,336 degrees of freedom. The case studies demonstrate that preserving the underlying Lagrangian structure leads to learned models with higher predictive accuracy and robustness to unseen inputs.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.07969 [pdf, other]

Entanglement asymmetry in conformal field theory and holography

Authors: Francesco Benini, Victor Godet, Amartya Harsh Singh

Abstract: Entanglement asymmetry is a measure of symmetry breaking in quantum subsystems, inspired by quantum information theory, particularly suited to study out-of-equilibrium states. We study the entanglement asymmetry of a class of excited "coherent states" in conformal quantum field theories with a U(1) symmetry, employing Euclidean path-integral methods with topological symmetry defects and the replica formalism. We compute, at leading order in perturbation theory, the asymmetry for a variety of subsystems, including finite spherical subregions in flat space, in finite volume, and at positive temperature. We also study its Lorentzian time evolution, showcasing the dynamical restoration of the symmetry due to thermalization, as well as the presence of a quantum Mpemba effect. Our results are universal, and apply in any number of dimensions. We also show that the perturbative entanglement asymmetry is related to the Fisher information metric, which has a known holographic dual called Hollands-Wald canonical energy, and that it is captured by the AdS bulk charge contained in the entanglement wedge.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 29 pages plus appendices, 11 figures

Report number: SISSA 14/2024/FISI

arXiv:2407.07946 [pdf, other]

The Type I Superluminous Supernova Catalog I: Light Curve Properties, Models, and Catalog Description

Authors: Sebastian Gomez, Matt Nicholl, Edo Berger, Peter K. Blanchard, V. Ashley Villar, Sofia Rest, Griffin Hosseinzadeh, Aysha Aamer, Yukta Ajay, Wasundara Athukoralalage, David C. Coulter, Tarraneh Eftekhari, Achille Fiore, Noah Franz, Ori Fox, Alexander Gagliano, Daichi Hiramatsu, D. Andrew Howell, Brian Hsu, Mitchell Karmen, Matthew R. Siebert, Réka Könyves-Tóth, Harsh Kumar, Curtis McCully, Craig Pellegrino, et al. (3 additional authors not shown)

Abstract: We present the most comprehensive catalog to date of Type I Superluminous Supernovae (SLSNe), a class of stripped envelope supernovae (SNe) characterized by exceptionally high luminosities. We have compiled a sample of 262 SLSNe reported through 2022 December 31. We verified the spectroscopic classification of each SLSN and collated an exhaustive data set of UV, optical and IR photometry from both publicly available data and our own FLEET observational follow-up program, totaling over 30,000 photometric detections. Using these data we derive observational parameters such as the peak absolute magnitudes, rise and decline timescales, as well as bolometric luminosities, temperature and photospheric radius evolution for all SLSNe. Additionally, we model all light curves using a hybrid model that includes contributions from both a magnetar central engine and the radioactive decay of $^{56}$Ni. We explore correlations among various physical and observational parameters, and recover the previously found relation between ejecta mass and magnetar spin, as well as the overall progenitor pre-explosion mass distribution with a peak at $\approx 6.5$ M$_\odot$. We find no significant redshift dependence for any parameter, and no evidence for distinct sub-types of SLSNe. We find that $< 3$% of SLSNe are best fit with a significant contribution from radioactive decay $\gtrsim 50$%, representing a set of relatively dim and slowly declining SNe.
We provide several analytical tools designed to simulate typical SLSN light curves across a broad range of wavelengths and phases, enabling accurate K-corrections, bolometric scaling calculations, and inclusion of SLSNe in survey simulations or future comparison works. The complete catalog, including all of the photometry, models, and derived parameters, is made available as an open-source resource on GitHub.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 59 pages, 22 Figures, Submitted to MNRAS

arXiv:2407.07526 [pdf, other]

ler : LVK (LIGO-Virgo-KAGRA collaboration) event (compact-binary mergers) rate calculator and simulator

Authors: Hemantakumar Phurailatpam, Anupreeta More, Harsh Narola, Ng Chung Yin, Justin Janquart, Chris Van Den Broeck, Otto Akseli Hannuksela, Neha Singh, David Keitel

Abstract: '$ler$' is a statistics-based Python package specifically designed for computing detectable rates of both lensed and unlensed GW events, catering to the requirements of the LIGO-Virgo-KAGRA Scientific Collaboration and astrophysics research scholars. The core functionality of '$ler$' intricately hinges upon the interplay of various components which include sampling the properties of compact-binary sources, lens galaxies characteristics, solving lens equations to derive properties of resultant images, and computing detectable GW rates. This comprehensive functionality builds on the leveraging of array operations and linear algebra from the $numpy$ library, enhanced by interpolation methods from $scipy$ and Python's $multiprocessing$ capabilities. Efficiency is further boosted by the $numba$ library's Just-In-Time ($njit$) compilation, optimizing extensive numerical computations and employing the inverse transform sampling method to replace more cumbersome rejection sampling. The modular design of '$ler$' not only optimizes speed and functionality but also ensures adaptability and upgradability, supporting the integration of additional statistics as research evolves. Currently, '$ler$' is an important tool in generating simulated GW events, both lensed and unlensed, and provides astrophysically accurate distributions of event-related parameters for both detectable and non-detectable events. This functionality aids in event validation and enhances the forecasting of detection capabilities across various GW detectors to study such events.
The architecture of the '$ler$' API facilitates seamless compatibility with other software packages, allowing researchers to integrate and utilize its functionalities based on specific scientific requirements.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 5 pages, 1 Logo in each of the pages, this is for the JOSS publication
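
Note on the abstract above: it mentions replacing rejection sampling with inverse transform sampling. The following is a minimal, illustrative Python sketch of that general technique, not code from the ler package. The function name `sample_power_law`, the truncated power-law target, and the parameter values are all assumptions chosen for the example.

```python
import random


def sample_power_law(n, alpha, x_min, x_max, rng):
    """Draw n samples from p(x) proportional to x**(-alpha) on [x_min, x_max].

    Hypothetical helper for illustration; requires alpha != 1.
    """
    k = 1.0 - alpha
    lo, hi = x_min ** k, x_max ** k
    # The CDF is F(x) = (x**k - lo) / (hi - lo). Solving F(x) = u for a
    # uniform draw u in [0, 1) gives x = (lo + u * (hi - lo)) ** (1 / k).
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / k) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = sample_power_law(10_000, alpha=2.35, x_min=5.0, x_max=50.0, rng=rng)
```

Every uniform draw yields one accepted sample, whereas rejection sampling discards a fraction of its proposals; this is the usual reason for preferring the inverse transform when the CDF can be inverted analytically or by interpolation.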

arXiv:2407.06093 [pdf, other]

Artificial Intuition: Efficient Classification of Scientific Abstracts

Authors: Harsh Sakhrani, Naseela Pervez, Anirudh Ravi Kumar, Fred Morstatter, Alexandra Graddy Reed, Andrea Belz

Abstract: It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels. We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to the augmentation of supplemental knowledge representing human intuition, and propose a workflow. As a pilot study, we use a corpus of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in concert with established performance metrics.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.02520 [pdf, other]

RaCIL: Ray Tracing based Multi-UAV Obstacle Avoidance through Composite Imitation Learning

Authors: Harsh Bansal, Vyom Goyal, Bhaskar Joshi, Akhil Gupta, Harikumar Kandath

Abstract: In this study, we address the challenge of obstacle avoidance for Unmanned Aerial Vehicles (UAVs) through an innovative composite imitation learning approach that combines Proximal Policy Optimization (PPO) with Behavior Cloning (BC) and Generative Adversarial Imitation Learning (GAIL), enriched by the integration of ray-tracing techniques. Our research underscores the significant role of ray-tracing in enhancing obstacle detection and avoidance capabilities. Moreover, we demonstrate the effectiveness of incorporating GAIL in coordinating the flight paths of two UAVs, showcasing improved collision avoidance capabilities. Extending our methodology, we apply our combined PPO, BC, GAIL, and ray-tracing framework to scenarios involving four UAVs, illustrating its scalability and adaptability to more complex scenarios. The findings indicate that our approach not only improves the reliability of basic PPO based obstacle avoidance but also paves the way for advanced autonomous UAV operations in crowded or dynamic environments.

Submitted 24 June, 2024; originally announced July 2024.

arXiv:2407.02519 [pdf, other]

Anvil: An integration of artificial intelligence, sampling techniques, and a combined CAD-CFD tool

Authors: Harsh Vardhan, Umesh Timalsina, Michael Sandborn, David Hyde, Peter Volgyesi, Janos Sztipanovits

Abstract: In this work, we introduce an open-source integrated CAD-CFD tool, Anvil, which combines FreeCAD for CAD modeling and OpenFOAM for CFD analysis, along with an AI-based optimization method (Bayesian optimization) and other sampling algorithms. Anvil serves as a scientific machine learning tool for shape optimization in three modes: data generation, CFD evaluation, and shape optimization. In data generation mode, it automatically runs CFD evaluations and generates data for training a surrogate model. In optimization mode, it searches for the optimal design under given requirements and optimization metrics. In CFD mode, a single CAD file can be evaluated with a single OpenFOAM run. To use Anvil, experimenters provide a JSON configuration file and a parametric CAD seed design. Anvil can be used to study solid-fluid dynamics for any subsonic flow conditions and has been demonstrated in various simulation and optimization use cases. The open-source code for the tool, installation process, artifacts (such as CAD seed designs and example STL models), experimentation results, and detailed documentation can be found at https://github.com/symbench/Anvil.

Submitted 24 June, 2024; originally announced July 2024.

arXiv:2406.18595 [pdf, other]

Realtime Dynamic Gaze Target Tracking and Depth-Level Estimation

Authors: Esmaeil Seraj, Harsh Bhate, Walter Talamonti

Abstract: The integration of Transparent Displays (TD) in various applications, such as Heads-Up Displays (HUDs) in vehicles, is a burgeoning field, poised to revolutionize user experiences. However, this innovation brings forth significant challenges in realtime human-device interaction, particularly in accurately identifying and tracking a user's gaze on dynamically changing TDs. In this paper, we present a two-fold robust and efficient systematic solution for realtime gaze monitoring, comprised of: (1) a tree-based algorithm for identifying and dynamically tracking gaze targets (i.e., moving, size-changing, and overlapping 2D content) projected on a transparent display, in realtime; (2) a multi-stream self-attention architecture to estimate the depth-level of human gaze from eye tracking data, to account for the display's transparency and prevent undesired interactions with the TD. We collected a real-world eye-tracking dataset to train and test our gaze monitoring system. We present extensive results and ablation studies, including inference experiments on System on Chip (SoC) evaluation boards, demonstrating our model's scalability, precision, and realtime feasibility in both static and dynamic contexts. Our solution marks a significant stride in enhancing next-generation user-device interaction and experience, setting a new benchmark for algorithmic gaze monitoring technology in dynamic transparent displays.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.14739 [pdf, other]

Learning to Retrieve Iteratively for In-Context Learning

Authors: Yunmo Chen, Tongfei Chen, Harsh Jhamtani, Patrick Xia, Richard Shin, Jason Eisner, Benjamin Van Durme

Abstract: We introduce iterative retrieval, a novel framework that empowers retrievers to make iterative decisions through policy optimization. Finding an optimal portfolio of retrieved items is a combinatorial optimization problem, generally considered NP-hard. This approach provides a learned approximation to such a solution, meeting specific task requirements under a given family of large language models (LLMs). We propose a training procedure based on reinforcement learning, incorporating feedback from LLMs. We instantiate an iterative retriever for composing in-context learning (ICL) exemplars and apply it to various semantic parsing tasks that demand synthesized programs as outputs. By adding only 4M additional parameters for state encoding, we convert an off-the-shelf dense retriever into a stateful iterative retriever, outperforming previous methods in selecting ICL exemplars on semantic parsing datasets such as CalFlow, TreeDST, and MTOP. Additionally, the trained iterative retriever generalizes across different inference LLMs beyond the one used during training.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.11106 [pdf, other]

From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models

Authors: Harsh Nishant Lalai, Aashish Anantha Ramakrishnan, Raj Sanjay Shah, Dongwon Lee

Abstract: With the rapid growth of Large Language Models (LLMs), safeguarding textual content against unauthorized use is crucial. Text watermarking offers a vital solution, protecting both LLM-generated and plain-text sources. This paper presents a unified overview of different perspectives behind designing watermarking techniques, through a comprehensive survey of the research literature. Our work has two key advantages: (1) we analyze research based on the specific intentions behind different watermarking techniques, evaluation datasets used, and watermarking addition and removal methods to construct a cohesive taxonomy; (2) we highlight the gaps and open challenges in text watermarking to promote research in protecting text authorship. This extensive coverage and detailed analysis set our work apart, offering valuable insights into the evolving landscape of text watermarking in language models.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10197 [pdf, other]

Crafting Parts for Expressive Object Composition

Authors: Harsh Rangwani, Aishwarya Agarwal, Kuldeep Kulkarni, R. Venkatesh Babu, Srikrishna Karanam

Abstract: Text-to-image generation from large generative models like Stable Diffusion, DALLE-2, etc., has become a common base for various tasks due to their superior quality and extensive knowledge bases. As image composition and generation are creative processes, artists need control over various parts of the images being generated. We find that just adding details about parts in the base text prompt either leads to an entirely different image (e.g., missing/incorrect identity) or the extra part details simply being ignored. To mitigate these issues, we introduce PartCraft, which enables image generation based on fine-grained part-level details specified for objects in the base text prompt. This allows more control for artists and enables novel object compositions by combining distinctive object parts. PartCraft first localizes object parts by denoising the object region from a specific diffusion process. This enables each part token to be localized to the right object region. After obtaining part masks, we run a localized diffusion process in each of the part regions based on fine-grained part descriptions and combine them to produce the final image. All the stages of PartCraft are based on repurposing a pre-trained diffusion model, which enables it to generalize across various domains without training. We demonstrate the effectiveness of part-level control provided by PartCraft qualitatively through visual examples and quantitatively in comparison to the contemporary baselines.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Project Page Will Be Here: https://rangwani-harsh.github.io/PartCraft

arXiv:2406.07904 [pdf, other]

Grounding Multimodal Large Language Models in Actions

Authors: Andrew Szot, Bogdan Mazoure, Harsh Agrawal, Devon Hjelm, Zsolt Kira, Alexander Toshev

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens of action space adapters. For continuous actions, we show that a learned tokenization allows for sufficient modeling precision, yielding the best performance on downstream tasks. For discrete actions, we demonstrate that semantically aligning these actions with the native output token space of the MLLM leads to the strongest performance. We arrive at these lessons via a thorough study of seven action space adapters on five different environments, encompassing over 114 embodied tasks.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07571 [pdf, other]

Supporting Self-Reflection at Scale with Large Language Models: Insights from Randomized Field Experiments in Classrooms

Authors: Harsh Kumar, Ruiwei Xiao, Benjamin Lawson, Ilya Musabirov, Jiakai Shi, Xinyuan Wang, Huayin Luo, Joseph Jay Williams, Anna Rafferty, John Stamper, Michael Liut

Abstract: Self-reflection on learning experiences constitutes a fundamental cognitive process, essential for the consolidation of knowledge and the enhancement of learning efficacy. However, traditional methods to facilitate reflection often face challenges in personalization, immediacy of feedback, engagement, and scalability. Integration of Large Language Models (LLMs) into the reflection process could mitigate these limitations. In this paper, we conducted two randomized field experiments in undergraduate computer science courses to investigate the potential of LLMs to help students engage in post-lesson reflection. In the first experiment (N=145), students completed a take-home assignment with the support of an LLM assistant; half of these students were then provided access to an LLM designed to facilitate self-reflection. The results indicated that the students assigned to LLM-guided reflection reported increased self-confidence and performed better on a subsequent exam two weeks later than their peers in the control condition. In the second experiment (N=112), we evaluated the impact of LLM-guided self-reflection against other scalable reflection methods, such as questionnaire-based activities and review of key lecture slides, after assignment. Our findings suggest that the students in the questionnaire and LLM-based reflection groups performed equally well and better than those who were only exposed to lecture slides, according to their scores on a proctored exam two weeks later on the same subject matter.
These results underscore the utility of LLM-guided reflection and questionnaire-based activities in improving learning outcomes. Our work highlights that focusing solely on the accuracy of LLMs can overlook their potential to enhance metacognitive skills through practices such as self-reflection. We discuss the implications of our research for the Edtech community.

Submitted 31 May, 2024; originally announced June 2024.

Comments: Accepted at L@S'24

arXiv:2406.07250 [pdf, other]

Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Authors: Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, Yohei Kawaguchi

Abstract: We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge Task 2: First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring. Continuing from last year's DCASE 2023 Challenge Task 2, we organize the task as a first-shot problem under domain generalization required settings. The main goal of the first-shot problem is to enable rapid deployment of ASD systems for new kinds of machines without the need for machine-specific hyperparameter tunings. This problem setting was realized by (1) giving only one section for each machine type and (2) having completely different machine types for the development and evaluation datasets. For the DCASE 2024 Challenge Task 2, data of completely new machine types were newly collected and provided as the evaluation dataset. In addition, attribute information such as the machine operation conditions was concealed for several machine types to mimic situations where such information is unavailable. We will add challenge results and analysis of the submissions after the challenge submission deadline.

Submitted 11 June, 2024; originally announced June 2024.

Comments: anomaly detection, acoustic condition monitoring, domain shift, first-shot problem, DCASE Challenge. arXiv admin note: text overlap with arXiv:2305.07828

arXiv:2406.06592 [pdf, other]

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Authors: Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a lengthy or multi-hop reasoning chain, where the intermediate outcomes are neither properly rewarded nor penalized. Process supervision addresses this limitation by assigning intermediate rewards during the reasoning process. To date, the methods used to collect process supervision data have relied on either human annotation or per-step Monte Carlo estimation, both prohibitively expensive to scale, thus hindering the broad application of this technique. In response to this challenge, we propose a novel divide-and-conquer style Monte Carlo Tree Search (MCTS) algorithm named OmegaPRM for the efficient collection of high-quality process supervision data. This algorithm swiftly identifies the first error in the Chain of Thought (CoT) with binary search and balances the positive and negative examples, thereby ensuring both efficiency and quality. As a result, we are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM).
Utilizing this fully automated process supervision alongside the weighted self-consistency algorithm, we have enhanced the instruction tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark, a 36% relative improvement from the 51% base model performance. Additionally, the entire process operates without any human intervention, making our method both financially and computationally cost-effective compared to existing methods.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 18 pages, 5 figures, 1 table
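
Note on the abstract above: it describes locating the first error in a chain of thought with binary search. The sketch below illustrates only that general idea and is not the OmegaPRM algorithm. The function `first_error_index` and the predicate `prefix_ok` are hypothetical; in the paper's setting the correctness signal would come from Monte Carlo rollouts, whereas here a toy predicate stands in for it. The sketch also assumes that once a prefix contains an error, every longer prefix is judged incorrect as well.

```python
from typing import Callable, Sequence


def first_error_index(steps: Sequence[str],
                      prefix_ok: Callable[[Sequence[str]], bool]) -> int:
    """Return the index of the first erroneous step, or len(steps) if none.

    prefix_ok(prefix) is a hypothetical oracle that reports whether a
    prefix of the reasoning chain is still free of errors.
    """
    lo, hi = 0, len(steps)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix_ok(steps[:mid + 1]):
            lo = mid + 1   # steps 0..mid are fine; the error lies later
        else:
            hi = mid       # the first error is at mid or earlier
    return lo


# Toy chain in which the third step (index 2) is the first mistake.
chain = ["step a", "step b", "BAD", "step d", "step e"]
calls = []


def toy_oracle(prefix):
    calls.append(len(prefix))  # record each query to count oracle calls
    return "BAD" not in prefix


idx = first_error_index(chain, toy_oracle)
```

With n steps, the search queries the oracle about log2(n) times instead of once per step, which is where the efficiency gain over per-step checking would come from under the stated assumption.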

arXiv:2406.06463 [pdf, other]

Galaxy lens reconstruction based on strongly lensed gravitational waves: similarity transformation degeneracy and mass-sheet degeneracy

Authors: Jason S. C. Poon, Stefano Rinaldi, Justin Janquart, Harsh Narola, Otto A. Hannuksela

Abstract: Gravitational wave (GW) galaxy lens reconstruction is a crucial step for many GW lensing science applications. However, dark siren GW lensing (i.e. lensed GW without observed electromagnetic (EM) counterpart) suffers from similarity transformation degeneracy and mass-sheet degeneracy. We review these two degeneracies and discuss their implications on GW-based lens reconstruction and two well-known… ▽ More Gravitational wave (GW) galaxy lens reconstruction is a crucial step for many GW lensing science applications. However, dark siren GW lensing (i.e. lensed GW without observed electromagnetic (EM) counterpart) suffers from similarity transformation degeneracy and mass-sheet degeneracy. We review these two degeneracies and discuss their implications on GW-based lens reconstruction and two well-known GW lensing science cases: the Hubble constant measurement and test for modified GW propagation. Building upon previous works, our conclusions are:1) GWs can only infer the scale-free lens mass model parameters, the dimensionless source position, the GW luminosity distance and the time delay scaling (a combination of Einstein radius, lens redshift, and cosmology).2) Lens reconstruction (of singular isothermal ellipsoid lens) with only two GW signals is unlikely to yield a complete lens model, while four (three) signals can measure all the above parameters accurately (with large uncertainties).3) The similarity transformation degeneracy causes the lens redshift/Einstein radius/cosmology to be degenerate in dark siren measurements. 
Breaking the degeneracy can be achieved by supplementing the GWs with EM observation of lens redshifts/Einstein radius (source redshift is not required). 4) The mass-sheet degeneracy causes the GW luminosity distance to be entirely degenerate with a constant mass sheet. 5) Contrary to expectation, the Hubble constant is degenerate with the mass-sheet even when supplemented with lens reconstruction/redshift/Einstein radius and can only be lifted with lens galaxy velocity dispersion measurement, while modified GW propagation test discussed in prior literature is unaffected by the degeneracy. These properties highlight the need for GW observations to be supplemented by EM observations, which could become accessible through a lens archival search or a rapid EM follow-up.

Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.02554 [pdf, other]

Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-related behaviors. To facilitate this new research direction, we collected an audio-visual autism spectrum dataset (AV-ASD), currently the largest video dataset for autism screening using a behavioral approach. It covers an extensive range of autism-associated behaviors, including those related to social communication and interaction. To pave the way for further research on this new problem, we intensively explored leveraging foundation models and multimodal large language models across different modalities. Our experiments on the AV-ASD dataset demonstrate that integrating audio, visual, and speech modalities significantly enhances the performance in autism behavior recognition. Additionally, we explored the use of a post-hoc to ad-hoc pipeline in a multimodal large language model to investigate its potential to augment the model's explanatory capability during autism behavior recognition. We will release our dataset, code, and pre-trained models.

Submitted 22 March, 2024; originally announced June 2024.

arXiv:2406.02083 [pdf, other]

Simultaneous spectropolarimetric observations in the H$α$ and Ca II 8662 Å lines of an active region

Authors: Harsh Mathur, K. Nagaraju, Rahul Yadav, Jayant Joshi

Abstract: We present spectropolarimetric observations of an active region recorded simultaneously in the H$α$ and Ca II 8662 Å lines. The sunspot exhibits multiple structures, including a lightbridge and a region where Ca II 8662 Å line core is in emission. Correspondingly, the H$α$ line core image displays brightening in the emission region, with the spectral profiles showing elevated line cores. The stratification of the line-of-sight magnetic field is inferred through non-LTE multiline inversions of the Ca II 8662 Å line and the weak field approximation over the H$α$ line. The field strength inferred from the H$α$ line core is consistently smaller than that inferred from inversions at $\log τ_{500}$ = $-$4.5. However, the study finds no correlation between the WFA over the core of the H$α$ line and that inferred from inversions at $\log τ_{500}$ = $-$4.5. In regions exhibiting emission features, the morphology of the magnetic field at $\log τ_{500}$ = $-$4.5 resembles that at $\log τ_{500}$ = $-$1, with slightly higher or comparable field strengths. The magnetic field morphology inferred from the core of the H$α$ line is also similar to that inferred from the full spectral range of the H$α$ line in the emission region. The field strength inferred in the lightbridge at $\log τ_{500}$ = $-$1 is smaller than the surrounding umbral regions and comparable at $\log τ_{500}$ = $-$4.5. Similarly, the field strength inferred in the lightbridge from the WFA over the H$α$ line appears lower compared to the surrounding umbral regions.

Submitted 4 June, 2024; originally announced June 2024.

Comments: accepted to be published in ApJ on 4th June, 2024, 14 pages, 7 figures

arXiv:2406.00163 [pdf, other]

A Stochastic Incentive-based Demand Response Program for Virtual Power Plant with Solar, Battery, Electric Vehicles, and Controllable Loads

Authors: Pratik Harsh, Hongjian Sun, Debapriya Das, Goyal Awagan, Jing Jiang

Abstract: The growing integration of distributed energy resources (DERs) into the power grid necessitates an effective coordination strategy to maximize their benefits. Acting as an aggregator of DERs, a virtual power plant (VPP) facilitates this coordination, thereby amplifying their impact on the transmission level of the power grid. Further, a demand response program enhances the scheduling approach by managing the energy demands in parallel with the uncertain energy outputs of the DERs. This work presents a stochastic incentive-based demand response model for the scheduling operation of VPP comprising solar-powered generating stations, battery swapping stations, electric vehicle charging stations, and consumers with controllable loads. The work also proposes a priority mechanism to consider the individual preferences of electric vehicle users and consumers with controllable loads. The scheduling approach for the VPP is framed as a multi-objective optimization problem, normalized using the utopia-tracking method. Subsequently, the normalized optimization problem is transformed into a stochastic formulation to address uncertainties in energy demand from charging stations and controllable loads. The proposed VPP scheduling approach is addressed on a 33-node distribution system simulated using MATLAB software, which is further validated using a real-time digital simulator.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 11 pages, 8 figures, submitted to IEEE Transactions on Industry Applications for potential publication

arXiv:2405.20485 [pdf, other]

Phantom: General Trigger Attacks on Retrieval Augmented Language Generation

Authors: Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A. Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, Alina Oprea

Abstract: Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs) in chatbot applications, enabling developers to adapt and personalize the LLM output without expensive training or fine-tuning. RAG systems use an external knowledge database to retrieve the most relevant documents for a given query, providing this context to the LLM generator. While RAG achieves impressive utility in many applications, its adoption to enable personalized generative models introduces new security risks. In this work, we propose new attack surfaces for an adversary to compromise a victim's RAG system, by injecting a single malicious document in its knowledge database. We design Phantom, a general two-step attack framework against RAG augmented LLMs. The first step involves crafting a poisoned document designed to be retrieved by the RAG system within the top-k results only when an adversarial trigger, a specific sequence of words acting as a backdoor, is present in the victim's queries. In the second step, a specially crafted adversarial string within the poisoned document triggers various adversarial attacks in the LLM generator, including denial of service, reputation damage, privacy violations, and harmful behaviors. We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.16021 [pdf, other]

VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

Authors: Michael Ahn, Montserrat Gonzalez Arenas, Matthew Bennice, Noah Brown, Christine Chan, Byron David, Anthony Francis, Gavin Gonzalez, Rainer Hessmer, Tomas Jackson, Nikhil J Joshi, Daniel Lam, Tsang-Wei Edward Lee, Alex Luong, Sharath Maddineni, Harsh Patel, Jodilyn Peralta, Jornell Quiambao, Diego Reyes, Rosario M Jauregui Ruano, Dorsa Sadigh, Pannag Sanketi, Leila Takayama, Pavel Vodenski, Fei Xia

Abstract: Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon tasks with the help of humans or other robots. VADER leverages visual question answering (VQA) modules to detect visual affordances and recognize execution errors. It then generates prompts for a language model planner (LMP) which decides when to seek help from another robot or human to recover from errors in long-horizon task execution. We show the effectiveness of VADER with two long-horizon robotic tasks. Our pilot study showed that VADER is capable of performing complex long-horizon tasks by asking for help from another robot to clear a table. Our user study showed that VADER is capable of performing complex long-horizon tasks by asking for help from a human to clear a path. We gathered feedback from people (N=19) about the performance of VADER vs. a robot that did not ask for help. https://google-vader.github.io/

Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures

arXiv:2405.15682 [pdf, other]

The Road Less Scheduled

Authors: Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available (https://github.com/facebookresearch/schedule_free).

Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.12403 [pdf, other]

Searching for gravitational wave optical counterparts with the Zwicky Transient Facility: summary of O4a

Authors: Tomás Ahumada, Shreya Anand, Michael W. Coughlin, Vaidehi Gupta, Mansi M. Kasliwal, Viraj R. Karambelkar, Robert D. Stein, Gaurav Waratkar, Vishwajeet Swain, Theophile Jegou du Laz, Akash Anumarlapudi, Igor Andreoni, Mattia Bulla, Gokul P. Srinivasaragavan, Andrew Toivonen, Avery Wold, Eric C. Bellm, S. Bradley Cenko, David L. Kaplan, Jesper Sollerman, Varun Bhalerao, Daniel Perley, Anirudh Salgundi, Aswin Suresh, K-Ryan Hinds , et al. (27 additional authors not shown)

Abstract: During the first half of the fourth observing run (O4a) of the International Gravitational Wave Network (IGWN), the Zwicky Transient Facility (ZTF) conducted a systematic search for kilonova (KN) counterparts to binary neutron star (BNS) and neutron star-black hole (NSBH) merger candidates. Here, we present a comprehensive study of the five high-significance (FAR < 1 per year) BNS and NSBH candidates in O4a. Our follow-up campaigns relied on both target-of-opportunity observations (ToO) and re-weighting of the nominal survey schedule to maximize coverage. We describe the toolkit we have been developing, Fritz, an instance of SkyPortal, instrumental in coordinating and managing our telescope scheduling, candidate vetting, and follow-up observations through a user-friendly interface. ZTF covered a total of 2841 deg$^2$ within the skymaps of the high-significance GW events, reaching a median depth of g~20.2 mag. We circulated 15 candidates, but found no viable KN counterpart to any of the GW events. Based on the ZTF non-detections of the high-significance events in O4a, we used a Bayesian approach, nimbus, to quantify the posterior probability of KN model parameters that are consistent with our non-detections. Our analysis favors KNe with initial absolute magnitude fainter than -16 mag. The joint posterior probability of a GW170817-like KN associated with all our O4a follow-ups was 64%.
Additionally, we use a survey simulation software, simsurvey, to determine that our combined filtered efficiency to detect a GW170817-like KN is 36%, when considering the 5 confirmed astrophysical events in O3 (1 BNS and 4 NSBH), along with our O4a follow-ups. Following Kasliwal et al. (2020), we derived joint constraints on the underlying KN luminosity function based on our O3 and O4a follow-ups, determining that no more than 76% of KNe fading at 1 mag/day can peak at a magnitude brighter than -17.5 mag.

Submitted 20 May, 2024; originally announced May 2024.

Comments: submitted

arXiv:2405.10777 [pdf, other]

de Sitter Teukolsky waves

Authors: Harsh, Sk Jahanur Hoque, Sitender Pratap Kashyap, Amitabh Virmani

Abstract: We present de Sitter Teukolsky waves -- linearised quadrupolar gravitational waves in the transverse-traceless gauge in de Sitter spacetime. In the cosmological constant $Λ$ going to zero limit, our solutions match to Teukolsky solutions. For non-zero $Λ$, we compare our solutions to the wider literature, where different authors have constructed linearised gravitational perturbations in de Sitter spacetime with varied motivations. For de Sitter Teukolsky waves, we compute the energy flux across future timelike infinity $\mathcal{I}^{+}$ and show that it is manifestly positive.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 45 pages, 2 figures

arXiv:2405.08879 [pdf, other]

A Diffused Background from Axion-like Particles in the Microwave Sky

Authors: Harsh Mehta, Suvodip Mukherjee

Abstract: The nature of dark matter is an unsolved cosmological problem and axions are one of the weakly interacting cold dark matter candidates. Axions or ALPs (Axion-like particles) are pseudo-scalar bosons predicted by beyond-standard model theories. The weak coupling of ALPs with photons leads to the conversion of CMB photons to ALPs in the presence of a transverse magnetic field. If they have the same mass as the effective mass of a photon in a plasma, the resonant conversion would cause a polarized spectral distortion leading to temperature fluctuations with the distortion spectrum. The probability of resonant conversion depends on the properties of the cluster such as the magnetic field, electron density, and its redshift. We show that this kind of conversion can happen in numerous unresolved galaxy clusters up to high redshifts, which will lead to a diffused polarised anisotropy signal in the microwave sky. The spectrum of the signal and its shape in the angular scale will be different from the lensed CMB polarization signal. This new polarised distortion spectrum will be correlated with the distribution of clusters in the universe and hence, with the large-scale structure. The spectrum can then be probed using its spectral and spatial variation with respect to the CMB and various foregrounds. An SNR of $\sim$ 4.36 and $\sim$ 93.87 are possible in the CMB-S4 145 GHz band and CMB-HD 150 GHz band respectively for a photon-ALPs coupling strength of $\mathrm{g_{a γ} = 10^{-12} \, GeV^{-1}}$ using galaxy clusters beyond redshift z $= 1$.
The same signal would lead to additional RMS fluctuations of $\sim \mathrm{7.5 \times 10^{-2} \, μK}$ at 145 GHz. In the absence of any signal, future CMB experiments such as Simons Observatory (SO), CMB-S4, and CMB-HD can put constraints on coupling strength better than current bounds from particle physics experiment CERN Axion Solar Telescope (CAST).

Submitted 14 May, 2024; originally announced May 2024.

Comments: 33 pages, 20 figures, To be submitted to JCAP

arXiv:2405.08878 [pdf, other]

A power spectrum approach to search for Axion-like Particles from resolved galaxy clusters using CMB as a backlight

Authors: Harsh Mehta, Suvodip Mukherjee

Abstract: Axions or ALPs are hypothetical particles predicted by BSM theories, which make one of the dark matter candidates. These particles can convert into photons and vice-versa in the presence of magnetic field, with a probability decided by its coupling strength $\mathrm{g_{aγ}}$. One of the ways to detect these particles is using the CMB as a backlight. As the CMB photons pass through a galaxy cluster, they can get converted into ALPs in the mass range $10^{-15}$ eV to $10^{-11}$ eV through resonant conversion in the presence of cluster magnetic fields. This leads to a polarized spectral distortion ($α$-distortion) in the CMB as the photon polarization parallel to the magnetic field in the galaxy cluster is involved in the conversion. The fluctuations in the magnetic field and electron density in a galaxy cluster lead to spatially varying $α$-distortion around the cluster, with a power spectrum that is different from the lensed CMB polarization power spectrum for the standard model of cosmology. By measuring the difference in the polarization power spectrum around a galaxy cluster from the all-sky signal, one can find new $α$-distortion in the sky. For galaxy clusters resolvable in multiple EM bands, one can measure the coupling strength $\mathrm{g_{aγ}}$ from the ALP power spectrum.
Using multi-frequency techniques like ILC to clean the foregrounds, we show that the new power spectrum-based approach of the resolved galaxy clusters from upcoming CMB experiments such as Simons Observatory and CMB-S4 can detect (or put constraints) on the ALP-photon coupling strength of $\mathrm{g_{aγ} < 5.24 \times 10^{-12} \, GeV^{-1}}$ and $\mathrm{g_{aγ} < 3.61 \times 10^{-12} \, GeV^{-1}}$ at 95\% C.I. respectively for ALPs of masses $10^{-13}$ eV or for smaller $\mathrm{g_{aγ}}$ for lighter ALP masses (Abridged).

Submitted 14 May, 2024; originally announced May 2024.

Comments: 31 pages, 17 figures, To be submitted to JCAP

arXiv:2405.06835 [pdf, other]

Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs

Authors: Harsh Patel, Buvaneswari A. Ramanan, Manzoor A. Khan, Thomas Williams, Brian Friedman, Lawrence Drabeck

Abstract: This paper explores the possibilities of the current generation of Large Language Models for incorporating Machine Learning Operations (MLOps) functionalities into ML training code bases. We evaluate the performance of OpenAI (gpt-3.5-turbo) and WizardCoder (open-source, 15B parameters) models on the automated accomplishment of various MLOps functionalities in different settings. We perform a benchmarking study that assesses the ability of these models to: (1) adapt existing code samples (Inlining) with component-specific MLOps functionality such as MLflow and Weights & Biases for experiment tracking, Optuna for hyperparameter optimization etc., and (2) perform the task of Translation from one component of an MLOps functionality to another, e.g., translating existing GitPython library based version control code to Data Version Control library based. We also propose three different approaches that involve teaching LLMs to comprehend the API documentation of the components as a reference while accomplishing the Translation tasks. In our evaluations, the gpt-3.5-turbo model significantly outperforms WizardCoder by achieving impressive Pass@3 accuracy in model optimization (55% compared to 0% by WizardCoder), experiment tracking (100%, compared to 62.5% by WizardCoder), model registration (92% compared to 42% by WizardCoder) and hyperparameter optimization (83% compared to 58% by WizardCoder) on average, in their best possible settings, showcasing its superior code adaptability performance in complex MLOps tasks.

Submitted 10 May, 2024; originally announced May 2024.

Comments: The work was completed during 2Q, 3Q of Year 2023, when WizardCoder was the top performing Open source LLM for coding. Newer and better models have emerged since then. The processes and methodologies utilized for this benchmarking can still be utilized for evaluating the current SoTA models

arXiv:2405.00379 [pdf, other]

Planar Hall Effect in Quasi-Two-Dimensional Materials

Authors: Koushik Ghorai, Sunit Das, Harsh Varshney, Amit Agarwal

Abstract: The planar Hall effect in 3D systems is an effective probe for their Berry curvature, topology, and electronic properties. However, the Berry curvature-induced conventional planar Hall effect is forbidden in 2D systems as the out-of-plane Berry curvature cannot couple to the band velocity of the electrons moving in the 2D plane. Here, we demonstrate a unique 2D planar Hall effect (2DPHE) originating from the hidden planar components of the Berry curvature and orbital magnetic moment in quasi-2D materials. We identify all planar band geometric contributions to 2DPHE and classify their crystalline symmetry restrictions. Using gated bilayer graphene as an example, we show that in addition to capturing the hidden band geometric effects, 2DPHE is also sensitive to the Lifshitz transitions. Our work motivates further exploration of hidden planar band geometry-induced 2DPHE and related transport phenomena for innovative applications.

Submitted 1 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: Minor changes, Supplementary file included

arXiv:2404.18546 [pdf, other]

ir_explain: a Python Library of Explainable IR Methods

Authors: Sourav Saha, Harsh Agarwal, Swastik Mohanty, Mandar Mitra, Debapriyo Majumdar

Abstract: While recent advancements in Neural Ranking Models have resulted in significant improvements over traditional statistical retrieval models, it is generally acknowledged that the use of large neural architectures and the application of complex language models in Information Retrieval (IR) have reduced the transparency of retrieval methods. Consequently, Explainability and Interpretability have emerged as important research topics in IR. Several axiomatic and post-hoc explanation methods, as well as approaches that attempt to be interpretable-by-design, have been proposed. This article presents ir_explain, an open-source Python library that implements a variety of well-known techniques for Explainable IR (ExIR) within a common, extensible framework. ir_explain supports the three standard categories of post-hoc explanations, namely pointwise, pairwise, and listwise explanations. The library is designed to make it easy to reproduce state-of-the-art ExIR baselines on standard test collections, as well as to explore new approaches to explaining IR models and methods. To facilitate adoption, ir_explain is well-integrated with widely-used toolkits such as Pyserini and ir_datasets.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18160 [pdf, ps, other]

Quantum $U$-channels on $S$-spaces

Authors: Priyabrata Bag, Azad Rohilla, Harsh Trivedi

Abstract: If the symmetry, (an operator $J$ satisfying $J=J^*=J^{-1}$) which defines the Krein space, is replaced by a (not necessarily self-adjoint) unitary, then we have the notion of an $S$-space which was introduced by Szafraniec. In this paper, we consider $S$-spaces and study the structure of completely $U$-positive maps between the algebras of bounded linear operators. We first give a Stinespring-type representation for a completely $U$-positive map. On the other hand, we introduce Choi $U$-matrix of a linear map and establish the equivalence of the Kraus $U$-decompositions and Choi $U$-matrices. Then we study properties of nilpotent completely $U$-positive maps. We develop the $U$-PPT criterion for separability of quantum $U$-states and discuss the entanglement breaking condition of quantum $U$-channels and explore $U$-PPT squared conjecture. Finally, we give concrete examples of completely $U$-positive maps and examples of $3 \otimes 3$ quantum $U$-states which are $U$-entangled and $U$-separable.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 22 pages

MSC Class: 46E22; 46L05; 46L08; 47B50; 81T05

arXiv:2404.16588 [pdf, ps, other]

Proving Behavioural Apartness

Authors: Ruben Turkenburg, Harsh Beohar, Clemens Kupke, Jurriaan Rot

Abstract: Bisimilarity is a central notion for coalgebras. In recent work, Geuvers and Jacobs suggest to focus on apartness, which they define by dualising coalgebraic bisimulations. This yields the possibility of finite proofs of distinguishability for a wide variety of state-based systems. We propose behavioural apartness, defined by dualising behavioural equivalence rather than bisimulations. A motivating example is the subdistribution functor, where the proof system based on bisimilarity requires an infinite quantification over couplings, whereas behavioural apartness instantiates to a finite rule. In addition, we provide optimised proof rules for behavioural apartness and show their use in several examples.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16217 [pdf, other]

Fault-Tolerant Bounded Flow Preservers

Authors: Shivam Bansal, Keerti Choudhary, Harkirat Dhanoa, Harsh Wardhan

Abstract: Given a directed graph $G = (V, E)$ with $n$ vertices, $m$ edges and a designated source vertex $s\in V$, we consider the question of finding a sparse subgraph $H$ of $G$ that preserves the flow from $s$ up to a given threshold $λ$ even after failure of $k$ edges. We refer to such subgraphs as $(λ,k)$-fault-tolerant bounded-flow-preserver ($(λ,k)$-FT-BFP). Formally, for any $F \subseteq E$ of at most $k$ edges and any $v\in V$, the $(s, v)$-max-flow in $H \setminus F$ is equal to $(s, v)$-max-flow in $G \setminus F$, if the latter is bounded by $λ$, and at least $λ$ otherwise. Our contributions are summarized as follows: 1. We provide a polynomial time algorithm that given any graph $G$ constructs a $(λ,k)$-FT-BFP of $G$ with at most $λ2^kn$ edges. 2. We also prove a matching lower bound of $Ω(λ2^kn)$ on the size of $(λ,k)$-FT-BFP. In particular, we show that for every $λ,k,n\geq 1$, there exists an $n$-vertex directed graph whose optimal $(λ,k)$-FT-BFP contains $Ω(\min\{2^kλn,n^2\})$ edges. 3. Furthermore, we show that the problem of computing approximate $(λ,k)$-FT-BFP is NP-hard for any approximation ratio that is better than $O(\log(λ^{-1} n))$.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 12 pages, 2 figures

arXiv:2404.10231 [pdf, ps, other]

doi 10.1109/LRA.2024.3391026

Improving Disturbance Estimation and Suppression via Learning among Systems with Mismatched Dynamics

Authors: Harsh Modi, Zhu Chen, Xiao Liang, Minghui Zheng

Abstract: Iterative learning control (ILC) is a method for reducing system tracking or estimation errors over multiple iterations by using information from past iterations. The disturbance observer (DOB) is used to estimate and mitigate disturbances within the system, while the system is being affected by them. ILC enhances system performance by introducing a feedforward signal in each iteration. However, its effectiveness may diminish if the conditions change during the iterations. On the other hand, although DOB effectively mitigates the effects of new disturbances, it cannot entirely eliminate them as it operates reactively. Therefore, neither ILC nor DOB alone can ensure sufficient robustness in challenging scenarios. This study focuses on the simultaneous utilization of ILC and DOB to enhance system robustness. The proposed methodology specifically targets dynamically different linearized systems performing repetitive tasks. The systems share similar forms but differ in dynamics (e.g. sizes, masses, and controllers). Consequently, the design of learning filters must account for these differences in dynamics. To validate the approach, the study establishes a theoretical framework for designing learning filters in conjunction with DOB. The validity of the framework is then confirmed through numerical studies and experimental tests conducted on unmanned aerial vehicles (UAVs). Although UAVs are nonlinear systems, the study employs a linearized controller as they operate in proximity to the hover condition.
A video introduction of this paper is available via this link: https://zh.engr.tamu.edu/wp-content/uploads/sites/310/2024/02/ILCDOB_v3f.mp4.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.07044 [pdf, ps, other]

On the Performance of IRS-Assisted SSK and RPM over Rician Fading Channels

Authors: Harsh Raj, Ugrasen Singh, B. R. Manoj

Abstract: This paper presents the index modulation, that is, the space-shift keying (SSK) and reflection phase modulation (RPM) schemes for intelligent reflecting surface (IRS)-assisted wireless network. IRS simultaneously reflects the incoming information signal from the base station and explicitly encodes the local information bits in the reflection phase shift of IRS elements. The phase shift of the IRS elements is employed according to local data from the RPM constellation. A joint detection using a maximum-likelihood (ML) decoder is performed for the SSK and RPM symbols over a realistic fading scenario modeled as the Rician fading channel. The pairwise error probability over Rician fading channels is derived and utilized to determine the average bit error rate. In addition, the ergodic capacity of the presented system is derived. The derived analytical results are verified and are in exact agreement with Monte-Carlo simulations.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 5 pages, 3 figures, to be published in proceedings of IEEE 99th Vehicular Technology Conference (VTC) 2024

arXiv:2404.05040 [pdf, other]

doi 10.1016/j.cma.2024.116865

Lagrangian operator inference enhanced with structure-preserving machine learning for nonintrusive model reduction of mechanical systems

Authors: Harsh Sharma, David A. Najera-Flores, Michael D. Todd, Boris Kramer

Abstract: Complex mechanical systems often exhibit strongly nonlinear behavior due to the presence of nonlinearities in the energy dissipation mechanisms, material constitutive relationships, or geometric/connectivity mechanics. Numerical modeling of these systems leads to nonlinear full-order models that possess an underlying Lagrangian structure. This work proposes a Lagrangian operator inference method enhanced with structure-preserving machine learning to learn nonlinear reduced-order models (ROMs) of nonlinear mechanical systems. This two-step approach first learns the best-fit linear Lagrangian ROM via Lagrangian operator inference and then presents a structure-preserving machine learning method to learn nonlinearities in the reduced space. The proposed approach can learn a structure-preserving nonlinear ROM purely from data, unlike the existing operator inference approaches that require knowledge about the mathematical form of nonlinear terms. From a machine learning perspective, it accelerates the training of the structure-preserving neural network by providing an informed prior, and it reduces the computational cost of the network training by operating on the reduced space. The method is first demonstrated on two simulated examples: a conservative nonlinear rod model and a two-dimensional nonlinear membrane with nonlinear internal damping.
Finally, the method is demonstrated on an experimental dataset consisting of digital image correlation measurements taken from a lap-joint beam structure from which a predictive model is learned that captures amplitude-dependent frequency and damping characteristics accurately. The numerical results demonstrate that the proposed approach yields generalizable nonlinear ROMs that exhibit bounded energy error, capture the nonlinear characteristics reliably, and provide accurate long-time predictions outside the training data regime.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04237 [pdf, other]

Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents

Authors: Harsh Kohli, Huan Sun

Abstract: The rapid progress of large language models (LLMs) has seen them excel and frequently surpass human performance on standard benchmarks. This has enabled many downstream applications, such as LLM agents, to rely on their sophisticated reasoning to navigate complex task requirements. However, LLMs are known to unexpectedly falter in simple tasks and under seemingly straightforward circumstances - underscoring the need for better and more diverse evaluation setups to measure their true capabilities. To this end, we choose to study compositional and conditional reasoning, two cornerstones of human cognition, and introduce GroundCocoa - a lexically diverse benchmark connecting these reasoning skills to the real-world problem of flight booking. Our task involves aligning detailed user preferences with available flight options presented in a multiple-choice format. Results indicate a significant disparity in performance among current state-of-the-art LLMs with even the best performing model, GPT-4 Turbo, not exceeding 67% accuracy despite advanced prompting techniques.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 18 pages, 17 figures, 3 tables. Paper under review

arXiv:2404.04221 [pdf, other]

How Lexical is Bilingual Lexicon Induction?

Authors: Harsh Kohli, Helian Feng, Nicholas Dronen, Calvin McCarter, Sina Moeini, Ali Kebarighotbi

Abstract: In contemporary machine learning approaches to bilingual lexicon induction (BLI), a model learns a mapping between the embedding spaces of a language pair. Recently, retrieve-and-rank approach to BLI has achieved state of the art results on the task. However, the problem remains challenging in low-resource settings, due to the paucity of data. The task is complicated by factors such as lexical variation across languages. We argue that the incorporation of additional lexical information into the recent retrieve-and-rank approach should improve lexicon induction. We demonstrate the efficacy of our proposed approach on XLING, improving over the previous state of the art by an average of 2% across all language pairs.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 8 pages, 4 figures. Paper accepted at NAACL Findings 2024

arXiv:2404.02900 [pdf, other]

DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Authors: Harsh Rangwani, Pradipto Mondal, Mayank Mishra, Ashish Ramayee Asokan, R. Venkatesh Babu

Abstract: Vision Transformer (ViT) has emerged as a prominent architecture for various computer vision tasks. In ViT, we divide the input image into patch tokens and process them through a stack of self attention blocks. However, unlike Convolutional Neural Networks (CNN), ViT's simple architecture has no informative inductive bias (e.g., locality, etc.). Due to this, ViT requires a large amount of data for pre-training. Various data efficient approaches (DeiT) have been proposed to train ViT on balanced datasets effectively. However, limited literature discusses the use of ViT for datasets with long-tailed imbalances. In this work, we introduce DeiT-LT to tackle the problem of training ViTs from scratch on long-tailed datasets. In DeiT-LT, we introduce an efficient and effective way of distillation from CNN via distillation DIST token by using out-of-distribution images and re-weighting the distillation loss to enhance focus on tail classes. This leads to the learning of local CNN-like features in early ViT blocks, improving generalization for tail classes. Further, to mitigate overfitting, we propose distilling from a flat CNN teacher, which leads to learning low-rank generalizable features for DIST tokens across all ViT blocks. With the proposed DeiT-LT scheme, the distillation DIST token becomes an expert on the tail classes, and the classifier CLS token becomes an expert on the head classes. The experts help to effectively learn features corresponding to both the majority and minority classes using a distinct set of tokens within the same ViT architecture.
We show the effectiveness of DeiT-LT for training ViT from scratch on datasets ranging from small-scale CIFAR-10 LT to large-scale iNaturalist-2018.

Submitted 3 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project Page: https://rangwani-harsh.github.io/DeiT-LT

arXiv:2404.00880 [pdf, other]

Rethinking the Relationship between Recurrent and Non-Recurrent Neural Networks: A Study in Sparsity

Authors: Quincy Hershey, Randy Paffenroth, Harsh Pathak, Simon Tavener

Abstract: Neural networks (NN) can be divided into two broad categories, recurrent and non-recurrent. Both types of neural networks are popular and extensively studied, but they are often treated as distinct families of machine learning algorithms. In this position paper, we argue that there is a closer relationship between these two types of neural networks than is normally appreciated. We show that many common neural network models, such as Recurrent Neural Networks (RNN), Multi-Layer Perceptrons (MLP), and even deep multi-layer transformers, can all be represented as iterative maps. The close relationship between RNNs and other types of NNs should not be surprising. In particular, RNNs are known to be Turing complete, and therefore capable of representing any computable function (such as any other types of NNs), but herein we argue that the relationship runs deeper and is more practical than this. For example, RNNs are often thought to be more difficult to train than other types of NNs, with RNNs being plagued by issues such as vanishing or exploding gradients. However, as we demonstrate in this paper, MLPs, RNNs, and many other NNs lie on a continuum, and this perspective leads to several insights that illuminate both theoretical and practical aspects of NNs.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.20120 [pdf, ps, other]

Privacy-Preserving Data Aggregation Techniques for Enhanced Efficiency and Security in Wireless Sensor Networks: A Comprehensive Analysis and Evaluation

Authors: Ayush Rastogi, Harsh Rastogi, Yash Rastogi, Divyansh Dubey

Abstract: In this paper, we present a multidimensional, highly effective method for aggregating data for wireless sensor networks while maintaining privacy. The suggested system is resistant to data loss and secure against both active and passive privacy compromising attacks, such as the coalition attack from a rogue base station and kidnapped sensor nodes. With regard to cluster size, it achieves consistent communication overhead, which is helpful in large-scale WSNs. Due to its constant size communication overhead, the suggested strategy outperforms the previous privacy-preserving data aggregation scheme not only in terms of privacy preservation but also in terms of communication complexity and energy costs.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 4 pages

arXiv:2403.19073 [pdf]

Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads

Authors: Harsh Sharma, Gaurav Narang, Janardhan Rao Doppa, Umit Ogras, Partha Pratim Pande

Abstract: Processing-in-memory (PIM) has emerged as an enabler for the energy-efficient and high-performance acceleration of deep learning (DL) workloads. Resistive random-access memory (ReRAM) is one of the most promising technologies to implement PIM. However, as the complexity of Deep convolutional neural networks (DNNs) grows, we need to design a manycore architecture with multiple ReRAM-based processing elements (PEs) on a single chip. Existing PIM-based architectures mostly focus on computation while ignoring the role of communication. ReRAM-based tiled manycore architectures often involve many Processing Elements (PEs), which need to be interconnected via an efficient on-chip communication infrastructure. Simply allocating more resources (ReRAMs) to speed up only computation is ineffective if the communication infrastructure cannot keep up with it. In this paper, we highlight the design principles of a dataflow-aware PIM-enabled manycore platform tailor-made for various types of DL workloads. We consider the design challenges with both 2.5D interposer- and 3D integration-enabled architectures.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Presented at DATE Conference, Valencia, Spain 2024

arXiv:2403.18958 [pdf, other]

A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products

Authors: Harsh Patel, Dominique Boucher, Emad Fallahzadeh, Ahmed E. Hassan, Bram Adams

Abstract: This paper investigates the complexities of integrating Large Language Models (LLMs) into software products, with a focus on the challenges encountered for determining their readiness for release. Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations. The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies, aiming to enhance the reliability and effectiveness of LLM-based applications in real-world settings.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18329 [pdf, ps, other]

Bulk condensation by an active interface

Authors: Raushan Kant, Rahul Kumar Gupta, Harsh Soni, A K Sood, Sriram Ramaswamy

Abstract: We present experiments, supported by mechanically detailed simulations, establishing bulk vapor-liquid condensation of a hard-bead fluid by a tiny population of orientable motile grains that self-assembles into a moving polarized monolayer. In a quasi-1D geometry two such layers, oppositely aligned, immobilize the condensed non-motile component. We account for our observations through a continuum theory with a naturally non-reciprocal Cahn-Hilliard structure, whose predicted trends as a function of packing fraction are consistent with our observations.

Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: PDF repaired. References added; minor text changes. This article consists of thirteen pages and seven figures. For supplementary videos, please click on the link below: https://drive.google.com/drive/folders/19KhQcbNDJJTQsuF9Tr1apoEXon4UFyZv

arXiv:2403.18301 [pdf, other]

Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives

Authors: Shrinivas Ramasubramanian, Harsh Rangwani, Sho Takemori, Kunal Samanta, Yuhei Umeda, Venkatesh Babu Radhakrishnan

Abstract: The rise in internet usage has led to the generation of massive amounts of data, resulting in the adoption of various supervised and semi-supervised machine learning algorithms, which can effectively utilize the colossal amount of data to train models. However, before deploying these models in the real world, these must be strictly evaluated on performance measures like worst-case recall and satisfy constraints such as fairness. We find that current state-of-the-art empirical techniques offer sub-optimal performance on these practical, non-decomposable performance objectives. On the other hand, the theoretical techniques necessitate training a new model from scratch for each performance objective. To bridge the gap, we propose SelMix, a selective mixup-based inexpensive fine-tuning technique for pre-trained models, to optimize for the desired objective. The core idea of our framework is to determine a sampling distribution to perform a mixup of features between samples from particular classes such that it optimizes the given objective. We comprehensively evaluate our technique against the existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that proposed SelMix fine-tuning significantly improves the performance for various practical non-decomposable objectives across benchmarks.

Submitted 27 March, 2024; originally announced March 2024.

Comments: ICLR 2024 SpotLight

arXiv:2403.16532 [pdf, other]

Uncovering faint lensed gravitational-wave signals and reprioritizing their follow-up analysis using galaxy lensing forecasts with detected counterparts

Authors: Leo C. Y. Ng, Justin Janquart, Hemantakumar Phurailatpam, Harsh Narola, Jason S. C. Poon, Chris Van Den Broeck, Otto A. Hannuksela

Abstract: Like light, gravitational waves can be gravitationally lensed by massive astrophysical objects. For galaxy and galaxy-cluster lenses, one expects to see strong lensing -- forecasted to become observable in the coming years -- where the original wave is split into multiple copies with the same frequency evolution but different overall arrival times, phases, amplitudes, and signal strengths. Some of these images can be below the detection threshold and require targeted search methods, based on tailor-made template banks. These searches can be made more sensitive by using our knowledge of the typical distribution and morphology of lenses to predict the time delay, magnification, and image-type ordering of the lensed images. Here, we show that when a subset of the images is super-threshold, they can be used to construct a more constrained prediction of the arrival time of the remaining signals, enhancing our ability to identify lensing candidate signals. Our suggested method effectively reduces the list of triggers requiring follow-up and generally re-ranks the genuine counterpart higher in the lensing candidate list. Therefore, in the future, if one observes two or three lensed images, the information they provide can be used to identify their sub-threshold counterparts, thus allowing identification of additional lensed images. Finding such images would also strengthen our evidence for the event being lensed.

Submitted 5 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: 15 pages, 28 figures, 1 table

arXiv:2403.14839 [pdf, other]

Hyperspectral Neural Radiance Fields

Authors: Gerry Chen, Sunil Kumar Narayanan, Thomas Gautier Ottou, Benjamin Missaoui, Harsh Muriki, Cédric Pradalier, Yongsheng Chen

Abstract: Hyperspectral Imagery (HSI) has been used in many applications to non-destructively determine the material and/or chemical compositions of samples. There is growing interest in creating 3D hyperspectral reconstructions, which could provide both spatial and spectral information while also mitigating common HSI challenges such as non-Lambertian surfaces and translucent objects. However, traditional 3D reconstruction with HSI is difficult due to technological limitations of hyperspectral cameras. In recent years, Neural Radiance Fields (NeRFs) have seen widespread success in creating high quality volumetric 3D representations of scenes captured by a variety of camera models. Leveraging recent advances in NeRFs, we propose computing a hyperspectral 3D reconstruction in which every point in space and view direction is characterized by wavelength-dependent radiance and transmittance spectra. To evaluate our approach, a dataset containing nearly 2000 hyperspectral images across 8 scenes and 2 cameras was collected. We perform comparisons against traditional RGB NeRF baselines and apply ablation testing with alternative spectra representations. Finally, we demonstrate the potential of hyperspectral NeRFs for hyperspectral super-resolution and imaging sensor simulation. We show that our hyperspectral NeRF approach enables creating fast, accurate volumetric 3D hyperspectral scenes and enables several new applications and areas for future study.

Submitted 21 March, 2024; originally announced March 2024.

Comments: Main paper: 15 pages + 2 pages references. Supplemental/Appendix: 6 pages

arXiv:2403.11332 [pdf, other]

Graph Machine Learning based Doubly Robust Estimator for Network Causal Effects

Authors: Seyedeh Baharan Khatami, Harsh Parikh, Haowei Chen, Sudeepa Roy, Babak Salimi

Abstract: We address the challenge of inferring causal effects in social network data. This results in challenges due to interference -- where a unit's outcome is affected by neighbors' treatments -- and network-induced confounding factors. While there is extensive literature focusing on estimating causal effects in social network setups, a majority of them make prior assumptions about the form of network-induced confounding mechanisms. Such strong assumptions are rarely likely to hold especially in high-dimensional networks. We propose a novel methodology that combines graph machine learning approaches with the double machine learning framework to enable accurate and efficient estimation of direct and peer effects using a single observational social network. We demonstrate the semiparametric efficiency of our proposed estimator under mild regularity conditions, allowing for consistent uncertainty quantification. We demonstrate that our method is accurate, robust, and scalable via an extensive simulation study. We use our method to investigate the impact of Self-Help Group participation on financial risk tolerance.

Submitted 31 May, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.11021 [pdf, other]

Neuro-Symbolic Video Search

Authors: Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali

Abstract: The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reasoning across frames. A key reason for this failure is that they intertwine per-frame perception and temporal reasoning into a single deep network. Hence, decoupling but co-designing semantic understanding and temporal reasoning is essential for efficient scene identification. We propose a system that leverages vision-language models for semantic understanding of individual frames but effectively reasons about the long-term evolution of events using state machines and temporal logic (TL) formulae that inherently capture memory. Our TL-based reasoning improves the F1 score of complex event identification by 9-15% compared to benchmarks that use GPT4 for reasoning on state-of-the-art self-driving datasets such as Waymo and NuScenes.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.08418 [pdf, ps, other]

doi 10.1007/s43036-024-00359-0

Powers and roots of partial isometric covariant representations

Authors: Dimple Saini, Harsh Trivedi, Shankar Veerabathiran

Abstract: Isometric covariant representations play an important role in the study of Cuntz-Pimsner algebras. In this article, we study partial isometric covariant representations and explore under what conditions powers and roots of partial isometric covariant representations are also partial isometric covariant representations.

Submitted 17 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

MSC Class: 46L08; 47A65; 47L55; 47L80

Journal ref: Adv. Oper. Theory 9, 61 (2024)

arXiv:2403.08007 [pdf, other]

doi 10.1007/978-3-031-41498-5_17

IndicSTR12: A Dataset for Indic Scene Text Recognition

Authors: Harsh Lunia, Ajoy Mondal, C V Jawahar

Abstract: The importance of Scene Text Recognition (STR) in today's increasingly digital world cannot be overstated. Given the significance of STR, data intensive deep learning approaches that auto-learn feature mappings have primarily driven the development of STR solutions. Several benchmark datasets and substantial work on deep learning models are available for Latin languages to meet this need. On more complex, syntactically and semantically, Indian languages spoken and read by 1.3 billion people, there is less work and datasets available. This paper aims to address the Indian space's lack of a comprehensive dataset by proposing the largest and most comprehensive real dataset - IndicSTR12 - and benchmarking STR performance on 12 major Indian languages. A few works have addressed the same issue, but to the best of our knowledge, they focused on a small number of Indian languages. The size and complexity of the proposed dataset are comparable to those of existing Latin contemporaries, while its multilingualism will catalyse the development of robust text detection and recognition models. It was created specifically for a group of related languages with different scripts. The dataset contains over 27000 word-images gathered from various natural scenes, with over 1000 word-images for each language. Unlike previous datasets, the images cover a broader range of realistic conditions, including blur, illumination changes, occlusion, non-iconic texts, low resolution, perspective text etc. Along with the new dataset, we provide a high-performing baseline on three models - PARSeq, CRNN, and STARNet.

Submitted 12 March, 2024; originally announced March 2024.

Journal ref: ICDAR 2023 Workshops. Lecture Notes in Computer Science, vol 14193. Springer, Cham (2023)

Showing 1–50 of 673 results for author: Harsh