Search | arXiv e-print repository

Erdős-Rogers functions for arbitrary pairs of graphs

Authors: Dhruv Mubayi, Jacques Verstraete

Abstract: Let $f_{F,G}(n)$ be the largest size of an induced $F$-free subgraph that every $n$-vertex $G$-free graph is guaranteed to contain. We prove that for any triangle-free graph $F$, \[ f_{F,K_3}(n) = f_{K_2,K_3}(n)^{1 + o(1)} = n^{\frac{1}{2} + o(1)}.\] Along the way we give a slight improvement of a construction of Erdős-Frankl-Rödl for the Brown-Erdős-Sós $(3r-3,3)$-problem when $r$ is large. In contrast to our result for $K_3$, for any $K_4$-free graph $F$ containing a cycle, we prove there exists $c_F > 0$ such that \[ f_{F,K_4}(n) > f_{K_2,K_4}(n)^{1 + c_F} = n^{\frac{1}{3}+c_F+o(1)}.\] For every graph $G$, we prove that there exists $\varepsilon_G >0$ such that whenever $F$ is a non-empty graph such that $G$ is not contained in any blowup of $F$, then $f_{F,G}(n) = O(n^{1-\varepsilon_G})$. On the other hand, for every graph $G$ that is not a clique and every $\varepsilon>0$, we exhibit a $G$-free graph $F$ such that $f_{F,G}(n) = Ω(n^{1-\varepsilon})$.

Submitted 3 July, 2024; originally announced July 2024.

MSC Class: 05C55; 05D10

arXiv:2407.02598 [pdf, other]

AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

Authors: Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, Bingbing Liu

Abstract: Realistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels in real-time rendering and static scene reconstructions but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatting to achieve highly realistic reconstructions of autonomous driving scenes. By imposing geometric constraints on Gaussians representing the road and sky regions, our method enables multi-view consistent simulation of challenging scenarios including lane changes. Leveraging 3D templates, we introduce a reflected Gaussian consistency constraint to supervise both the visible and unseen side of foreground objects. Moreover, to model the dynamic appearance of foreground objects, we estimate residual spherical harmonics for each foreground Gaussian. Extensive experiments on Pandaset and KITTI demonstrate that AutoSplat outperforms state-of-the-art methods in scene reconstruction and novel view synthesis across diverse driving scenarios. Visit our project page: https://autosplat.github.io/

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01725 [pdf, other]

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark

Abstract: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systematically assess current model capabilities in discovery tasks and provide a useful resource for improving them. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering, by manually deriving discovery workflows from published papers to approximate the real-world challenges faced by researchers, where each task is defined by a dataset, its metadata, and a discovery goal in natural language. We additionally provide 903 synthetic tasks to conduct controlled evaluations across task complexity. Furthermore, our structured formalism of data-driven discovery enables a facet-based evaluation that provides useful insights into different failure modes. We evaluate several popular LLM-based reasoning frameworks using both open and closed LLMs as baselines on DiscoveryBench and find that even the best system scores only 25%. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Website: https://github.com/allenai/discoverybench

arXiv:2407.00677 [pdf, ps, other]

Combinatorial Multi-Access Coded Caching with Private Caches

Authors: Dhruv Pratap Singh, Anjana A. Mahesh, B. Sundar Rajan

Abstract: We consider a variant of the coded caching problem where users connect to two types of caches, called private and access caches. The problem setting consists of a server with a library of files and a set of access caches. Each user, equipped with a private cache, connects to a distinct $r$-subset of the access caches. The server populates both types of caches with files in uncoded format. For this setting, we provide an achievable scheme and derive a lower bound on the number of transmissions for this scheme. We also present a lower and upper bound for the optimal worst-case rate under uncoded placement for this setting using the rates of the Maddah-Ali--Niesen scheme for dedicated and combinatorial multi-access coded caching settings, respectively. Further, we derive a lower bound on the optimal worst-case rate for any general placement policy using cut-set arguments. We also provide numerical plots comparing the rate of the proposed achievability scheme with the above bounds, from which it can be observed that the proposed scheme approaches the lower bound when the amount of memory accessed by a user is large. Finally, we discuss optimality with respect to the worst-case rate when the system has four access caches.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 13 pages and 6 figures

arXiv:2407.00357 [pdf, other]

qLUE: A Quantum Clustering Algorithm for Multi-Dimensional Datasets

Authors: Dhruv Gopalakrishnan, Luca Dellantonio, Antonio Di Pilato, Wahid Redjeb, Felice Pantaleo, Michele Mosca

Abstract: Clustering algorithms are at the basis of several technological applications, and are fueling the development of rapidly evolving fields such as machine learning. In the recent past, however, it has become apparent that they face challenges stemming from datasets that span more spatial dimensions. In fact, the best-performing clustering algorithms scale linearly in the number of points, but quadratically with respect to the local density of points. In this work, we introduce qLUE, a quantum clustering algorithm that scales linearly in both the number of points and their density. qLUE is inspired by CLUE, an algorithm developed to address the challenging time and memory budgets of Event Reconstruction (ER) in future High-Energy Physics experiments. As such, qLUE marries decades of development with the quadratic speedup provided by quantum computers. We numerically test qLUE in several scenarios, demonstrating its effectiveness and proving it to be a promising route to handle complex data analysis tasks -- especially in high-dimensional datasets with high densities of points.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.17576 [pdf, other]

Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations

Authors: Cheng Wang, Christopher Redino, Ryan Clark, Abdul Rahman, Sal Aguinaga, Sathvik Murli, Dhruv Nandakumar, Roland Rao, Lanxiao Huang, Daniel Radke, Edward Bowen

Abstract: Ransomware presents a significant and increasing threat to individuals and organizations by encrypting their systems and not releasing them until a large fee has been extracted. To bolster preparedness against potential attacks, organizations commonly conduct red teaming exercises, which involve simulated attacks to assess existing security measures. This paper proposes a novel approach utilizing reinforcement learning (RL) to simulate ransomware attacks. By training an RL agent in a simulated environment mirroring real-world networks, effective attack strategies can be learned quickly, significantly streamlining traditional, manual penetration testing processes. The attack pathways revealed by the RL agent can provide valuable insights to the defense team, helping them identify network weak points and develop more resilient defensive measures. Experimental results on a 152-host example network confirm the effectiveness of the proposed approach, demonstrating the RL agent's capability to discover and orchestrate attacks on high-value targets while evading honeyfiles (decoy files strategically placed to detect unauthorized access).

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.12998 [pdf, other]

Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech

Authors: Cheol Jun Cho, Peter Wu, Tejas S. Prabhune, Dhruv Agarwal, Gopala K. Anumanchipalli

Abstract: Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- articulatory encodec. The articulatory encodec comprises an articulatory analysis model that infers articulatory features from speech audio, and an articulatory synthesis model that synthesizes speech audio from articulatory features. The articulatory features are kinematic traces of vocal tract articulators and source features, which are intuitively interpretable and controllable, being the actual physical interface of speech production. An additional speaker identity encoder is jointly trained with the articulatory synthesizer to inform the voice texture of individual speakers. By training on large-scale speech data, we achieve a fully intelligible, high-quality articulatory synthesizer that generalizes to unseen speakers. Furthermore, the speaker embedding is effectively disentangled from articulations, which enables accent-preserving zero-shot voice conversion. To the best of our knowledge, this is the first demonstration of universal, high-performance articulatory inference and synthesis, suggesting the proposed framework as a powerful coding system of speech.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.09366 [pdf, other]

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to improve our understanding and our utilization of MMCR. To better understand MMCR, we leverage tools from high dimensional probability to demonstrate that MMCR incentivizes alignment and uniformity of learned embeddings. We then leverage tools from information theory to show that such embeddings maximize a well-known lower bound on mutual information between views, thereby connecting the geometric perspective of MMCR to the information-theoretic perspective commonly discussed in MVSSL. To better utilize MMCR, we mathematically predict and experimentally confirm non-monotonic changes in the pretraining loss akin to double descent but with respect to atypical hyperparameters. We also discover compute scaling laws that enable predicting the pretraining loss as a function of gradient steps, batch size, embedding dimension and number of views. We then show that MMCR, originally applied to image data, is performant on multimodal image-text data. By more deeply understanding the theoretical and empirical behavior of MMCR, our work reveals insights on improving MVSSL methods.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.01799 [pdf, other]

Online Control in Population Dynamics

Authors: Noah Golowich, Elad Hazan, Zhou Lu, Dhruv Rohatgi, Y. Jennifer Sun

Abstract: The study of population dynamics originated with early sociological works but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics. Most studies on population dynamics focus on the problem of prediction rather than control. Existing mathematical models for control in population dynamics are often restricted to specific, noise-free dynamics, while real-world population changes can be complex and adversarial. To address this gap, we propose a new framework based on the paradigm of online control. We first characterize a set of linear dynamical systems that can naturally model evolving populations. We then give an efficient gradient-based controller for these systems, with near-optimal regret bounds with respect to a broad class of linear policies. Our empirical evaluations demonstrate the effectiveness of the proposed algorithm for control in population dynamics even for non-linear models such as SIR and replicator dynamics.

Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01336 [pdf, ps, other]

A Reverse Mathematical Analysis of Hilbert's Nullstellensatz and Basis Theorem

Authors: Dhruv Kulshreshtha

Abstract: This paper presents an expository reverse-mathematical analysis of two fundamental theorems in commutative algebra: Hilbert's Nullstellensatz and Basis Theorem. In addition to its profound significance in commutative algebra and algebraic geometry, the Basis Theorem is also historically notable for its nonconstructive proof. The Nullstellensatz, on the other hand, is noteworthy as it establishes a fundamental connection between the more algebraic notion of ideals and the more geometric notion of varieties. We explore the conscious shift from computational to conceptual approaches in mathematical argumentation, contextualizing Hilbert's contributions. We formalize the relative constructivity of these theorems using the framework of reverse mathematics, although we do not presuppose familiarity with reverse mathematics. Drawing from contemporary mathematical literature, we analyze the Basis Theorem's reliance on nonconstructive methods versus the more constructive nature of the Nullstellensatz. Our study employs the standard tools of reverse mathematics, in particular subsystems of second-order arithmetic, to outline the minimal set-existence axioms required for these theorems. We review results showing that certain formulations of the Nullstellensatz are provable in the weak axiom system of $\mathsf{RCA}_0$, while the Basis Theorem requires stronger axioms, such as $Σ^0_2$-Induction. Consequently, we position these theorems separately within the Friedman-Simpson hierarchy. This analysis contributes to a deeper understanding of the foundational requirements for these pivotal results in algebra.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 35 pages, 2 figures

MSC Class: 03B30; 03F35

arXiv:2405.20254 [pdf, other]

Conversational Agents to Facilitate Deliberation on Harmful Content in WhatsApp Groups

Authors: Dhruv Agarwal, Farhana Shahid, Aditya Vashistha

Abstract: WhatsApp groups have become a hotbed for the propagation of harmful content including misinformation, hate speech, polarizing content, and rumors, especially in Global South countries. Given the platform's end-to-end encryption, moderation responsibilities lie with group admins and members, who rarely contest such content. Another approach is fact-checking, which is unscalable, and can only contest factual content (e.g., misinformation) but not subjective content (e.g., hate speech). Drawing on recent literature, we explore deliberation -- open and inclusive discussion -- as an alternative. We investigate the role of a conversational agent in facilitating deliberation on harmful content in WhatsApp groups. We conducted semi-structured interviews with 21 Indian WhatsApp users, employing a design probe to showcase an example agent. Participants expressed the need for anonymity and recommended AI assistance to reduce the effort required in deliberation. They appreciated the agent's neutrality but pointed out the futility of deliberation in echo chamber groups. Our findings highlight design tensions for such an agent, including privacy versus group dynamics and freedom of speech in private spaces. We discuss the efficacy of deliberation using deliberative theory as a lens, compare deliberation with moderation and fact-checking, and provide design recommendations for future such systems. Ultimately, this work advances CSCW by offering insights into designing deliberative systems for combating harmful content in private group chats on social media.

Submitted 30 May, 2024; originally announced May 2024.

Comments: To appear at CSCW 2024

arXiv:2405.16406 [pdf, other]

SpinQuant: LLM quantization with learned rotations

Authors: Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort

Abstract: Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Recent findings suggest that rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures, and find that some random rotations lead to much better quantization than others, with a difference of up to 13 points in downstream zero-shot reasoning performance. As a result, we propose SpinQuant, which optimizes (or learns) the rotation matrices with Cayley optimization on a small validation set. With 4-bit quantization of weight, activation, and KV-cache, SpinQuant narrows the accuracy gap on zero-shot reasoning tasks with full precision to merely 2.9 points on the LLaMA-2 7B model, surpassing LLM-QAT by 19.1 points and SmoothQuant by 25.0 points. SpinQuant also outperforms concurrent work QuaRot, which applies random rotations to remove outliers. In particular, for LLaMA-2 7B/LLaMA-3 8B models that are hard to quantize, SpinQuant reduces the gap to full precision by 30.2%/34.1% relative to QuaRot.

Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.
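
Illustrative note on the SpinQuant abstract above: the claim that rotations can "lead to identical outputs in full-precision" rests on the identity $Wx = (WR^{\top})(Rx)$ for any orthogonal $R$. The following is a minimal sketch of that identity on a hypothetical 2x2 layer, not the SpinQuant method itself (which learns rotations via Cayley optimization). The matrix `W`, the input `x`, and the 45-degree angle are assumptions chosen for illustration only.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(row) for row in zip(*m)]

def rotation(theta):
    """2x2 rotation matrix, which is orthogonal: R^T R = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Hypothetical weights and an activation with one outlier coordinate.
W = [[0.5, -1.0], [2.0, 0.25]]
x = [10.0, 0.0]
R = rotation(math.pi / 4)

y_ref = matvec(W, x)                # original layer output
x_rot = matvec(R, x)                # rotated activation
W_rot = matmul(W, transpose(R))     # weights absorb the inverse rotation
y_rot = matvec(W_rot, x_rot)        # same output up to rounding

print(y_ref, y_rot)
print(max(abs(v) for v in x), max(abs(v) for v in x_rot))
```

In this toy case the rotation spreads the single large coordinate across both dimensions, so the largest magnitude shrinks while the full-precision output is unchanged; that smaller dynamic range is what makes low-bit quantization easier.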

arXiv:2405.13067 [pdf, ps, other]

Can the second time-derivative of the orbital frequency of binary pulsars be used for testing general relativity?

Authors: Dhruv Pathak, Debarati Chatterjee

Abstract: With precision pulsar timing, measured values of a large set of pulsar parameters are obtainable. For some of those parameters, such as the time-derivatives of spin or orbital periods (in the case of binary pulsars), the measured values are not the intrinsic values of the parameters as they contain contributions from the dynamical effects. In the case of orbital period derivatives, the intrinsic values are essentially the general relativistic results. Pulsar timing solutions also provide measurements of higher time-derivatives of orbital frequency for some pulsars. We specifically focus on the second time-derivative of the orbital frequency to explore its application in testing general relativity. In this work, we have provided a formalism to estimate the general relativistic contribution to the second derivative of the orbital frequency. We have calculated the dynamical effect contributions as well as the general relativistic contributions to the second time-derivative of the orbital period for real as well as synthetic pulsars. We find that the general relativistic contribution to the second time-derivative of the orbital period is negligibly small compared to the observed values of the real pulsars.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 11 pages, 1 figure

arXiv:2405.12199 [pdf, ps, other]

Bond percolation games on the $2$-dimensional square lattice, and ergodicity of associated probabilistic cellular automata

Authors: Dhruv Bhasin, Sayar Karmakar, Moumanti Podder, Souvik Roy

Abstract: We consider bond percolation games on the $2$-dimensional square lattice in which each edge (that is either between the sites $(x,y)$ and $(x+1,y)$ or between the sites $(x,y)$ and $(x,y+1)$, for all $(x,y) \in \mathbb{Z}^{2}$) has been assigned, independently, a label that reads "trap" with probability $p$, "target" with probability $q$, and "open" with probability $1-p-q$. Once a realization of this labeling is generated, it is revealed in its entirety to the players before the game starts. The game involves a single token, initially placed at the origin, and two players who take turns to make moves. A move involves relocating the token from where it is currently located, say the site $(x,y)$, to any one of $(x+1,y)$ and $(x,y+1)$. A player wins if she is able to move the token along an edge labeled a target, or if she is able to force her opponent to move the token along an edge labeled a trap. The game is said to result in a draw if it continues indefinitely (i.e., with the token always being moved along open edges). We ask the question: for what values of $p$ and $q$ is the probability of draw equal to $0$? By establishing a close connection between the event of draw and the ergodicity of a suitably defined probabilistic cellular automaton, we are able to show that the probability of draw is $0$ when $p > 0.157175$ and $q=0$, and when $p=q \geqslant 0.10883$.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Main body of the paper: 21 pages. Appendix: from page 21 till page 43. No. of figures: 4
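
Illustrative note on the bond percolation game abstract above: the rules can be explored numerically by truncating the lattice at a finite depth and solving the game by backward induction. This is a minimal sketch under stated assumptions, not the paper's method (which goes through probabilistic cellular automata). The function names, the truncation depth, and the convention that sites on the truncation boundary count as undecided are all hypothetical choices made here.

```python
import random

WIN, LOSS, DRAW = "W", "L", "D"
STEPS = ((1, 0), (0, 1))  # allowed moves: right or up

def sample_labels(depth, p, q, rng):
    """Label every edge leaving a site with x + y < depth."""
    labels = {}
    for x in range(depth):
        for y in range(depth - x):
            for step in STEPS:
                u = rng.random()
                labels[(x, y, step)] = (
                    "trap" if u < p else "target" if u < p + q else "open"
                )
    return labels

def outcome_at_origin(depth, labels):
    """Outcome for the player to move at (0, 0) in the truncated game.

    Sites with x + y == depth are treated as undecided (DRAW), an
    assumption of this sketch standing in for the infinite lattice.
    """
    state = {}
    for level in range(depth, -1, -1):
        for x in range(level + 1):
            y = level - x
            if level == depth:
                state[(x, y)] = DRAW
                continue
            results = []
            for dx, dy in STEPS:
                label = labels[(x, y, (dx, dy))]
                if label == "target":
                    results.append(WIN)    # mover wins immediately
                elif label == "trap":
                    results.append(LOSS)   # mover loses immediately
                else:
                    child = state[(x + dx, y + dy)]
                    # The opponent moves next, so outcomes flip.
                    results.append(
                        LOSS if child == WIN else WIN if child == LOSS else DRAW
                    )
            if WIN in results:
                state[(x, y)] = WIN
            elif DRAW in results:
                state[(x, y)] = DRAW
            else:
                state[(x, y)] = LOSS
    return state[(0, 0)]

def undecided_fraction(depth, p, q, trials, seed=0):
    """Fraction of sampled labelings left undecided at the given depth."""
    rng = random.Random(seed)
    hits = sum(
        outcome_at_origin(depth, sample_labels(depth, p, q, rng)) == DRAW
        for _ in range(trials)
    )
    return hits / trials
```

A call such as `undecided_fraction(30, 0.2, 0.0, 200)` gives the share of samples not decided within 30 moves. Since a true draw is undecided at every depth, this share can only overestimate the draw probability, and it says nothing rigorous about the thresholds quoted in the abstract.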

arXiv:2405.11487 [pdf, other]

"Previously on ..." From Recaps to Story Summarization

Authors: Aditya Kumar Singh, Dhruv Srivastava, Makarand Tapaswi

Abstract: We introduce multimodal story summarization by leveraging TV episode recaps - short video sequences interweaving key story moments from previous episodes to bring viewers up to speed. We propose PlotSnap, a dataset featuring two crime thriller TV shows with rich recaps and long episodes of 40 minutes. Story summarization labels are unlocked by matching recap shots to corresponding sub-stories in the episode. We propose a hierarchical model TaleSumm that processes entire episodes by creating compact shot and dialog representations, and predicts importance scores for each video shot and dialog utterance by enabling interactions between local story groups. Unlike traditional summarization, our method extracts multiple plot points from long videos. We present a thorough evaluation on story summarization, including promising cross-series generalization. TaleSumm also shows good results on classic video summarization benchmarks.

Submitted 19 May, 2024; originally announced May 2024.

Comments: CVPR 2024; Project page: https://katha-ai.github.io/projects/recap-story-summ/

arXiv:2405.10391 [pdf, other]

Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance

Authors: Anish Bhattacharya, Nishanth Rao, Dhruv Parikh, Pratik Kunapuli, Nikolai Matni, Vijay Kumar

Abstract: We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance in dense, cluttered environments, with comparison to various state-of-the-art architectures. Quadrotor unmanned aerial vehicles (UAVs) have tremendous maneuverability when flown fast; however, as flight speed increases, traditional vision-based navigation via independent mapping, planning, and control modules breaks down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end planning and control networks have been shown to be effective for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control, in a photorealistic, high-physics-fidelity simulator as well as in hardware, and observe that the attention-based models are more effective as quadrotor speeds increase, while recurrent models with many layers provide smoother commands at lower speeds. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 8 pages, 10 figures, 3 tables

arXiv:2405.05852 [pdf, other]

Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control

Authors: Gunshi Gupta, Karmesh Yadav, Yarin Gal, Dhruv Batra, Zsolt Kira, Cong Lu, Tim G. J. Rudner

Abstract: Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs. Such capabilities are difficult to learn solely from task-specific data. This has led to the emergence of pre-trained vision-language models as a tool for transferring representations learned from internet-scale data to downstream tasks and new domains. However, commonly used contrastively trained representations such as in CLIP have been shown to fail at enabling embodied agents to gain a sufficiently fine-grained scene understanding -- a capability vital for control. To address this shortcoming, we consider representations from pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts and as such, contain text-conditioned representations that reflect highly fine-grained visuo-spatial information. Using pre-trained text-to-image diffusion models, we construct Stable Control Representations which allow learning downstream control policies that generalize to complex, open-ended environments. We show that policies learned using Stable Control Representations are competitive with state-of-the-art representation learning approaches across a broad range of simulated control settings, encompassing challenging manipulation and navigation tasks. Most notably, we show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05033 [pdf, other]

Multi-fidelity Hamiltonian Monte Carlo

Authors: Dhruv V. Patel, Jonghyun Lee, Matthew W. Farthing, Peter K. Kitanidis, Eric F. Darve

Abstract: Numerous applications in biology, statistics, science, and engineering require generating samples from high-dimensional probability distributions. In recent years, the Hamiltonian Monte Carlo (HMC) method has emerged as a state-of-the-art Markov chain Monte Carlo technique, exploiting the shape of such high-dimensional target distributions to efficiently generate samples. Despite its impressive empirical success and increasing popularity, its wide-scale adoption remains limited due to the high computational cost of gradient calculation. Moreover, applying this method is impossible when the gradient of the posterior cannot be computed (for example, with black-box simulators). To overcome these challenges, we propose a novel two-stage Hamiltonian Monte Carlo algorithm with a surrogate model. In this multi-fidelity algorithm, the acceptance probability is computed in the first stage via a standard HMC proposal using an inexpensive differentiable surrogate model, and if the proposal is accepted, the posterior is evaluated in the second stage using the high-fidelity (HF) numerical solver. Splitting the standard HMC algorithm into these two stages allows for approximating the gradient of the posterior efficiently, while producing accurate posterior samples by using HF numerical solvers in the second stage. We demonstrate the effectiveness of this algorithm for a range of problems, including linear and nonlinear Bayesian inverse problems with in-silico data and experimental data. The proposed algorithm is shown to seamlessly integrate with various low-fidelity and HF models, priors, and datasets. Remarkably, our proposed method outperforms the traditional HMC algorithm in both computational and statistical efficiency by several orders of magnitude, all while retaining or improving the accuracy in computed posterior statistics.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.03455 [pdf, ps, other]

Big line or big convex polygon

Authors: David Conlon, Jacob Fox, Xiaoyu He, Dhruv Mubayi, Andrew Suk, Jacques Verstraete

Abstract: Let $ES_{\ell}(n)$ be the minimum $N$ such that every $N$-element point set in the plane contains either $\ell$ collinear members or $n$ points in convex position. We prove that there is a constant $C>0$ such that, for each $\ell, n \ge 3$, $$ (3\ell - 1) \cdot 2^{n-5} < ES_{\ell}(n) < \ell^2 \cdot 2^{n+ C\sqrt{n\log n}}.$$ A similar extension of the well-known Erdős-Szekeres cups-caps theorem is also proved.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03375 [pdf, other]

Three-temperature radiation hydrodynamics with PLUTO: Thermal and kinematic signatures of accreting protoplanets

Authors: Dhruv Muley, Julio David Melon Fuksman, Hubert Klahr

Abstract: In circumstellar disks around young stars, the gravitational influence of nascent planets produces telltale patterns in density, temperature, and kinematics. To better understand these signatures, we first performed 3D hydrodynamical simulations of a 0.012 $M_{\odot}$ disk, with a Saturn-mass planet orbiting circularly in-plane at 40 au. We tested four different disk thermodynamic prescriptions (in increasing order of complexity, local isothermality, $β$-cooling, two-temperature radiation hydrodynamics, and three-temperature radiation hydrodynamics), finding that $β$-cooling offers a reasonable approximation for the three-temperature approach when the planet is not massive or luminous enough to substantially alter the background temperature and density structure. Thereafter, using the three-temperature scheme, we relaxed this assumption, simulating a range of different planet masses (Neptune-mass, Saturn-mass, Jupiter-mass) and accretion luminosities (0, $10^{-3} L_{\odot}$) in the same disk. Our investigation revealed that signatures of disk-planet interaction strengthen with increasing planet mass, with circumplanetary flows becoming prominent in the high-planet-mass regime. Accretion luminosity, which adds pressure support around the planet, was found to weaken the midplane Doppler-flip, potentially visible in optically thin tracers like C$^{18}$O, while strengthening the spiral signature, particularly in upper disk layers sensitive to thicker lines, like those of $^{12}$CO.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted to Astronomy and Astrophysics; 22 pages, 19 figures incl. Appendix. Comments and questions welcome

arXiv:2405.03112 [pdf, ps, other]

Inducibility of rainbow graphs

Authors: Emily Cairncross, Clayton Mizgerd, Dhruv Mubayi

Abstract: Fix $k\ge 11$ and a rainbow $k$-clique $R$. We prove that the inducibility of $R$ is $k!/(k^k-k)$. An extremal construction is a balanced recursive blow-up of $R$. This answers a question posed by Huang, that is a generalization of an old problem of Erdős and Sós. It remains open to determine the minimum $k$ for which our result is true. More generally, we prove that there is an absolute constant $C>0$ such that every $k$-vertex connected rainbow graph with minimum degree at least $C\log k$ has inducibility $k!/(k^k-k)$.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 27 pages
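
A heuristic count, not taken from the paper, suggests where the value $k!/(k^k-k)$ comes from. Assume the balanced recursive blow-up splits the vertex set into $k$ equal parts, each of which is again a blow-up of $R$, and let $\lambda$ (a symbol introduced here for illustration) be the limiting probability that $k$ random vertices induce a copy of $R$. Either the $k$ vertices land in $k$ distinct parts, which happens with probability $k!/k^k$, or all of them land in the same part, which happens with probability $k \cdot k^{-k}$ and then recurses: \[ \lambda = \frac{k!}{k^k} + \frac{k}{k^k}\,\lambda \quad\Longrightarrow\quad \lambda = \frac{k!}{k^k-k}. \] This only checks that the construction attains the stated density; the matching upper bound is the content of the theorem.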

arXiv:2405.01394 [pdf, other]

Analysis of a Modular Autonomous Driving Architecture: The Top Submission to CARLA Leaderboard 2.0 Challenge

Authors: Weize Zhang, Mohammed Elmahgiubi, Kasra Rezaee, Behzad Khamidehi, Hamidreza Mirkhani, Fazel Arasteh, Chunlin Li, Muhammad Ahsan Kaleem, Eduardo R. Corral-Soto, Dhruv Sharma, Tongtong Cao

Abstract: In this paper we present the architecture of the Kyber-E2E submission to the map track of CARLA Leaderboard 2.0 Autonomous Driving (AD) challenge 2023, which achieved first place. Our solution employs a modular architecture consisting of five main components: sensing, localization, perception, tracking/prediction, and planning/control. Our solution leverages state-of-the-art language-assisted perception models to help our planner perform more reliably in highly challenging traffic scenarios. We use open-source driving datasets in conjunction with Inverse Reinforcement Learning (IRL) to enhance the performance of our motion planner. We provide insight into our design choices and trade-offs made to achieve this solution. We also explore the impact of each component in the overall performance of our solution, with the intent of providing a guideline where allocation of resources can have the greatest impact.

Submitted 21 March, 2024; originally announced May 2024.

arXiv:2405.00831 [pdf, other]

Numerical investigation of three-dimensional effects of cavitating flow in a venturi-type hydrodynamic cavitation reactor

Authors: Dhruv Apte, Mingming Ge, Olivier Coutier-Delgosha

Abstract: The concept of Hydrodynamic Cavitation (HC) has emerged as a promising method for wastewater treatment, bio-diesel production and multiple other environmental processes with Venturi-type cavitation reactors showing particular advantages. However, numerical simulations of a venturi-type reactor with a clear explanation of the underlying flow physics remain inadequate. The present study numerically investigates and analyzes the flow inside a venturi-type reactor from both global cavity dynamics and localized turbulence statistics perspectives. Some models in the Detached Eddy Simulation (DES) family are employed to model the turbulence with the study initially comparing 2D simulations before extending the analysis to 3D simulations. The results show that while URANS models show significantly different dynamics as a result of grid refinement, the DES models show standard flow dynamics associated with cavitating flows. Nevertheless, significant discrepancies continue to exist when comparing the turbulence statistics on the local scale. As the discussion extends to 3D calculations, the DES models are able to predict the turbulence phenomena well at the local scale and reveal some new insights regarding the role of baroclinic torque in the cavitation-vortex interaction. The findings of this study thus contribute to the fundamental understanding of the venturi-type reactor.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.19550 [pdf, other]

Thermal conductivity reduction due to phonon geometrical scattering in nano-engineered epitaxial germanium

Authors: Jessy Paterson, Sunanda Mitra, Yanqing Liu, Mustapha Boukhari, Dhruv Singhal, David Lacroix, Emmanuel Hadji, André Barski, Dimitri Tainoff, Olivier Bourgeois

Abstract: Nano-engineering crystalline materials can be used to tailor their thermal properties. By adding new nanoscale phonon scattering centers and controlling their size, one can effectively decrease the phonon mean free path and hence the thermal conductivity of a fully crystalline material. In this letter, we use the 3$ω$ method in the temperature range of 100-300 K to experimentally report on the more than threefold reduction of the thermal conductivity of an epitaxially-grown crystalline germanium thin film with embedded polydispersed crystalline Ge$_3$Mn$_5$ nano-inclusions with diameters ranging from 5 to 25 nm. A detailed analysis of the structure of the thin film coupled with Monte Carlo simulations of phonon transport highlights the role of the nano-inclusions volume fraction in the reduction of the phononic contribution to the thermal conductivity, in particular its temperature dependence, leading to a phonon mean free path that is set by geometrical constraints.

Submitted 30 April, 2024; originally announced April 2024.

Comments: Applied Physics Letters, In press

arXiv:2404.19478 [pdf, ps, other]

Non Gaussian statistics in static and dynamic Galton boards

Authors: Dhruv Shah, R. K. Shishir, Manjaree, Shreya Pithva, T. Y. Booritth Balaji, Rahul Agarwal Singh

Abstract: Perturbing the arrangements of pegs on a static Galton board can result in non-trivial stationary distributions, which in the continuum limit correspond to departure from regular Gaussian behavior. Two such distributions are obtained. Further, the distributions generated for a dynamic Galton board under external forcing in a general direction are obtained by solution of the corresponding stochastic differential equations. Exact cumulant generating functions for the distribution are presented for forcing in one dimension. An approximate expression, correct to first order in the forcing amplitude, is presented for the case of two dimensions. Both cases show nontrivial departures from the static Gaussian solution.

Submitted 2 July, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2404.18734 [pdf, other]

Interplay between Contractivity and Monotonicity for Reaction Networks

Authors: Alon Duvall, M. Ali Al-Radhawi, Dhruv D. Jatkar, Eduardo Sontag

Abstract: This work studies relationships between monotonicity and contractivity, and applies the results to establish that many reaction networks are weakly contractive, and thus, under appropriate compactness conditions, globally convergent to equilibria. Verification of these properties is achieved through a novel algorithm that can be used to generate cones for monotone systems. The results given here allow a unified proof of global convergence for several classes of networks that had been previously studied in the literature.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17676 [pdf, other]

Toward a 2D Local Implementation of Quantum LDPC Codes

Authors: Noah Berthusen, Dhruv Devulapalli, Eddie Schoute, Andrew M. Childs, Michael J. Gullans, Alexey V. Gorshkov, Daniel Gottesman

Abstract: Geometric locality is an important theoretical and practical factor for quantum low-density parity-check (qLDPC) codes which affects code performance and ease of physical realization. For device architectures restricted to 2D local gates, naively implementing the high-rate codes suitable for low-overhead fault-tolerant quantum computing incurs prohibitive overhead. In this work, we present an error correction protocol built on a bilayer architecture that aims to reduce operational overheads when restricted to 2D local gates by measuring some generators less frequently than others. We investigate the family of bivariate bicycle qLDPC codes and show that they are well suited for a parallel syndrome measurement scheme using fast routing with local operations and classical communication (LOCC). Through circuit-level simulations, we find that in some parameter regimes bivariate bicycle codes implemented with this protocol have logical error rates comparable to the surface code while using fewer physical qubits.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 16 pages, 10 figures

Report number: LA-UR-24-22713

arXiv:2404.16016 [pdf, ps, other]

A question of Erdős and Graham on Egyptian fractions

Authors: David Conlon, Jacob Fox, Xiaoyu He, Dhruv Mubayi, Huy Tuan Pham, Andrew Suk, Jacques Verstraëte

Abstract: Answering a question of Erdős and Graham, we show that for each fixed positive rational number $x$ the number of ways to write $x$ as a sum of reciprocals of distinct positive integers each at most $n$ is $2^{(c_x + o(1))n}$ for an explicit constant $c_x$ increasing with $x$.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 8 pages
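
The quantity counted in this abstract can be computed by brute force for small $n$. The sketch below is illustrative only and is not the authors' method: the function name `count_representations` is hypothetical, and the approach is a plain subset-sum dynamic program over exact rationals, which becomes infeasible long before the asymptotic regime $2^{(c_x + o(1))n}$ is visible.

```python
from fractions import Fraction


def count_representations(x: Fraction, n: int) -> int:
    """Count subsets of {1, ..., n} whose reciprocals sum to exactly x."""
    # Map each reachable partial sum (at most x) to the number of subsets achieving it.
    counts = {Fraction(0): 1}
    for d in range(1, n + 1):
        step = Fraction(1, d)
        updated = dict(counts)  # subsets that do not use d
        for total, ways in counts.items():
            new_total = total + step
            # Partial sums only grow, so anything above x can be discarded.
            if new_total <= x:
                updated[new_total] = updated.get(new_total, 0) + ways
        counts = updated
    return counts.get(x, 0)
```

For example, `count_representations(Fraction(1), 6)` returns 2, corresponding to $1 = 1/1$ and $1 = 1/2 + 1/3 + 1/6$.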

arXiv:2404.12029 [pdf, other]

KnotResolver: Tracking self-intersecting filaments in microscopy using directed graphs

Authors: Dhruv Khatri, Shivani A. Yadav, Chaitanya A. Athale

Abstract: Quantification of microscopy time-series of in vitro reconstituted motor driven microtubule (MT) transport in 'gliding assays' is typically performed using computational object tracking tools. However, these are limited to non-intersecting and rod-like filaments. Here, we describe a novel computational image-analysis pipeline, KnotResolver, to track image time-series of highly curved self-intersecting looped filaments (knots) by resolving cross-overs. The code integrates filament segmentation and cross-over or 'knot' identification based on directed graph representation, where nodes represent cross-overs and edges represent the path connecting them. The graphs are mapped back to contours and the distance to a reference minimized. We demonstrate the utility of the tool by segmentation and tracking MTs from experiments with dynein-driven wave like filament looping. Contour detection achieves sub-pixel accuracy, and Dice scores indicate a robustness to noise, better than currently used tools. Thus KnotResolver overcomes multiple limitations of widely used tools in microscopy of cytoskeletal filament-like structures.

Submitted 18 April, 2024; originally announced April 2024.

Comments: Manuscript in submission

arXiv:2404.11819 [pdf, other]

Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement

Authors: Pushkar Shukla, Dhruv Srikanth, Lee Cohen, Matthew Turk

Abstract: We propose a novel approach to mitigate biases in computer vision models by utilizing counterfactual generation and fine-tuning. While counterfactuals have been used to analyze and address biases in DNN models, the counterfactuals themselves are often generated from biased generative models, which can introduce additional biases or spurious correlations. To address this issue, we propose using adversarial images, that is, images that deceive a deep neural network but not humans, as counterfactuals for fair model training. Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model using adversarial examples. By incorporating adversarial images into the training data, we aim to prevent biases from propagating through the pipeline. We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods. Qualitatively, our results indicate that post-training, the decisions made by the model are less dependent on the sensitive attribute and our model better disentangles the relationship between sensitive attributes and classification variables.

Submitted 27 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11325 [pdf, ps, other]

On Learning Parities with Dependent Noise

Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Abstract: In this expository note we show that the learning parities with noise (LPN) assumption is robust to weak dependencies in the noise distribution of small batches of samples. This provides a partial converse to the linearization technique of [AG11]. The material in this note is drawn from a recent work by the authors [GMR24], where the robustness guarantee was a key component in a cryptographic separation between reinforcement learning and supervised learning.

Submitted 17 April, 2024; originally announced April 2024.

Comments: This note draws heavily from arXiv:2404.03774

arXiv:2404.08513 [pdf, other]

Adversarial Imitation Learning via Boosting

Authors: Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

Abstract: Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective is on-policy and DAC's ad-hoc application of off-policy training does not guarantee successful imitation (Kostrikov et al., 2019; 2020). Follow-up work such as ValueDICE (Kostrikov et al., 2020) tackles this issue by deriving a fully off-policy AIL objective. Instead in this work, we develop a novel and principled AIL algorithm via the framework of boosting. Like boosting, our new algorithm, AILBoost, maintains an ensemble of properly weighted weak learners (i.e., policies) and trains a discriminator that witnesses the maximum discrepancy between the distributions of the ensemble and the expert policy. We maintain a weighted replay buffer to represent the state-action distribution induced by the ensemble, allowing us to train discriminators using the entire data collected so far. In the weighted replay buffer, the contribution of the data from older policies is properly discounted with the weight computed based on the boosting framework. Empirically, we evaluate our algorithm on both controller state-based and pixel-based environments from the DeepMind Control Suite. AILBoost outperforms DAC on both types of environments, demonstrating the benefit of properly weighting replay buffer data for off-policy training. On state-based environments, DAC outperforms ValueDICE and IQ-Learn (Gary et al., 2021), achieving competitive performance with as little as one expert trajectory.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 19 pages, 7 figures, 4 tables, 3 algorithms, ICLR 2024

arXiv:2404.06723 [pdf, other]

Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision

Authors: Yingbo Ma, Suraj Kolla, Zhenhong Hu, Dhruv Kaliraman, Victoria Nolan, Ziyuan Guan, Yuanfang Ren, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Jeremy A. Balch, Tyler J. Loftus, Parisa Rashidi, Azra Bihorac, Benjamin Shickel

Abstract: Modern electronic health records (EHRs) hold immense promise in tracking personalized patient health trajectories through sequential deep learning, owing to their extensive breadth, scale, and temporal granularity. Nonetheless, how to effectively leverage multiple modalities from EHRs poses significant challenges, given their complex characteristics such as high dimensionality, multimodality, sparsity, varied recording frequencies, and temporal irregularities. To this end, this paper introduces a novel multimodal contrastive learning framework, specifically focusing on medical time series and clinical notes. To tackle the challenge of sparsity and irregular time intervals in medical time series, the framework integrates temporal cross-attention transformers with a dynamic embedding and tokenization scheme for learning multimodal feature representations. To harness the interconnected relationships between medical time series and clinical notes, the framework employs a global contrastive loss, aligning a patient's multimodal feature representations with the corresponding discharge summaries. Since discharge summaries uniquely pertain to individual patients and represent a holistic view of the patient's hospital stay, machine learning models are led to learn discriminative multimodal features via global contrasting. Extensive experiments with a real-world EHR dataset demonstrated that our framework outperformed state-of-the-art approaches on the exemplar task of predicting the occurrence of nine postoperative complications for more than 120,000 major inpatient surgeries using multimodal data from UF health system split among three hospitals (UF Health Gainesville, UF Health Jacksonville, and UF Health Jacksonville-North).

Submitted 10 April, 2024; originally announced April 2024.

Comments: 12 pages, 3 figures. arXiv admin note: text overlap with arXiv:2403.04012

arXiv:2404.06609 [pdf, other]

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

Authors: Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi

Abstract: The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images. However, these navigation models often handle only a single input modality as the target. With the progress achieved so far, it is time to move towards universal navigation models capable of handling various goal types, enabling more effective user interaction with robots. To facilitate this goal, we propose GOAT-Bench, a benchmark for the universal navigation task referred to as GO to AnyThing (GOAT). In this task, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image in an open-vocabulary fashion. We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities, the role of explicit and implicit scene memories, their robustness to noise in goal specifications, and the impact of memory in lifelong scenarios.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.04603 [pdf, ps, other]

Analyzing LLM Usage in an Advanced Computing Class in India

Authors: Chaitanya Arora, Utkarsh Venaik, Pavit Singh, Sahil Goyal, Jatin Tyagi, Shyama Goel, Ujjwal Singhal, Dhruv Kumar

Abstract: This paper investigates the usage patterns of undergraduate and graduate students when engaging with large language models (LLMs) to tackle programming assignments in the context of advanced computing courses. Existing work predominantly focuses on the influence of LLMs in introductory programming contexts. Additionally, there is a scarcity of studies analyzing actual conversations between students and LLMs. Our study provides a comprehensive quantitative and qualitative analysis of raw interactions between students and LLMs within an advanced computing course (Distributed Systems) at an Indian University. We further complement this by conducting student interviews to gain deeper insights into their usage patterns. Our study shows that students make use of LLMs in various ways: generating code or debugging code by identifying and fixing errors. They also copy and paste assignment descriptions into LLM interfaces for specific solutions, ask conceptual questions about complex programming ideas or theoretical concepts, and generate test cases to check code functionality and robustness. Our analysis covers over 4,000 prompts from 411 students and interviews with 10 students. Our analysis shows that LLMs excel at generating boilerplate code and assisting in debugging, while students handle the integration of components and system troubleshooting. This aligns with the learning objectives of advanced computing courses, which are oriented towards teaching students how to build systems and troubleshoot, with less emphasis on generating code from scratch. Therefore, LLM tools can be leveraged to increase student productivity, as shown by the data we collected. This study contributes to the ongoing discussion on LLM use in education, advocating for their usefulness in advanced computing courses to complement higher-level learning and productivity.

Submitted 6 April, 2024; originally announced April 2024.

Comments: Under review: 12 pages

arXiv:2404.04527 [pdf, other]

VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA

Authors: Sachini Wickramasinghe, Dhruv Parikh, Bingyi Zhang, Rajgopal Kannan, Viktor Prasanna, Carl Busart

Abstract: Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) is a key technique used in military applications like remote-sensing image recognition. Vision Transformers (ViTs) are the current state-of-the-art in various computer vision applications, outperforming their CNN counterparts. However, using ViTs for SAR ATR applications is challenging because (1) standard ViTs require extensive training data to generalize well due to their low locality, whereas the standard SAR datasets have a limited number of labeled training samples, which reduces the learning capability of ViTs; and (2) ViTs have a high parameter count and are computation intensive, which makes their deployment on resource-constrained SAR platforms difficult. In this work, we develop a lightweight ViT model that can be trained directly on small datasets without any pre-training by utilizing the Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) modules. We directly train this model on SAR datasets which have limited training samples to evaluate its effectiveness for SAR ATR applications. We evaluate our proposed model, which we call VTR (ViT for SAR ATR), on three widely used SAR datasets: MSTAR, SynthWakeSAR, and GBSAR. Further, we propose a novel FPGA accelerator for VTR, in order to enable deployment for real-time SAR ATR applications.

Submitted 6 April, 2024; originally announced April 2024.

Comments: SPIE DCS 2024

arXiv:2404.03774 [pdf, other]

Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning

Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Abstract: Supervised learning is often computationally easy in practice. But to what extent does this mean that other modes of learning, such as reinforcement learning (RL), ought to be computationally easy by extension? In this work we show the first cryptographic separation between RL and supervised learning, by exhibiting a class of block MDPs and associated decoding functions where reward-free exploration is provably computationally harder than the associated regression problem. We also show that there is no computationally efficient algorithm for reward-directed RL in block MDPs, even when given access to an oracle for this regression problem. It is known that being able to perform regression in block MDPs is necessary for finding a good policy; our results suggest that it is not sufficient. Our separation lower bound uses a new robustness property of the Learning Parities with Noise (LPN) hardness assumption, which is crucial in handling the dependent nature of RL data. We argue that separations and oracle lower bounds, such as ours, are a more meaningful way to prove hardness of learning because the constructions better reflect the practical reality that supervised learning by itself is often not the computational bottleneck.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 112 pages, 3 figures

arXiv:2404.02021 [pdf, ps, other]

On off-diagonal hypergraph Ramsey numbers

Authors: David Conlon, Jacob Fox, Benjamin Gunby, Xiaoyu He, Dhruv Mubayi, Andrew Suk, Jacques Verstraete

Abstract: A fundamental problem in Ramsey theory is to determine the growth rate in terms of $n$ of the Ramsey number $r(H, K_n^{(3)})$ of a fixed $3$-uniform hypergraph $H$ versus the complete $3$-uniform hypergraph with $n$ vertices. We study this problem, proving two main results. First, we show that for a broad class of $H$, including links of odd cycles and tight cycles of length not divisible by three, $r(H, K_n^{(3)}) \ge 2^{\Omega_H(n \log n)}$. This significantly generalizes and simplifies an earlier construction of Fox and He which handled the case of links of odd cycles and is sharp both in this case and for all but finitely many tight cycles of length not divisible by three. Second, disproving a folklore conjecture in the area, we show that there exists a linear hypergraph $H$ for which $r(H, K_n^{(3)})$ is superpolynomial in $n$. This provides the first example of a separation between $r(H,K_n^{(3)})$ and $r(H,K_{n,n,n}^{(3)})$, since the latter is known to be polynomial in $n$ when $H$ is linear.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 22 pages

arXiv:2404.01472 [pdf, other]

The homeomorphisms of the Sierpiński carpet are not classifiable by countable structures

Authors: Dhruv Kulshreshtha, Aristotelis Panagiotopoulos

Abstract: We show that the homeomorphisms of the Sierpiński carpet are not classifiable, up to conjugacy, using isomorphism types of countable structures as invariants.

Submitted 11 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 13 pages, 6 figures

MSC Class: 54H05; 03E15; 54F15; 37B05

arXiv:2404.01413 [pdf, other]

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Authors: Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Abstract: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, where an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data by each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models are fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs.

Submitted 29 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.14820 [pdf, other]

Competition for binding targets results in paradoxical effects for simultaneous activator and repressor action -- Extended Version

Authors: M. Ali Al-Radhawi, Krishna Manoj, Dhruv D. Jatkar, Alon Duvall, Domitilla Del Vecchio, Eduardo D. Sontag

Abstract: In the context of epigenetic transformations in cancer metastasis, a puzzling effect was recently discovered, in which the elimination (knock-out) of an activating regulatory element leads to increased (rather than decreased) activity of the element being regulated. It has been postulated that this paradoxical behavior can be explained by activating and repressing transcription factors competing for binding to other possible targets. It is very difficult to prove this hypothesis in mammalian cells, due to the large number of potential players and the complexity of endogenous intracellular regulatory networks. Instead, this paper analyzes this issue through an analogous synthetic biology construct which aims to reproduce the paradoxical behavior using standard bacterial gene expression networks. The paper first reviews the motivating cancer biology work, and then describes a proposed synthetic construct. A mathematical model is formulated, and basic properties of uniqueness of steady states and convergence to equilibria are established, as well as an identification of parameter regimes which should lead to observing such paradoxical phenomena (more activator leads to less activity at steady state). A proof is also given to show that this is a steady-state property, and for initial transients the phenomenon will not be observed. This work adds to the general line of work of resource competition in synthetic circuits.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures

arXiv:2403.14047 [pdf, other]

Accelerating ViT Inference on FPGA through Static and Dynamic Pruning

Authors: Dhruv Parikh, Shouyi Li, Bingyi Zhang, Rajgopal Kannan, Carl Busart, Viktor Prasanna

Abstract: Vision Transformers (ViTs) have achieved state-of-the-art accuracy on various computer vision tasks. However, their high computational complexity prevents them from being applied to many real-world applications. Weight and token pruning are two well-known methods for reducing complexity: weight pruning reduces the model size and associated computational demands, while token pruning further dynamically reduces the computation based on the input. Combining these two techniques should significantly reduce computation complexity and model size; however, naively integrating them results in irregular computation patterns, leading to significant accuracy drops and difficulties in hardware acceleration. Addressing the above challenges, we propose a comprehensive algorithm-hardware codesign for accelerating ViT on FPGA through simultaneous pruning, combining static weight pruning and dynamic token pruning. For algorithm design, we systematically combine a hardware-aware structured block-pruning method for pruning model parameters and a dynamic token pruning method for removing unimportant token vectors. Moreover, we design a novel training algorithm to recover the model's accuracy. For hardware design, we develop a novel hardware accelerator for executing the pruned model. The proposed hardware design employs multi-level parallelism with a load balancing strategy to efficiently deal with the irregular computation pattern caused by the two pruning approaches. Moreover, we develop an efficient hardware mechanism for executing the on-the-fly token pruning.

Submitted 12 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: FCCM 2024

arXiv:2403.13450 [pdf, other]

Complex Networks characterization of Indian Water Dam Systems and its topographic response

Authors: Dhruv Patel

Abstract: In this paper a Complex Network approach is taken to understand the salient features of Indian Water Dam Networks. Detailed analysis of 15 river basin networks has been carried out. The data has been taken from "River Basin Atlas of India" compiled by the Indian Space Research Organisation (ISRO) and Central Water Commission (CWC), Ministry of Water Resources, Government of India. The paper also investigates the correlation of various structural properties of the networks, such as total number of nodes, Link Density, and Clustering Coefficient, with each other and also with the Irrigation Potential and topographical features like the Elevation gradient of the region measured in meters. A mathematical model has also been proposed to understand the relation between irrigation potential, measured in thousand hectares, and the number of nodes, i.e. dams and barrages, to get a more quantitative understanding of the system. The paper also tries to observe the response of the network properties to actual topographical features of the region. This lays down a basic foundational work in understanding these water dam networks through a complex network approach over which further work can be done to make the predictions more efficient.

Submitted 28 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: 11 Main pages text and 5 figures in main text. 15 figures in appendix

arXiv:2403.12173 [pdf, other]

TnT-LLM: Text Mining at Scale with Large Language Models

Authors: Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan

Abstract: Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. In the first phase, we introduce a zero-shot, multi-stage reasoning approach which enables LLMs to produce and refine a label taxonomy iteratively. In the second phase, LLMs are used as data labelers that yield training samples so that lightweight supervised classifiers can be reliably built, deployed, and served at scale. We apply TnT-LLM to the analysis of user intent and conversational domain for Bing Copilot (formerly Bing Chat), an open-domain chat-based search engine. Extensive experiments using both human and automatic evaluation metrics demonstrate that TnT-LLM generates more accurate and relevant label taxonomies when compared against state-of-the-art baselines, and achieves a favorable balance between accuracy and efficiency for classification at scale. We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 9 pages main content, 8 pages references and appendix

arXiv:2403.12016 [pdf, ps, other]

Ordered and colored subgraph density problems

Authors: Emily Cairncross, Dhruv Mubayi

Abstract: We consider three extremal problems about the number of copies of a fixed graph in another larger graph. First, we correct an error in a result of Reiher and Wagner and prove that the number of $k$-edge stars in a graph with density $x \in [0, 1]$ is asymptotically maximized by a clique and isolated vertices or its complement. Next, among ordered $n$-vertex graphs with $m$ edges, we determine the maximum and minimum number of copies of a $k$-edge star whose nonleaf vertex is minimum among all vertices of the star. Finally, for $s \ge 2$, we define a particular $3$-edge-colored complete graph $F$ on $2s$ vertices with colors blue, green and red, and determine, for each $(x_b, x_g)$ with $x_b+x_g\le 1$ and $x_b, x_g \ge 0$, the maximum density of $F$ in a large graph whose blue, green and red edge sets have densities $x_b, x_g$ and $1-x_b-x_g$, respectively. These are the first nontrivial examples of colored graphs for which such complete results are proved.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 17 pages, 6 figures

arXiv:2403.08956 [pdf]

doi 10.1109/INCET54531.2022.9825164

AI coach for badminton

Authors: Dhruv Toshniwal, Arpit Patil, Nancy Vachhani

Abstract: In the competitive realm of sports, optimal performance necessitates rigorous management of nutrition and physical conditioning. Specifically, in badminton, the agility and precision required make it an ideal candidate for motion analysis through video analytics. This study leverages advanced neural network methodologies to dissect video footage of badminton matches, aiming to extract detailed insights into player kinetics and biomechanics. Through the analysis of stroke mechanics, including hand-hip coordination, leg positioning, and the execution angles of strokes, the research aims to derive predictive models that can suggest improvements in stance, technique, and muscle orientation. These recommendations are designed to mitigate erroneous techniques, reduce the risk of joint fatigue, and enhance overall performance. Utilizing a vast array of data available online, this research correlates players' physical attributes with their in-game movements to identify muscle activation patterns during play. The goal is to offer personalized training and nutrition strategies that align with the specific biomechanical demands of badminton, thereby facilitating targeted performance enhancements.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 7 pages, 11 figures. https://ieeexplore.ieee.org/document/9825164

Journal ref: 2022 3rd International Conference for Emerging Technology (INCET), Belgaum, India, 2022, pp. 1-7

arXiv:2403.08283 [pdf, other]

Optimized Detection and Classification on GTRSB: Advancing Traffic Sign Recognition with Convolutional Neural Networks

Authors: Dhruv Toshniwal, Saurabh Loya, Anuj Khot, Yash Marda

Abstract: In the rapidly evolving landscape of transportation, the proliferation of automobiles has made road traffic more complex, necessitating advanced vision-assisted technologies for enhanced safety and navigation. These technologies are imperative for providing critical traffic sign information, influencing driver behavior, and supporting vehicle control, especially for drivers with disabilities and in the burgeoning field of autonomous vehicles. Traffic sign detection and recognition have emerged as key areas of research due to their essential roles in ensuring road safety and compliance with traffic regulations. Traditional computer vision methods have faced challenges in achieving optimal accuracy and speed due to real-world variabilities. However, the advent of deep learning and Convolutional Neural Networks (CNNs) has revolutionized this domain, offering solutions that significantly surpass previous capabilities in terms of speed and reliability. This paper presents an innovative approach leveraging CNNs that achieves an accuracy of nearly 96%, highlighting the potential for even greater precision through advanced localization techniques. Our findings not only contribute to the ongoing advancement of traffic sign recognition technology but also underscore the critical impact of these developments on road safety and the future of autonomous driving.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 8 pages, 8 figures, 1 table

arXiv:2403.08195 [pdf, ps, other]

Efficiently verifiable quantum advantage on near-term analog quantum simulators

Authors: Zhenning Liu, Dhruv Devulapalli, Dominik Hangleiter, Yi-Kai Liu, Alicia J. Kollár, Alexey V. Gorshkov, Andrew M. Childs

Abstract: Existing schemes for demonstrating quantum computational advantage are subject to various practical restrictions, including the hardness of verification and challenges in experimental implementation. Meanwhile, analog quantum simulators have been realized in many experiments to study novel physics. In this work, we propose a quantum advantage protocol based on single-step Feynman-Kitaev verification of an analog quantum simulation, in which the verifier need only run an $O(\lambda^2)$-time classical computation, and the prover need only prepare $O(1)$ samples of a history state and perform $O(\lambda^2)$ single-qubit measurements, for a security parameter $\lambda$. We also propose a near-term feasible strategy for honest provers and discuss potential experimental realizations.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 20 pages, 6 figures

arXiv:2403.07011 [pdf]

Automatic Detection and Classification of Corona Infection (COVID-19) from X-ray Images Using Convolution Neural Network

Authors: Kinjal A Patel, Tanvi Goswami

Abstract: The novel coronavirus universally known as the COVID-19 outbreak arose at the end of 2019 in one of the East Asian countries and has been subject to widespread discussion and debate. Almost 200 countries across the globe have been affected by COVID-19, and it has ruined many lives and the global economy. The virus is spreading very rapidly, at a pace of around 10 fold in less than a month. Also, in the case of COVID-19 it is critical to detect the infection, as it presents various symptoms which may differ from person to person. Hence, diagnosis at an early stage and treatment are very important for this type of infectious disease. The chest x-ray is one of the primary techniques, alongside blood tests and Computed Tomography, and plays a major role in the early diagnosis of COVID-19. There is a rising need for automated and auxiliary diagnostic tools for early diagnosis, as there are no accurate and truthful automated tool kits on hand. In this research study, we have designed a Convolution Neural Network architecture, a deep net, for the classification of chest x-ray images into two classes: COVID-19 or Non-COVID-19 infection. The anticipated model is expected to provide accurate diagnostic results and produced classification accuracy of 99%, 100%, and 100% with 70%-30%, 75%-25%, and 80%-20% train-test data splits respectively, for the binary classification of the x-ray image into the COVID-19 or Non-COVID-19 infection category. We have designed the CNN with optimized parameters, with 3 convolution layers and an optimized number of filters in each layer.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.04012 [pdf, other]

Temporal Cross-Attention for Dynamic Embedding and Tokenization of Multimodal Electronic Health Records

Authors: Yingbo Ma, Suraj Kolla, Dhruv Kaliraman, Victoria Nolan, Zhenhong Hu, Ziyuan Guan, Yuanfang Ren, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Tyler J. Loftus, Parisa Rashidi, Azra Bihorac, Benjamin Shickel

Abstract: The breadth, scale, and temporal granularity of modern electronic health records (EHR) systems offer great potential for estimating personalized and contextual patient health trajectories using sequential deep learning. However, learning useful representations of EHR data is challenging due to its high dimensionality, sparsity, multimodality, irregular and variable-specific recording frequency, and timestamp duplication when multiple measurements are recorded simultaneously. Although recent efforts to fuse structured EHR and unstructured clinical notes suggest the potential for more accurate prediction of clinical outcomes, less focus has been placed on EHR embedding approaches that directly address temporal EHR challenges by learning time-aware representations from multimodal patient time series. In this paper, we introduce a dynamic embedding and tokenization framework for precise representation of multimodal clinical time series that combines novel methods for encoding time and sequential position with temporal cross-attention. Our embedding and tokenization framework, when integrated into a multitask transformer classifier with sliding window attention, outperformed baseline approaches on the exemplar task of predicting the occurrence of nine postoperative complications of more than 120,000 major inpatient surgeries using multimodal data from three hospitals and two academic health centers in the United States.

Submitted 1 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: ICLR 2024 Workshop on Learning From Time Series for Health. 10 pages, 3 figures

Showing 1–50 of 815 results for author: Dhruv