Showing 1–50 of 1,165 results for author: Gupta, A

Searching in archive cs. Search in all archives.
  1. Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet

    Authors: Manish Dhakal, Arman Chhetri, Aman Kumar Gupta, Prabin Lamichhane, Suraj Pandey, Subarna Shakya

    Abstract: This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text. The model was trained and tested on the OpenSLR (audio, text) dataset. The majority of the audio dataset has silent gaps at both ends, which are clipped during dataset preprocessing for a more uniform mapping of audio frames and their corresponding texts. Mel Frequen… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted at 2022 International Conference on Inventive Computation Technologies (ICICT), IEEE

    Journal ref: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 515-521

  2. arXiv:2406.17279  [pdf, other]

    cs.RO cs.AI

    Learning Decentralized Multi-Biped Control for Payload Transport

    Authors: Bikram Pandit, Ashutosh Gupta, Mohitvishnu S. Gadde, Addison Johnson, Aayam Kumar Shrestha, Helei Duan, Jeremy Dao, Alan Fern

    Abstract: Payload transport over flat terrain via multi-wheel robot carriers is well-understood, highly effective, and configurable. In this paper, our goal is to provide similar effectiveness and configurability for transport over rough terrain that is more suitable for legs rather than wheels. For this purpose, we consider multi-biped robot carriers, where wheels are replaced by multiple bipedal robots at… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Submitted to CoRL 2024, Project website: decmbc.github.io

  3. arXiv:2406.15876  [pdf, ps, other]

    cs.DS

    Pairwise-Independent Contention Resolution

    Authors: Anupam Gupta, Jinqiao Hu, Gregory Kehne, Roie Levin

    Abstract: We study online contention resolution schemes (OCRSs) and prophet inequalities for non-product distributions. Specifically, when the active set is sampled according to a pairwise-independent (PI) distribution, we show a $(1-o_k(1))$-selectable OCRS for uniform matroids of rank $k$, and $Ω(1)$-selectable OCRSs for laminar, graphic, cographic, transversal, and regular matroids. These imply prophet i… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Contains new results on t-wise independent CRSs

  4. arXiv:2406.15556  [pdf, other]

    cs.CV

    Open-Vocabulary Temporal Action Localization using Multimodal Guidance

    Authors: Akshita Gupta, Aditya Arora, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Graham W. Taylor

    Abstract: Open-Vocabulary Temporal Action Localization (OVTAL) enables a model to recognize any desired action category in videos without the need to explicitly curate training data for all categories. However, this flexibility poses significant challenges, as the model must recognize not only the action categories seen during training but also novel categories specified at inference. Unlike standard tempor… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.14775  [pdf, other]

    physics.ao-ph cs.LG physics.flu-dyn physics.geo-ph

    Machine Learning Global Simulation of Nonlocal Gravity Wave Propagation

    Authors: Aman Gupta, Aditi Sheshadri, Sujit Roy, Vishal Gaur, Manil Maskey, Rahul Ramachandran

    Abstract: Global climate models typically operate at a grid resolution of hundreds of kilometers and fail to resolve atmospheric mesoscale processes, e.g., clouds, precipitation, and gravity waves (GWs). Model representation of these processes and their sources is essential to the global circulation and planetary energy budget, but subgrid scale contributions from these processes are often only approximatel… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages, 7 figures, no tables

  6. arXiv:2406.11307  [pdf, other]

    cs.CL

    An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers

    Authors: Ashim Gupta, Sina Mahdipour Saravani, P. Sadayappan, Vivek Srikumar

    Abstract: The increasing size of transformer-based models in NLP makes the question of compressing them important. In this work, we present a comprehensive analysis of factorization based model compression techniques. Specifically, we focus on comparing straightforward low-rank factorization against the recently introduced Monarch factorization, which exhibits impressive performance preservation on the GLUE… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.10085  [pdf, other]

    cs.CL

    Enhancing Question Answering on Charts Through Effective Pre-training Tasks

    Authors: Ashim Gupta, Vivek Gupta, Shuo Zhang, Yujie He, Ning Zhang, Shalin Shah

    Abstract: To completely understand a document, the use of textual information is not enough. Understanding visual cues, such as layouts and charts, is also required. While the current state-of-the-art approaches for document understanding (both OCR-based and OCR-free) work well, a thorough analysis of their capabilities and limitations has not yet been performed. Therefore, in this work, we address the li… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.08931  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

    Authors: Arnav Goel, Medha Hira, Anubha Gupta

    Abstract: The advent of modern deep learning techniques has given rise to advancements in the field of Speech Emotion Recognition (SER). However, most systems prevalent in the field fail to generalize to speakers not seen during training. This study focuses on handling challenges of multilingual SER, specifically on unseen speakers. We introduce CAMuLeNet, a novel architecture leveraging co-attention based fusi… ▽ More

    Submitted 19 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, Accepted to INTERSPEECH 2024. The first two authors contributed equally

  9. arXiv:2406.07662  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG q-bio.NC

    Progress Towards Decoding Visual Imagery via fNIRS

    Authors: Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu

    Abstract: We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 2… ▽ More

    Submitted 22 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  10. arXiv:2406.06608  [pdf, other]

    cs.CL cs.AI

    The Prompt Report: A Systematic Survey of Prompting Techniques

    Authors: Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, Hevander Da Costa, Saloni Gupta, Megan L. Rogers, Inna Goncearenco, Giuseppe Sarli, Igor Galynker, et al. (6 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) systems are being increasingly deployed across all parts of industry and research settings. Developers and end users interact with these systems through the use of prompting or prompt engineering. While prompting is a widespread and highly researched concept, there exists conflicting terminology and a poor ontological understanding of what constitutes a p… ▽ More

    Submitted 16 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2406.03907  [pdf, other]

    cs.CV

    Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following

    Authors: Anshul Gupta, Pierre Vuillecard, Arya Farkhondeh, Jean-Marc Odobez

    Abstract: Contextual cues related to a person's pose and interactions with objects and other people in the scene can provide valuable information for gaze following. While existing methods have focused on dedicated cue extraction methods, in this work we investigate the zero-shot capabilities of Vision-Language Models (VLMs) for extracting a wide array of contextual cues to improve gaze following performanc… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at the GAZE Workshop at CVPR 2024

  12. arXiv:2406.01637  [pdf, other]

    cs.MA cs.AI

    Teams of LLM Agents can Exploit Zero-Day Vulnerabilities

    Authors: Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang

    Abstract: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). I… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  13. arXiv:2406.00887  [pdf, ps, other]

    cs.RO

    Deep Reinforcement Learning for Sim-to-Real Policy Transfer of VTOL-UAVs Offshore Docking Operations

    Authors: Ali M. Ali, Aryaman Gupta, Hashim A. Hashim

    Abstract: This paper proposes a novel Reinforcement Learning (RL) approach for sim-to-real policy transfer of Vertical Take-Off and Landing Unmanned Aerial Vehicle (VTOL-UAV). The proposed approach is designed for VTOL-UAV landing on offshore docking stations in maritime operations. VTOL-UAVs in maritime operations encounter limitations in their operational range, primarily stemming from constraints imposed… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  14. arXiv:2406.00022  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning

    Authors: Arnav Goel, Medha Hira, Anubha Gupta

    Abstract: The field of prosody transfer in speech synthesis systems is rapidly advancing. This research is focused on evaluating learning methods for adapting pre-trained monolingual text-to-speech (TTS) models to multilingual conditions, i.e., Supervised Fine-Tuning (SFT) and Transfer Learning (TL). This comparison utilizes three distinct metrics: Mean Opinion Score (MOS), Recognition Accuracy (RA), and Me… ▽ More

    Submitted 18 June, 2024; v1 submitted 23 May, 2024; originally announced June 2024.

    Comments: 7 pages, Accepted to ICLR 2024 - Tiny Track

  15. arXiv:2406.00021  [pdf, other]

    cs.CL cs.SD eess.AS

    CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

    Authors: Medha Hira, Arnav Goel, Anubha Gupta

    Abstract: This paper presents CrossVoice, a novel cascade-based Speech-to-Speech Translation (S2ST) system employing advanced ASR, MT, and TTS technologies with cross-lingual prosody preservation through transfer learning. We conducted comprehensive experiments comparing CrossVoice with direct-S2ST systems, showing improved BLEU scores on tasks such as Fisher Es-En, VoxPopuli Fr-En and prosody preservation… ▽ More

    Submitted 18 June, 2024; v1 submitted 23 May, 2024; originally announced June 2024.

    Comments: 8 pages, Accepted at ICLR 2024 - Tiny Track

  16. arXiv:2405.20671  [pdf, other]

    cs.LG cs.AI cs.CL

    Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

    Authors: Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun

    Abstract: Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet effective method that directly embeds the structure of the tasks into the positional encoding of a (decoder-only) Transformer. Taking a departure from the vanilla absol… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 73 pages, 20 figures, 90 tables

  17. arXiv:2405.19706  [pdf, other]

    cs.SE cs.CE cs.ET

    Bridging eResearch Infrastructure and Experimental Materials Science Process in the Quantum Data Hub

    Authors: Amarnath Gupta, Shweta Purawat, Subhasis Dasgupta, Pratyush Karmakar, Elaine Chi, Ilkay Altintas

    Abstract: Experimental materials science is experiencing significant growth due to automated experimentation and AI techniques. Integrated autonomous platforms are emerging, combining generative models, robotics, simulations, and automated systems for material synthesis. However, two major challenges remain: democratizing access to these technologies and creating accessible infrastructure for under-resource… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  18. arXiv:2405.19307  [pdf, other]

    cs.RO

    Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels

    Authors: Abhay Deshpande, Liyiming Ke, Quinn Pfeifer, Abhishek Gupta, Siddhartha S. Srinivasa

    Abstract: We consider imitation learning with access only to expert demonstrations, whose real-world application is often limited by covariate shift due to compounding errors during execution. We investigate the effectiveness of the Continuity-based Corrective Labels for Imitation Learning (CCIL) framework in mitigating this issue for real-world fine manipulation tasks. CCIL generates corrective labels by l… ▽ More

    Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  19. arXiv:2405.18657  [pdf, other]

    cs.NI

    The Efficacy of the Connect America Fund in Addressing US Internet Access Inequities

    Authors: Haarika Manda, Varshika Srinivasavaradhan, Laasya Koduru, Kevin Zhang, Xuanhe Zhou, Udit Paul, Elizabeth Belding, Arpit Gupta, Tejas N. Narechania

    Abstract: Residential fixed broadband internet access in the United States (US) has long been distributed inequitably, drawing significant attention from researchers and policymakers. This paper evaluates the efficacy of the Connect America Fund (CAF), a key policy intervention aimed at addressing disparities in US internet access. CAF subsidizes the creation of new regulated broadband monopolies in underse… ▽ More

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  20. arXiv:2405.14716  [pdf, other]

    cs.AI cs.HC

    HTN-Based Tutors: A New Intelligent Tutoring Framework Based on Hierarchical Task Networks

    Authors: Momin N. Siddiqui, Adit Gupta, Jennifer M. Reddig, Christopher J. MacLellan

    Abstract: Intelligent tutors have shown success in delivering a personalized and adaptive learning experience. However, there exist challenges regarding the granularity of knowledge in existing frameworks and the resulting instructions they can provide. To address these issues, we propose HTN-based tutors, a new intelligent tutoring framework that represents expert models using Hierarchical Task Networks (H… ▽ More

    Submitted 23 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted for publication in Proceedings of the Eleventh ACM Conference on Learning @ Scale (L@S'24), July 18--20, 2024, Atlanta, GA, USA

  21. arXiv:2405.13762  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

    Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

    Abstract: Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the a… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  22. arXiv:2405.11656  [pdf, other]

    cs.RO cs.AI

    URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images

    Authors: Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, Abhishek Gupta

    Abstract: Constructing simulation scenes that are both visually and physically realistic is a problem of practical interest in domains ranging from robotics to computer vision. This problem has become even more relevant as researchers wielding large data-hungry learning methods seek new sources of training data for physical decision-making systems. However, building simulation models is often still done by… ▽ More

    Submitted 31 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted at RSS2024

  23. arXiv:2405.10750  [pdf, other]

    eess.SY cs.LG

    Parameter Identification for Electrochemical Models of Lithium-Ion Batteries Using Bayesian Optimization

    Authors: Jianzong Pi, Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Abhishek Gupta, Marcello Canova

    Abstract: Efficient parameter identification of electrochemical models is crucial for accurate monitoring and control of lithium-ion cells. This process becomes challenging when applied to complex models that rely on a considerable number of interdependent parameters that affect the output response. Gradient-based and metaheuristic optimization techniques, although previously employed for this task, are lim… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 6 pages

  24. arXiv:2405.10490  [pdf]

    stat.ME cs.AI cs.IR cs.LG math.OC

    Neural Optimization with Adaptive Heuristics for Intelligent Marketing System

    Authors: Changshuai Wei, Benjamin Zelditch, Joyce Chen, Andre Assuncao Silva T Ribeiro, Jingyi Kenneth Tay, Borja Ocejo Elizondo, Keerthi Selvaraj, Aman Gupta, Licurgo Benemann De Almeida

    Abstract: Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimizat… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: KDD 2024

    ACM Class: G.3; G.1.6; I.2

  25. arXiv:2405.08576  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation

    Authors: Jared Mejia, Victoria Dean, Tess Hellebrekers, Abhinav Gupta

    Abstract: Although pre-training on a large amount of data is beneficial for robot learning, current paradigms only perform large-scale pretraining for visual representations, whereas representations for other modalities are trained from scratch. In contrast to the abundance of visual data, it is unclear what relevant internet-scale data may be used for pretraining other modalities such as tactile sensing. S… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted to ICRA 2024

  26. arXiv:2405.05530  [pdf, other]

    cs.CV

    NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry

    Authors: Yash Khandelwal, Mayur Arvind, Sriram Kumar, Ashish Gupta, Sachin Kumar Danisetty, Piyush Bagad, Anish Madan, Mayank Lunayach, Aditya Annavajjala, Abhishek Maiti, Sansiddh Jain, Aman Dalmia, Namrata Deka, Jerome White, Jigar Doshi, Angjoo Kanazawa, Rahul Panicker, Alpan Raval, Srinivas Rana, Makarand Tapaswi

    Abstract: Malnutrition among newborns is a top public health concern in developing countries. Identification and subsequent growth monitoring are key to successful interventions. However, this is challenging in rural communities where health systems tend to be inaccessible and under-equipped, with poor adherence to protocol. Our goal is to equip health workers and public health systems with a solution for c… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted at CVPM Workshop at CVPR 2024

  27. arXiv:2405.04023  [pdf, other]

    eess.IV cs.CV

    Lumbar Spine Tumor Segmentation and Localization in T2 MRI Images Using AI

    Authors: Rikathi Pal, Sudeshna Mondal, Aditi Gupta, Priya Saha, Somoballi Ghoshal, Amlan Chakrabarti, Susmita Sur-Kolay

    Abstract: In medical imaging, segmentation and localization of spinal tumors in three-dimensional (3D) space pose significant computational challenges, primarily stemming from limited data availability. In response, this study introduces a novel data augmentation technique, aimed at automating spine tumor segmentation and localization through AI approaches. Leveraging a fusion of fuzzy c-means clustering an… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 9 pages, 12 figures

  28. arXiv:2405.03594  [pdf, other]

    cs.CL cs.AI

    Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

    Authors: Abhinav Agarwalla, Abhay Gupta, Alexandre Marques, Shubhra Pandit, Michael Goin, Eldar Kurtic, Kevin Leong, Tuan Nguyen, Mahmoud Salem, Dan Alistarh, Sean Lie, Mark Kurtz

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs that achieve full accuracy recovery for fine-tuning tasks at up to 70% sparsity. We achieve this for the LLaMA-2 7B model by combining the SparseGPT one-shot pruning me… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  29. arXiv:2405.01610  [pdf, other]

    cs.CL cs.IR

    Automating the Analysis of Public Saliency and Attitudes towards Biodiversity from Digital Media

    Authors: Noah Giebink, Amrita Gupta, Diogo Verìssimo, Charlotte H. Chang, Tony Chang, Angela Brennan, Brett Dickson, Alex Bowmer, Jonathan Baillie

    Abstract: Measuring public attitudes toward wildlife provides crucial insights into our relationship with nature and helps monitor progress toward Global Biodiversity Framework targets. Yet, conducting such assessments at a global scale is challenging. Manually curating search terms for querying news and social media is tedious, costly, and can lead to biased results. Raw news and social media data returned… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: v0.1, 21 pages with 10 figures

  30. arXiv:2405.01527  [pdf, other]

    cs.RO cs.CV

    Track2Act: Predicting Point Tracks from Internet Videos enables Diverse Zero-shot Robot Manipulation

    Authors: Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta, Shubham Tulsiani

    Abstract: We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation: interacting with unseen objects in novel scenes without test-time adaptation. While typical approaches rely on a large amount of demonstration data for such generalization, we propose an approach that leverages web videos to predict plausible interaction plans and learns a task-agnostic transformati… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: preprint

  31. arXiv:2405.00664  [pdf, other]

    cs.CL cs.AI cs.LG

    Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3

    Authors: Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli

    Abstract: This study presents a targeted model editing analysis focused on the latest large language model, Llama-3. We explore the efficacy of popular model editing techniques - ROME, MEMIT, and EMMET, which are designed for precise layer interventions. We identify the most effective layers for targeted edits through an evaluation that encompasses up to 4096 edits across three distinct strategies: sequenti… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  32. arXiv:2404.17701  [pdf, other]

    cs.AR cs.LG physics.ins-det

    Embedded FPGA Developments in 130nm and 28nm CMOS for Machine Learning in Particle Detector Readout

    Authors: Julia Gonski, Aseem Gupta, Haoyi Jia, Hyunjoon Kim, Lorenzo Rota, Larry Ruckman, Angelo Dragone, Ryan Herbst

    Abstract: Embedded field programmable gate array (eFPGA) technology allows the implementation of reconfigurable logic within the design of an application-specific integrated circuit (ASIC). This approach offers the low power and efficiency of an ASIC along with the ease of FPGA configuration, particularly beneficial for the use case of machine learning in the data pipeline of next-generation collider experi… ▽ More

    Submitted 1 July, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 16 pages, 12 figures

  33. arXiv:2404.17439  [pdf, other]

    cs.NI

    Enhancing QoE in HTTP/3 using EPS Framework

    Authors: Abhinav Gupta, Radim Bartos

    Abstract: HTTP/3, the latest evolution of the Hypertext Transfer Protocol, utilizes QUIC, a new transport protocol leveraging UDP to overcome limitations such as connection time and head-of-line blocking prevalent in HTTP/2. This advancement is enhanced by the Extensible Prioritization Scheme (EPS), which introduces a flexible prioritization framework for improving website resource delivery. This paper prop… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  34. arXiv:2404.15565  [pdf, other]

    cs.CL

    CASPR: Automated Evaluation Metric for Contrastive Summarization

    Authors: Nirupan Ananthamurugan, Dat Duong, Philip George, Ankita Gupta, Sandeep Tata, Beliz Gunel

    Abstract: Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making. However, reliably measuring the contrastiveness of the output summaries without relying on human evaluations remains an open problem. Prior work has proposed token-overlap based metrics, Distinctiveness S… ▽ More

    Submitted 13 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  35. arXiv:2404.14779  [pdf, other]

    cs.CL

    Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches

    Authors: Clément Christophe, Praveen K Kanithi, Prateek Munjal, Tathagata Raha, Nasir Hayat, Ronnie Rajan, Ahmed Al-Mahrooqi, Avani Gupta, Muhammad Umar Salman, Gurpreet Gosal, Bhargav Kanakiya, Charles Chen, Natalia Vassilieva, Boulbaba Ben Amor, Marco AF Pimentel, Shadab Khan

    Abstract: This study presents a comprehensive analysis and comparison of two predominant fine-tuning methodologies - full-parameter fine-tuning and parameter-efficient tuning - within the context of medical Large Language Models (LLMs). We developed and refined a series of LLMs, based on the Llama-2 architecture, specifically designed to enhance medical knowledge retrieval, reasoning, and question-answering… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Published at AAAI 2024 Spring Symposium - Clinical Foundation Models

  36. arXiv:2404.14735  [pdf, other]

    cs.RO

    Rank2Reward: Learning Shaped Reward Functions from Passive Video

    Authors: Daniel Yang, Davin Tjia, Jacob Berg, Dima Damen, Pulkit Agrawal, Abhishek Gupta

    Abstract: Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to provide raw, action-free visual data of tasks being performed. Moreover, this data can even be mined from video datasets or the web. Ideally, this data… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: ICRA 2024

  37. arXiv:2404.13460  [pdf, other]

    cs.NI

    Improving Web Content Delivery with HTTP/3 and Non-Incremental EPS

    Authors: Abhinav Gupta, Radim Bartos

    Abstract: HTTP/3 marks a significant advancement in protocol development, utilizing QUIC as its underlying transport layer to exploit multiplexing capabilities and minimize head-of-line blocking. The introduction of the Extensible Prioritization Scheme (EPS) offers a signaling mechanism for controlling the order of resource delivery. In this study, we propose mappings from Chromium priority hints to EPS urg… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  38. arXiv:2404.13377  [pdf, other]

    cs.NE

    Bridging the Gap Between Theory and Practice: Benchmarking Transfer Evolutionary Optimization

    Authors: Yaqing Hou, Wenqiang Ma, Abhishek Gupta, Kavitesh Kumar Bali, Hongwei Ge, Qiang Zhang, Carlos A. Coello Coello, Yew-Soon Ong

    Abstract: In recent years, the field of Transfer Evolutionary Optimization (TrEO) has witnessed substantial growth, fueled by the realization of its profound impact on solving complex problems. Numerous algorithms have emerged to address the challenges posed by transferring knowledge between tasks. However, the recently highlighted ``no free lunch theorem'' in transfer optimization clarifies that no single… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 17 pages, 18 figures

  39. Visualizing Intelligent Tutor Interactions for Responsive Pedagogy

    Authors: Grace Guo, Aishwarya Mudgal Sunil Kumar, Adit Gupta, Adam Coscia, Chris MacLellan, Alex Endert

    Abstract: Intelligent tutoring systems leverage AI models of expert learning and student knowledge to deliver personalized tutoring to students. While these intelligent tutors have demonstrated improved student learning outcomes, it is still unclear how teachers might integrate them into curriculum and course planning to support responsive pedagogy. In this paper, we conducted a design study with five teach… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures, ACM AVI 2024

  40. arXiv:2404.12383  [pdf, ps, other]

    cs.CV

    G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

    Authors: Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani

    Abstract: We propose G-HOP, a denoising diffusion based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution, we represent the human hand via a skeletal distance field to obtain a representation aligned with the (latent) signed distance field f… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: accepted to CVPR2024; project page at https://judyye.github.io/ghop-www

  41. arXiv:2404.12308  [pdf, other]

    cs.RO cs.LG eess.SY

    ASID: Active Exploration for System Identification in Robotic Manipulation

    Authors: Marius Memmel, Andrew Wagenmaker, Chuning Zhu, Patrick Yin, Dieter Fox, Abhishek Gupta

    Abstract: Model-free control strategies such as reinforcement learning have shown the ability to learn control strategies without requiring an accurate model or simulator of the world. While this is appealing due to the lack of modeling requirements, such methods can be sample inefficient, making them impractical in many real-world domains. On the other hand, model-based control techniques leveraging accura…

    Submitted 26 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Project website at https://weirdlabuw.github.io/asid

  42. arXiv:2404.08144  [pdf, other]

    cs.CR cs.AI

    LLM Agents can Autonomously Exploit One-day Vulnerabilities

    Authors: Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang

    Abstract: LLMs have become increasingly powerful, both in their benign and malicious uses. With the increase in capabilities, researchers have been increasingly interested in their ability to exploit cybersecurity vulnerabilities. In particular, recent work has conducted preliminary studies on the ability of LLM agents to autonomously hack websites. However, these studies are limited to simple vulnerabili…

    Submitted 17 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  43. arXiv:2404.07883  [pdf, other]

    cs.HC cs.AI

    Apprentice Tutor Builder: A Platform For Users to Create and Personalize Intelligent Tutors

    Authors: Glen Smith, Adit Gupta, Christopher MacLellan

    Abstract: Intelligent tutoring systems (ITS) are effective for improving students' learning outcomes. However, their development is often complex and time-consuming, and requires specialized programming and tutor design knowledge, thus hindering their widespread application and personalization. We present the Apprentice Tutor Builder (ATB), a platform that simplifies tutor creation and personalization. Instru…

    Submitted 11 April, 2024; originally announced April 2024.

  44. arXiv:2404.03981  [pdf, other]

    cs.CG

    Approximation Schemes for Geometric Knapsack for Packing Spheres and Fat Objects

    Authors: Pritam Acharya, Sujoy Bhore, Aaryan Gupta, Arindam Khan, Bratin Mondal, Andreas Wiese

    Abstract: We study the geometric knapsack problem in which we are given a set of $d$-dimensional objects (each with associated profits) and the goal is to find the maximum profit subset that can be packed non-overlappingly into a given $d$-dimensional (unit hypercube) knapsack. Even if $d=2$ and all input objects are disks, this problem is known to be NP-hard [Demaine, Fekete, Lang, 2010]. In this paper, we…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 28 pages, 8 figures

  45. arXiv:2404.02872  [pdf, other]

    cs.AI

    Integrating Explanations in Learning LTL Specifications from Demonstrations

    Authors: Ashutosh Gupta, John Komp, Abhay Singh Rajput, Krishna Shankaranarayanan, Ashutosh Trivedi, Namrita Varshney

    Abstract: This paper investigates whether recent advances in Large Language Models (LLMs) can assist in translating human explanations into a format that can robustly support learning Linear Temporal Logic (LTL) from demonstrations. Both LLMs and optimization-based methods can extract LTL specifications from demonstrations; however, they have distinct limitations. LLMs can quickly generate solutions and inc…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 21 Pages, 13 Page Appendix

    ACM Class: I.2.8

  46. arXiv:2404.01812  [pdf, other]

    cs.RO cs.AI

    Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions

    Authors: Saptarshi Dasgupta, Akshat Gupta, Shreshth Tuli, Rohan Paul

    Abstract: Manipulating unseen objects is challenging without a 3D representation, as objects generally have occluded surfaces. This requires physical interaction with objects to build their internal representations. This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations. We use an ensemble of partially constru…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  47. arXiv:2404.01292  [pdf, other]

    cs.CV cs.LG

    Measuring Style Similarity in Diffusion Models

    Authors: Gowthami Somepalli, Anubhav Gupta, Kamal Gupta, Shramay Palta, Micah Goldblum, Jonas Geiping, Abhinav Shrivastava, Tom Goldstein

    Abstract: Generative models are now widely used by graphic designers and artists. Prior works have shown that these models remember and often replicate content from their training data during generation. Hence as their proliferation increases, it has become important to perform a database search to determine whether the properties of the image are attributable to specific training data, every time before a…

    Submitted 1 April, 2024; originally announced April 2024.

  48. arXiv:2404.01282  [pdf, other]

    cs.CV

    LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization

    Authors: Akshita Gupta, Gaurav Mittal, Ahmed Magooda, Ye Yu, Graham W. Taylor, Mei Chen

    Abstract: Temporal Action Localization (TAL) involves localizing and classifying action snippets in an untrimmed video. The emergence of large video foundation models has led RGB-only video backbones to outperform previous methods needing both RGB and optical flow modalities. Leveraging these large models is often limited to training only the TAL head due to the prohibitively large GPU memory required to ad…

    Submitted 1 April, 2024; originally announced April 2024.

  49. arXiv:2404.00538  [pdf, ps, other]

    cs.CR stat.AP

    Eclipse Attack Detection on a Blockchain Network as a Non-Parametric Change Detection Problem

    Authors: Anurag Gupta, Vikram Krishnamurthy, Brian M. Sadler

    Abstract: This paper introduces a novel non-parametric change detection algorithm to identify eclipse attacks on a blockchain network; the non-parametric algorithm relies only on the empirical mean and variance of the dataset, making it highly adaptable. An eclipse attack occurs when malicious actors isolate blockchain users, disrupting their ability to reach consensus with the broader network, thereby dist…

    Submitted 30 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

  50. arXiv:2403.19885  [pdf, other]

    cs.CV cs.RO

    Towards Long Term SLAM on Thermal Imagery

    Authors: Colin Keil, Aniket Gupta, Pushyami Kaveti, Hanumant Singh

    Abstract: Visual SLAM with thermal imagery, and in other low-contrast, visually degraded environments such as underwater or in areas dominated by snow and ice, remains a difficult problem for many state-of-the-art (SOTA) algorithms. In addition to challenging front-end data association, thermal imagery presents an additional difficulty for long-term relocalization and map reuse. The relative temperatures of obj…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, Submitted to IROS 2024