Showing 1–50 of 380 results for author: Gupta, K

Searching in archive cs.
  1. Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet

    Authors: Manish Dhakal, Arman Chhetri, Aman Kumar Gupta, Prabin Lamichhane, Suraj Pandey, Subarna Shakya

    Abstract: This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text. The model was trained and tested on the OpenSLR (audio, text) dataset. The majority of the audio dataset have silent gaps at both ends which are clipped during dataset preprocessing for a more uniform mapping of audio frames and their corresponding texts. Mel Frequen…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted at 2022 International Conference on Inventive Computation Technologies (ICICT), IEEE

    Journal ref: 2022 International Conference on Inventive Computation Technologies (ICICT), pp. 515-521

  2. arXiv:2406.16965  [pdf, other]

    cs.LG cs.AI cs.CY

    Present and Future of AI in Renewable Energy Domain : A Comprehensive Survey

    Authors: Abdur Rashid, Parag Biswas, Angona Biswas, MD Abdullah Al Nasim, Kishor Datta Gupta, Roy George

    Abstract: Artificial intelligence (AI) has become a crucial instrument for streamlining processes in various industries, including electrical power systems, as a result of recent digitalization. Algorithms for artificial intelligence are data-driven models that are based on statistical learning theory and are used as a tool to take use of the data that the power system and its users generate. Initially, we…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.15732  [pdf, other]

    cs.AI

    AI-Driven Approaches for Optimizing Power Consumption: A Comprehensive Survey

    Authors: Parag Biswas, Abdur Rashid, Angona Biswas, Md Abdullah Al Nasim, Kishor Datta Gupta, Roy George

    Abstract: Reduced environmental effect, lower operating costs, and a stable and sustainable energy supply for current and future generations are the main reasons why power optimization is important. Power optimization makes ensuring that energy is used more effectively, cutting down on waste and optimizing the utilization of resources. In today's world, power optimization and artificial intelligence (AI) int…

    Submitted 22 June, 2024; originally announced June 2024.

  4. arXiv:2406.10528  [pdf, other]

    cs.LG

    Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training

    Authors: Akul Malhotra, Sumeet Kumar Gupta

    Abstract: Improving the hardware efficiency of deep neural network (DNN) accelerators with techniques such as quantization and sparsity enhancement have shown an immense promise. However, their inference accuracy in non-ideal real-world settings (such as in the presence of hardware faults) is yet to be systematically analyzed. In this work, we investigate the impact of memory faults on activation-sparse qua…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2301.00675

  5. arXiv:2406.08900  [pdf, other]

    eess.AS cs.SD eess.SP

    On Improving Error Resilience of Neural End-to-End Speech Coders

    Authors: Kishan Gupta, Nicola Pia, Srikanth Korse, Andreas Brendel, Guillaume Fuchs, Markus Multrus

    Abstract: Error resilient tools like Packet Loss Concealment (PLC) and Forward Error Correction (FEC) are essential to maintain a reliable speech communication for applications like Voice over Internet Protocol (VoIP), where packets are frequently delayed and lost. In recent times, end-to-end neural speech codecs have seen a significant rise, due to their ability to transmit speech signal at low bitrates bu…

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.06908  [pdf, other]

    cs.CV

    UVIS: Unsupervised Video Instance Segmentation

    Authors: Shuaiyi Huang, Saksham Suri, Kamal Gupta, Sai Saketh Rambhatla, Ser-nam Lim, Abhinav Shrivastava

    Abstract: Video instance segmentation requires classifying, segmenting, and tracking every object across video frames. Unlike existing approaches that rely on masks, boxes, or category labels, we propose UVIS, a novel Unsupervised Video Instance Segmentation (UVIS) framework that can perform video instance segmentation without any video annotations or dense label-based pretraining. Our key insight comes fro…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: CVPR2024 Workshop

  7. arXiv:2406.03565  [pdf, other]

    cs.GT cs.MA eess.SY

    Second-Order Algorithms for Finding Local Nash Equilibria in Zero-Sum Games

    Authors: Kushagra Gupta, Xinjie Liu, Ufuk Topcu, David Fridovich-Keil

    Abstract: Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors.…

    Submitted 5 June, 2024; originally announced June 2024.

  8. arXiv:2405.13063  [pdf, other]

    physics.ao-ph cs.LG

    Aurora: A Foundation Model of the Atmosphere

    Authors: Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris

    Abstract: Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-sc…

    Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  9. arXiv:2405.11775  [pdf, other]

    cs.CL cs.LG

    Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques

    Authors: Siva Rajesh Kasa, Aniket Goel, Karan Gupta, Sumegh Roychowdhury, Anish Bhanushali, Nikhil Pattisapu, Prasanna Srinivasa Murthy

    Abstract: Ordinal Classification (OC) is a widely encountered challenge in Natural Language Processing (NLP), with applications in various domains such as sentiment analysis, rating prediction, and more. Previous approaches to tackle OC have primarily focused on modifying existing or creating novel loss functions that explicitly account for the ordinal nature of labels. However, with the advent of…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Findings of ACL 2024

  10. arXiv:2405.11458  [pdf, other]

    cs.AI eess.SY

    CPS-LLM: Large Language Model based Safe Usage Plan Generator for Human-in-the-Loop Human-in-the-Plant Cyber-Physical System

    Authors: Ayan Banerjee, Aranyak Maity, Payal Kamboj, Sandeep K. S. Gupta

    Abstract: We explore the usage of large language models (LLM) in human-in-the-loop human-in-the-plant cyber-physical systems (CPS) to translate a high-level prompt into a personalized plan of actions, and subsequently convert that plan into a grounded inference of sequential decision-making automated by a real-world CPS controller to achieve a control goal. We show that it is relatively straightforward to c…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted for publication in AAAI 2024, Planning for Cyber Physical Systems

  11. arXiv:2405.08417  [pdf, other]

    eess.AS cs.SD

    Simple and Efficient Quantization Techniques for Neural Speech Coding

    Authors: Andreas Brendel, Nicola Pia, Kishan Gupta, Guillaume Fuchs, Markus Multrus

    Abstract: Neural audio coding has emerged as a vivid research direction by promising good audio quality at very low bitrates unachievable by classical coding techniques. Here, end-to-end trainable autoencoder-like models represent the state of the art, where a discrete representation in the bottleneck of the autoencoder has to be learned that allows for efficient transmission of the input audio signal. This…

    Submitted 14 May, 2024; originally announced May 2024.

  12. arXiv:2405.06712  [pdf, other]

    cs.CL cs.AI

    Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses

    Authors: Gaurav Kumar Gupta, Aditi Singh, Sijo Valayakkad Manikandan, Abul Ehtesham

    Abstract: The recent swift development of LLMs like GPT-4, Gemini, and GPT-3.5 offers a transformative opportunity in medicine and healthcare, especially in digital diagnostics. This study evaluates each model diagnostic abilities by interpreting a user symptoms and determining diagnoses that fit well with common illnesses, and it demonstrates how each of these models could significantly increase diagnostic…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 14 pages, 4 figures

  13. arXiv:2404.17922  [pdf, other]

    cs.CV cs.RO

    Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM

    Authors: Laksh Nanwani, Kumaraditya Gupta, Aditya Mathur, Swayam Agrawal, A. H. Abdul Hafez, K. Madhava Krishna

    Abstract: Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps [1] showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasi…

    Submitted 27 April, 2024; originally announced April 2024.

  14. arXiv:2404.15549  [pdf, other]

    cs.CL cs.AI

    PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models

    Authors: Shashi Kant Gupta, Aditya Basu, Mauro Nievas, Jerrin Thomas, Nathan Wolfrath, Adhitya Ramamurthi, Bradley Taylor, Anai N. Kothari, Regina Schwind, Therica M. Miller, Sorena Nadaf-Rahrov, Yanshan Wang, Hrituraj Singh

    Abstract: Clinical trial matching is the task of identifying trials for which patients may be potentially eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. This process is manual, time-intensive, and challenging to scale up, resulting in many patients miss…

    Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 30 Pages, 8 Figures, Supplementary Work Attached

  15. arXiv:2404.15351  [pdf, other]

    eess.SP cs.HC cs.LG

    Integrating Physiological Data with Large Language Models for Empathic Human-AI Interaction

    Authors: Poorvesh Dongre, Majid Behravan, Kunal Gupta, Mark Billinghurst, Denis Gračanin

    Abstract: This paper explores enhancing empathy in Large Language Models (LLMs) by integrating them with physiological data. We propose a physiological computing approach that includes developing deep learning models that use physiological data for recognizing psychological states and integrating the predicted states with LLMs for empathic interaction. We showcase the application of this approach in an Empa…

    Submitted 14 April, 2024; originally announced April 2024.

  16. arXiv:2404.10179  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Scaling Instructable Agents Across Many Simulated Worlds

    Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, et al. (68 additional authors not shown)

    Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructio…

    Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

  17. arXiv:2404.07670  [pdf, ps, other]

    cs.IT cs.ET math.CO

    On Naisargik Images of Varshamov-Tenengolts and Helberg Codes

    Authors: Kalp Pandya, Devdeep Shetranjiwala, Naisargi Savaliya, Manish K. Gupta

    Abstract: The VT and Helberg codes, both in binary and non-binary forms, stand as elegant solutions for rectifying insertion and deletion errors. In this paper we consider the quaternary versions of these codes. It is well known that many optimal binary non-linear codes like Kerdock and Prepreta can be depicted as Gray images (isometry) of codes defined over $\mathbb{Z}_4$. Thus a natural question arises: C…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 20 pages, 18 Tables, draft, data is at https://github.com/guptalab/GrayVT

  18. arXiv:2404.06680  [pdf, other]

    cs.CL

    Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology

    Authors: Shashi Kant Gupta, Aditya Basu, Bradley Taylor, Anai Kothari, Hrituraj Singh

    Abstract: Retrieving information from EHR systems is essential for answering specific questions about patient journeys and improving the delivery of clinical care. Despite this fact, most EHR systems still rely on keyword-based searches. With the advent of generative large language models (LLMs), retrieving information can lead to better search and summarization capabilities. Such retrievers can also feed R…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 18 pages

  19. arXiv:2404.06442  [pdf, other]

    cs.CV cs.RO

    QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding

    Authors: Yash Mehan, Kumaraditya Gupta, Rohit Jayanti, Anirudh Govil, Sourav Garg, Madhava Krishna

    Abstract: Understanding the structural organisation of 3D indoor scenes in terms of rooms is often accomplished via floorplan extraction. Robotic tasks such as planning and navigation require a semantic understanding of the scene as well. This is typically achieved via object-level semantic segmentation. However, such methods struggle to segment out topological regions like "kitchen" in the scene. In this w…

    Submitted 9 April, 2024; originally announced April 2024.

  20. arXiv:2404.04877  [pdf, other]

    cs.IT cs.CY cs.ET

    A Bird-Eye view on DNA Storage Simulators

    Authors: Sanket Doshi, Mihir Gohel, Manish K. Gupta

    Abstract: In the current world due to the huge demand for storage, DNA-based storage solution sounds quite promising because of their longevity, low power consumption, and high capacity. However in real life storing data in the form of DNA is quite expensive, and challenging. Therefore researchers and developers develop such kind of software that helps simulate real-life DNA storage without worrying about t…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 19 pages, 19 figures, draft, review

  21. arXiv:2404.01292  [pdf, other]

    cs.CV cs.LG

    Measuring Style Similarity in Diffusion Models

    Authors: Gowthami Somepalli, Anubhav Gupta, Kamal Gupta, Shramay Palta, Micah Goldblum, Jonas Geiping, Abhinav Shrivastava, Tom Goldstein

    Abstract: Generative models are now widely used by graphic designers and artists. Prior works have shown that these models remember and often replicate content from their training data during generation. Hence as their proliferation increases, it has become important to perform a database search to determine whether the properties of the image are attributable to specific training data, every time before a…

    Submitted 1 April, 2024; originally announced April 2024.

  22. arXiv:2404.00846  [pdf, other]

    cs.CV cs.LG

    Transfer Learning with Point Transformers

    Authors: Kartik Gupta, Rahul Vippala, Sahima Srivastava

    Abstract: Point Transformers are near state-of-the-art models for classification, segmentation, and detection tasks on Point Cloud data. They utilize a self attention based mechanism to model large range spatial dependencies between multiple point sets. In this project we explore two things: classification performance of these attention based networks on ModelNet10 dataset and then, we use the trained model…

    Submitted 31 March, 2024; originally announced April 2024.

  23. arXiv:2404.00399  [pdf, other]

    cs.CL cs.AI cs.LG

    Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

    Authors: Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, et al. (20 additional authors not shown)

    Abstract: Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, where…

    Submitted 23 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Preprint

  24. arXiv:2404.00191  [pdf]

    cs.CV

    Optimal Blackjack Strategy Recommender: A Comprehensive Study on Computer Vision Integration for Enhanced Gameplay

    Authors: Krishnanshu Gupta, Devon Bolt, Ben Hinchliff

    Abstract: This research project investigates the application of several computer vision techniques for playing card detection and recognition in the context of the popular casino game, blackjack. The primary objective is to develop a robust system that is capable of detecting and accurately classifying playing cards in real-time, and displaying the optimal move recommendation based on the given image of the…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 24 pages, 13 figures

    ACM Class: I.4.9; I.5.3; I.5.4

  25. arXiv:2403.15170  [pdf, other]

    cs.LG cs.AI eess.SP

    Exploring the Task-agnostic Trait of Self-supervised Learning in the Context of Detecting Mental Disorders

    Authors: Rohan Kumar Gupta, Rohit Sinha

    Abstract: Self-supervised learning (SSL) has been investigated to generate task-agnostic representations across various domains. However, such investigation has not been conducted for detecting multiple mental disorders. The rationale behind the existence of a task-agnostic representation lies in the overlapping symptoms among multiple mental disorders. Consequently, the behavioural data collected for menta…

    Submitted 22 March, 2024; originally announced March 2024.

  26. arXiv:2403.14625  [pdf, other]

    cs.CV

    LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

    Authors: Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava

    Abstract: We present a simple self-supervised method to enhance the performance of ViT features for dense downstream tasks. Our Lightweight Feature Transform (LiFT) is a straightforward and compact postprocessing network that can be applied to enhance the features of any pre-trained ViT backbone. LiFT is fast and easy to train with a self-supervised objective, and it boosts the density of ViT features for m…

    Submitted 21 March, 2024; originally announced March 2024.

  27. arXiv:2403.10372  [pdf, other]

    cs.CR

    Construction of all MDS and involutory MDS matrices

    Authors: Yogesh Kumar, P. R. Mishra, Susanta Samanta, Kishan Chand Gupta, Atul Gaur

    Abstract: In this paper, we propose two algorithms for a hybrid construction of all $n\times n$ MDS and involutory MDS matrices over a finite field $\mathbb{F}_{p^m}$, respectively. The proposed algorithms effectively narrow down the search space to identify $(n-1) \times (n-1)$ MDS matrices, facilitating the generation of all $n \times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. To the best…

    Submitted 15 March, 2024; originally announced March 2024.

  28. arXiv:2403.09037  [pdf, other]

    cs.CV cs.CL

    The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?

    Authors: Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould

    Abstract: Large vision-language models (LVLMs), designed to interpret and respond to human instructions, occasionally generate hallucinated or harmful content due to inappropriate instructions. This study uses linear probing to shed light on the hidden knowledge at the output layer of LVLMs. We demonstrate that the logit distributions of the first tokens contain sufficient information to determine whether t…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Under review. Project page: https://github.com/Qinyu-Allen-Zhao/LVLM-LP

  29. arXiv:2403.08763  [pdf, other]

    cs.LG cs.AI cs.CL

    Simple and Scalable Strategies to Continually Pre-train Large Language Models

    Authors: Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptati…

    Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  30. arXiv:2403.01927  [pdf, other]

    q-bio.GN cs.CV q-bio.QM q-bio.TO

    Advancing Gene Selection in Oncology: A Fusion of Deep Learning and Sparsity for Precision Gene Selection

    Authors: Akhila Krishna, Ravi Kant Gupta, Pranav Jeevan, Amit Sethi

    Abstract: Gene selection plays a pivotal role in oncology research for improving outcome prediction accuracy and facilitating cost-effective genomic profiling for cancer patients. This paper introduces two gene selection strategies for deep learning-based survival prediction models. The first strategy uses a sparsity-inducing method while the second one uses importance based gene selection for identifying r…

    Submitted 4 March, 2024; originally announced March 2024.

  31. arXiv:2402.18128  [pdf, other]

    cs.CV cs.LG

    Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization

    Authors: Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K. Gupta, Pengtao Xie

    Abstract: Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches pr…

    Submitted 28 February, 2024; originally announced February 2024.

  32. arXiv:2402.01801  [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Models for Time Series: A Survey

    Authors: Xiyuan Zhang, Ranak Roy Chowdhury, Rajesh K. Gupta, Jingbo Shang

    Abstract: Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, image and graphics, LLMs present a significant potential for analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio and finance. This survey paper provides an in-depth exploration and a detailed taxonomy of the vari…

    Submitted 6 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: GitHub repository: https://github.com/xiyuanzh/awesome-llm-time-series

  33. arXiv:2402.00865  [pdf, other]

    cs.CV cs.LG

    Towards Optimal Feature-Shaping Methods for Out-of-Distribution Detection

    Authors: Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould

    Abstract: Feature shaping refers to a family of methods that exhibit state-of-the-art performance for out-of-distribution (OOD) detection. These approaches manipulate the feature representation, typically from the penultimate layer of a pre-trained deep learning model, so as to better differentiate between in-distribution (ID) and OOD samples. However, existing feature-shaping methods usually employ rules m…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: ICLR 2024. Project page: https://github.com/Qinyu-Allen-Zhao/OptFSOOD

  34. arXiv:2401.12789  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

    Authors: W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath

    Abstract: In the era of large models, the autoregressive nature of decoding often results in latency serving as a significant bottleneck. We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of accelerator hardware. Our approach combines the Universal Speech Model (USM) and the PaLM 2 language model in per-segment scoring mode, achieving an average…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  35. arXiv:2401.04636  [pdf, other]

    cs.IT eess.SP

    On the Target Detection Performance of a Molecular Communication Network with Multiple Mobile Nanomachines

    Authors: Nithin V. Sabu, Abhishek K. Gupta

    Abstract: A network of nanomachines (NMs) can be used to build a target detection system for a variety of promising applications. They have the potential to detect toxic chemicals, infectious bacteria, and biomarkers of dangerous diseases such as cancer within the human body. Many diseases and health disorders can be detected early and efficiently treated in the future by utilizing these systems. To fully g…

    Submitted 9 January, 2024; originally announced January 2024.

  36. arXiv:2312.16549  [pdf, other]

    cs.LG cs.AI cs.CL

    How Robust are LLMs to In-Context Majority Label Bias?

    Authors: Karan Gupta, Sumegh Roychowdhury, Siva Rajesh Kasa, Santhosh Kumar Kasa, Anish Bhanushali, Nikhil Pattisapu, Prasanna Srinivasa Murthy

    Abstract: In the In-Context Learning (ICL) setup, various forms of label biases can manifest. One such manifestation is majority label bias, which arises when the distribution of labeled examples in the in-context samples is skewed towards one or more specific classes making Large Language Models (LLMs) more prone to predict those labels. Such discrepancies can arise from various factors, including logistic…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: 6 pages, 3 figures, 2 table. Accepted at Workshop on Responsible Language Modeling, AAAI 2024, (www.aaai.org)

  37. arXiv:2312.13711  [pdf]

    cs.LG cs.CR cs.IR

    A Learning oriented DLP System based on Classification Model

    Authors: Kishu Gupta, Ashwani Kush

    Abstract: Data is the key asset for organizations and data sharing is lifeline for organization growth; which may lead to data loss. Data leakage is the most critical issue being faced by organizations. In order to mitigate the data leakage issues data leakage prevention systems (DLPSs) are deployed at various levels by the organizations. DLPSs are capable to protect all kind of data i.e. DAR, DIM/DIT, DIU.…

    Submitted 21 December, 2023; originally announced December 2023.

  38. arXiv:2312.13704  [pdf]

    cs.CR cs.LG

    A Forecasting-Based DLP Approach for Data Security

    Authors: Kishu Gupta, Ashwani Kush

    Abstract: Sensitive data leakage is the major growing problem being faced by enterprises in this technical era. Data leakage causes severe threats for organization of data safety which badly affects the reputation of organizations. Data leakage is the flow of sensitive data/information from any data holder to an unauthorized destination. Data leak prevention (DLP) is set of techniques that try to alleviate…

    Submitted 21 December, 2023; originally announced December 2023.

  39. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  40. The Expert Knowledge combined with AI outperforms AI Alone in Seizure Onset Zone Localization using resting state fMRI

    Authors: Payal Kamboj, Ayan Banerjee, Varina L. Boerwinkle, Sandeep K. S. Gupta

    Abstract: We evaluated whether integration of expert guidance on seizure onset zone (SOZ) identification from resting state functional MRI (rs-fMRI) connectomics combined with deep learning (DL) techniques enhances the SOZ delineation in patients with refractory epilepsy (RE), compared to utilizing DL alone. Rs-fMRI were collected from 52 children with RE who had subsequently undergone ic-EEG and then, if i…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted in Frontiers in Neurology journal, section Artificial Intelligence

  41. arXiv:2312.04564  [pdf, other]

    cs.CV cs.GR

    EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS

    Authors: Sharath Girish, Kamal Gupta, Abhinav Shrivastava

    Abstract: Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for bo…

    Submitted 24 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Website: https://efficientgaussian.github.io Code: https://github.com/Sharath-girish/efficientgaussian

  42. arXiv:2311.05109  [pdf, other]

    cs.CV cs.LG

    Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks

    Authors: Kartik Gupta, Akshay Asthana

    Abstract: Quantized networks use less computational and memory resources and are suitable for deployment on edge devices. While quantization-aware training (QAT) is the well-studied approach to quantize the networks at low precision, most research focuses on over-parameterized networks for classification with limited studies on popular and edge device friendly single-shot object detection and semantic segment…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  43. arXiv:2311.03320  [pdf, other]

    cs.CL

    Tackling Concept Shift in Text Classification using Entailment-style Modeling

    Authors: Sumegh Roychowdhury, Karan Gupta, Siva Rajesh Kasa, Prasanna Srinivasa Murthy, Alok Chandra

    Abstract: Pre-trained language models (PLMs) have seen tremendous success in text classification (TC) problems in the context of Natural Language Processing (NLP). In many real-world text classification tasks, the class definitions being learned do not remain constant but rather change with time - this is known as Concept Shift. Most techniques for handling concept shift rely on retraining the old classifie…

    Submitted 6 November, 2023; originally announced November 2023.

    Journal ref: NeurIPS 2023 - Workshop on Distribution Shifts

  44. arXiv:2311.00112  [pdf, other]

    cs.RO

    Hierarchical Optimization-based Control for Whole-body Loco-manipulation of Heavy Objects

    Authors: Alberto Rigo, Muqun Hu, Satyandra K. Gupta, Quan Nguyen

    Abstract: In recent years, the field of legged robotics has seen growing interest in enhancing the capabilities of these robots through the integration of articulated robotic arms. However, achieving successful loco-manipulation, especially involving interaction with heavy objects, is far from straightforward, as object manipulation can introduce substantial disturbances that impact the robot's locomotion.…

    Submitted 19 March, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: 7 pages, 7 figures

  45. arXiv:2310.11768  [pdf, other]

    cs.CR cs.IT math.AG math.NT

    On the Classification of Weierstrass Elliptic Curves over $\mathbb{Z}_n$

    Authors: Param Parekh, Paavan Parekh, Sourav Deb, Manish K Gupta

    Abstract: The development of secure cryptographic protocols and the subsequent attack mechanisms have been placed in the literature with the utmost curiosity. While sophisticated quantum attacks bring a concern to the classical cryptographic protocols present in the applications used in everyday life, the necessity of developing post-quantum protocols is felt primarily. In post-quantum cryptography, ell…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: 12 pages, 2 figures, draft

  46. arXiv:2310.11325  [pdf, other]

    cs.CR

    Detection of Malicious DNS-over-HTTPS Traffic: An Anomaly Detection Approach using Autoencoders

    Authors: Sergio Salinas Monroy, Aman Kumar Gupta, Garrett Wahlstedt

    Abstract: To maintain the privacy of users' web browsing history, popular browsers encrypt their DNS traffic using the DNS-over-HTTPS (DoH) protocol. Unfortunately, encrypting DNS packets prevents many existing intrusion detection systems from using plaintext domain names to detect malicious traffic. In this paper, we design an autoencoder that is capable of detecting malicious DNS traffic by only observing…

    Submitted 17 October, 2023; originally announced October 2023.

  47. arXiv:2310.03346  [pdf, other]

    cs.CV

    Combining Datasets with Different Label Sets for Improved Nucleus Segmentation and Classification

    Authors: Amruta Parulekar, Utkarsh Kanwat, Ravi Kant Gupta, Medha Chippa, Thomas Jacob, Tripti Bameta, Swapnil Rane, Amit Sethi

    Abstract: Segmentation and classification of cell nuclei in histopathology images using deep neural networks (DNNs) can save pathologists' time for diagnosing various diseases, including cancers, by automating cell counting and morphometric assessments. It is now well-known that the accuracy of DNNs increases with the sizes of annotated datasets available for training. Although multiple datasets of histopat…

    Submitted 5 October, 2023; originally announced October 2023.

  48. arXiv:2310.03185  [pdf, other]

    cs.CR cs.AI

    Misusing Tools in Large Language Models With Visual Adversarial Examples

    Authors: Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes

    Abstract: Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities. These new capabilities bring new benefits and also new security risks. In this work, we show that an attacker can use visual adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversatio…

    Submitted 4 October, 2023; originally announced October 2023.

  49. arXiv:2310.02437  [pdf, other]

    cs.CV

    EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

    Authors: Anish Bhattacharya, Ratnesh Madaan, Fernando Cladera, Sai Vemprala, Rogerio Bonatti, Kostas Daniilidis, Ashish Kapoor, Vijay Kumar, Nikolai Matni, Jayesh K. Gupta

    Abstract: We present EvDNeRF, a pipeline for generating event data and training an event-based dynamic NeRF, for the purpose of faithfully reconstructing eventstreams on scenes with rigid and non-rigid deformations that may be too fast to capture with a standard camera. Event cameras register asynchronous per-pixel brightness changes at MHz rates with high dynamic range, making them ideal for observing fast…

    Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: 16 pages, 20 figures, 2 tables

  50. arXiv:2310.01360  [pdf, other]

    cs.RO

    Toward Scalable Visual Servoing Using Deep Reinforcement Learning and Optimal Control

    Authors: Salar Asayesh, Hossein Sheikhi Darani, Mo Chen, Mehran Mehrandezh, Kamal Gupta

    Abstract: Classical pixel-based Visual Servoing (VS) approaches offer high accuracy but suffer from a limited convergence area due to optimization nonlinearity. Modern deep learning-based VS methods overcome traditional vision issues but lack scalability, requiring training on limited scenes. This paper proposes a hybrid VS strategy utilizing Deep Reinforcement Learning (DRL) and optimal control to enhance…

    Submitted 2 October, 2023; originally announced October 2023.