Showing 1–50 of 468 results for author: Singh, R

  1. arXiv:2406.16320  [pdf, other]

    cs.CL

    What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Noise-free Text-Image Corruption and Evaluation

    Authors: Michal Golovanevsky, William Rudman, Vedant Palit, Ritambhara Singh, Carsten Eickhoff

    Abstract: Vision-Language Models (VLMs) have gained community-spanning prominence due to their ability to integrate visual and textual inputs to perform complex tasks. Despite their success, the internal decision-making processes of these models remain opaque, posing challenges in high-stakes applications. To address this, we introduce NOTICE, the first Noise-free Text-Image Corruption and Evaluation pipeli…

    Submitted 24 June, 2024; originally announced June 2024.

  2. Retrieval Augmented Zero-Shot Text Classification

    Authors: Tassallah Abdullahi, Ritambhara Singh, Carsten Eickhoff

    Abstract: Zero-shot text learning enables text classifiers to handle unseen classes efficiently, alleviating the need for task-specific training data. A simple approach often relies on comparing embeddings of query (text) to those of potential classes. However, the embeddings of a simple query sometimes lack rich contextual information, which hinders the classification performance. Traditionally, this has b…

    Submitted 26 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 2024 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR '24), July 13, 2024, Washington DC, DC, USA

  3. arXiv:2406.11769  [pdf, other]

    cs.CV

    Solving Vision Tasks with Simple Photoreceptors Instead of Cameras

    Authors: Andrei Atanov, Jiawei Fu, Rishubh Singh, Isabella Yu, Andrew Spielberg, Amir Zamir

    Abstract: A de facto standard in solving computer vision problems is to use a common high-resolution camera and choose its placement on an agent (i.e., position and orientation) based on human intuition. On the other hand, extremely simple and well-designed visual sensors found throughout nature allow many organisms to perform diverse, complex behaviors. In this work, motivated by these examples, we raise t…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.09750  [pdf, other]

    cs.CV cs.AI

    ControlVAR: Exploring Controllable Visual Autoregressive Modeling

    Authors: Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Zhe Lin, Rita Singh, Bhiksha Raj

    Abstract: Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs), especially in tasks like control-to-image generation. However, challenges such as expensive computational cost, high inference latency, and difficulties of integration with large language models (LLMs) have necessitated exploring alternatives to DMs. This paper introduces ControlVAR, a novel…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 24 pages, 19 figures, 4 tables

  5. Automating Patch Set Generation from Code Review Comments Using Large Language Models

    Authors: Tajmilur Rahman, Rahul Singh, Mir Yousuf Sultan

    Abstract: The advent of Large Language Models (LLMs) has revolutionized various domains of artificial intelligence, including the realm of software engineering. In this research, we evaluate the efficacy of pre-trained LLMs in replicating the tasks traditionally performed by developers in response to code review comments. We provide code contexts to five popular LLMs and obtain the suggested code-changes (p…

    Submitted 9 April, 2024; originally announced June 2024.

    Comments: 2 pages

  6. arXiv:2405.18793  [pdf, other]

    cs.LG

    Adaptive Discretization-based Non-Episodic Reinforcement Learning in Metric Spaces

    Authors: Avik Kar, Rahul Singh

    Abstract: We study non-episodic Reinforcement Learning for Lipschitz MDPs in which state-action space is a metric space, and the transition kernel and rewards are Lipschitz functions. We develop computationally efficient UCB-based algorithm, $\textit{ZoRL-}ε$ that adaptively discretizes the state-action space and show that their regret as compared with $ε$-optimal policy is bounded as…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 38 pages, 2 figures

  7. arXiv:2405.15468  [pdf, other]

    cs.CV cs.GR

    Semantic Aware Diffusion Inverse Tone Mapping

    Authors: Abhishek Goswami, Aru Ranjan Singh, Francesco Banterle, Kurt Debattista, Thomas Bashford-Rogers

    Abstract: The range of real-world scene luminance is larger than the capture capability of many digital camera sensors which leads to details being lost in captured images, most typically in bright regions. Inverse tone mapping attempts to boost these captured Standard Dynamic Range (SDR) images back to High Dynamic Range (HDR) by creating a mapping that linearizes the well exposed values from the SDR image…

    Submitted 24 May, 2024; originally announced May 2024.

  8. arXiv:2405.13370  [pdf, other]

    eess.IV cs.CV cs.LG

    Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning

    Authors: Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa

    Abstract: This research addresses the challenges of diagnosing chest X-rays (CXRs) at low resolutions, a common limitation in resource-constrained healthcare settings. High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities. However, when images are downsized for processing in Computer-Aided Diagnosis (CAD) systems, vital spatial details and receptiv…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: IEEE ISBI 2024

  9. arXiv:2405.09101  [pdf, other]

    cs.RO eess.SY

    Adaptive Koopman Embedding for Robust Control of Complex Nonlinear Dynamical Systems

    Authors: Rajpal Singh, Chandan Kumar Sah, Jishnu Keshavan

    Abstract: The discovery of linear embedding is the key to the synthesis of linear control techniques for nonlinear systems. In recent years, while Koopman operator theory has become a prominent approach for learning these linear embeddings through data-driven methods, these algorithms often exhibit limitations in generalizability beyond the distribution captured by training data and are not robust to change…

    Submitted 20 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Corrected the title

  10. RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

    Authors: Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh

    Abstract: Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly impact how well these…

    Submitted 19 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures, MMAsia 2023 Proceedings of the 5th ACM International Conference on Multimedia in Asia

    Journal ref: In Proceedings of the 5th ACM International Conference on Multimedia in Asia 2023. Association for Computing Machinery, NY, USA, Article 74, pp. 1-6

  11. arXiv:2404.03587  [pdf, other]

    cs.RO cs.AI

    Anticipate & Collab: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration

    Authors: Shivam Singh, Karthik Swaminathan, Raghav Arora, Ramandeep Singh, Ahana Datta, Dipanjan Das, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna

    Abstract: An agent assisting humans in daily living activities can collaborate more effectively by anticipating upcoming tasks. Data-driven methods represent the state of the art in task anticipation, planning, and related problems, but these methods are resource-hungry and opaque. Our prior work introduced a proof of concept framework that used an LLM to anticipate 3 high-level tasks that served as goals f…

    Submitted 4 April, 2024; originally announced April 2024.

  12. arXiv:2403.20312  [pdf, other]

    cs.CV

    Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations

    Authors: Jaisidh Singh, Ishaan Shrivastava, Mayank Vatsa, Richa Singh, Aparna Bharati

    Abstract: Existing vision-language models (VLMs) treat text descriptions as a unit, confusing individual concepts in a prompt and impairing visual semantic matching and reasoning. An important aspect of reasoning in logic and language is negations. This paper highlights the limitations of popular VLMs such as CLIP, at understanding the implications of negations, i.e., the effect of the word "not" in a given…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 14 pages + 6 figures in main manuscript (excluding references)

  13. arXiv:2403.15248  [pdf, other]

    cs.CV cs.AI eess.IV

    Self-Supervised Backbone Framework for Diverse Agricultural Vision Tasks

    Authors: Sudhir Sornapudi, Rajhans Singh

    Abstract: Computer vision in agriculture is game-changing with its ability to transform farming into a data-driven, precise, and sustainable industry. Deep learning has empowered agriculture vision to analyze vast, complex visual data, but heavily rely on the availability of large annotated datasets. This remains a bottleneck as manual labeling is error-prone, time-consuming, and expensive. The lack of effi…

    Submitted 22 March, 2024; originally announced March 2024.

  14. arXiv:2403.13989  [pdf, other]

    cs.SE

    FastFlip: Compositional Error Injection Analysis

    Authors: Keyur Joshi, Rahul Singh, Tommaso Bassetto, Sarita Adve, Darko Marinov, Sasa Misailovic

    Abstract: Instruction-level error injection analyses aim to find instructions where errors often lead to unacceptable outcomes like Silent Data Corruptions (SDCs). These analyses require significant time, which is especially problematic if developers wish to regularly analyze software that evolves over time. We present FastFlip, a combination of empirical error injection and symbolic SDC propagation analy…

    Submitted 26 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  15. arXiv:2403.12321  [pdf, other]

    cs.HC

    Explainable agency: human preferences for simple or complex explanations

    Authors: Michelle Blom, Ronal Singh, Tim Miller, Liz Sonenberg, Kerry Trentelman, Adam Saulwick

    Abstract: Research in cognitive psychology has established that whether people prefer simpler explanations to complex ones is context dependent, but the question of `simple vs. complex' becomes critical when an artificial agent seeks to explain its decisions or predictions to humans. We present a model for abstracting causal reasoning chains for the purpose of explanation. This model uses a set of rules to…

    Submitted 18 March, 2024; originally announced March 2024.

  16. arXiv:2403.09806  [pdf, other]

    cs.AI

    xLP: Explainable Link Prediction for Master Data Management

    Authors: Balaji Ganesan, Matheen Ahmed Pasha, Srinivasa Parkala, Neeraj R Singh, Gayatri Mishra, Sumit Bhatia, Hima Patel, Somashekar Naganna, Sameep Mehta

    Abstract: Explaining neural model predictions to users requires creativity. Especially in enterprise applications, where there are costs associated with users' time, and their trust in the model predictions is critical for adoption. For link prediction in master data management, we have built a number of explainability solutions drawing from research in interpretability, fact verification, path ranking, neu…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures, NeurIPS 2020 Competition and Demonstration Track. arXiv admin note: text overlap with arXiv:2012.05516

  17. arXiv:2403.07043  [pdf, other]

    cs.RO

    A Collision Cone Approach for Control Barrier Functions

    Authors: Manan Tayal, Bhavya Giri Goswami, Karthik Rajgopal, Rajpal Singh, Tejas Rao, Jishnu Keshavan, Pushpak Jagtap, Shishir Kolathaya

    Abstract: This work presents a unified approach for collision avoidance using Collision-Cone Control Barrier Functions (CBFs) in both ground (UGV) and aerial (UAV) unmanned vehicles. We propose a novel CBF formulation inspired by collision cones, to ensure safety by constraining the relative velocity between the vehicle and the obstacle to always point away from each other. The efficacy of this approach is…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 13 pages, 16 pages. arXiv admin note: substantial text overlap with arXiv:2209.11524, arXiv:2303.15871, arXiv:2310.10839

  18. arXiv:2403.04924  [pdf, other]

    cs.CV

    $\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

    Authors: Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazak, Hao Chen, Xiaonan Huang, Bhiksha Raj

    Abstract: Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive. Despite progress in this field, the robustness of referring perception models (RPMs) against disruptive perturbations is not well explored. This work thoroughly assesses t…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Code and dataset will be released at https://github.com/lxa9867/r2bench

  19. arXiv:2403.04379  [pdf, other]

    cs.NI

    Performance evaluation of conditional handover in 5G systems under fading scenario

    Authors: Souvik Deb, Megh Rathod, Rishi Balamurugan, Shankar K. Ghosh, Rajeev K. Singh, Samriddha Sanyal

    Abstract: To enhance the handover performance in fifth generation (5G) cellular systems, conditional handover (CHO) has been evolved as a promising solution. Unlike A3 based handover where handover execution is certain after receiving handover command from the serving access network, in CHO, handover execution is conditional on the RSRP measurements from both current and target access networks, as well as o…

    Submitted 7 March, 2024; originally announced March 2024.

  20. arXiv:2402.10781  [pdf, other]

    cs.IT

    Towards 6G Evolution: Three Enhancements, Three Innovations, and Three Major Challenges

    Authors: Rohit Singh, Aryan Kaushik, Wonjae Shin, Marco Di Renzo, Vincenzo Sciancalepore, Doohwan Lee, Hirofumi Sasaki, Arman Shojaeifard, Octavia A. Dobre

    Abstract: Over the past few decades, wireless communication has witnessed remarkable growth, experiencing several transformative changes. This article aims to provide a comprehensive overview of wireless communication technologies, from the foundations to the recent wireless advances. Specifically, we take a neutral look at the state-of-the-art technologies for 5G and the ongoing evolutions towards 6G, revi…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 figures, 1 table

  21. arXiv:2402.10454  [pdf, other]

    cs.CV

    Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration

    Authors: Mahapara Khurshid, Mayank Vatsa, Richa Singh

    Abstract: The rising global prevalence of skin conditions, some of which can escalate to life-threatening stages if not timely diagnosed and treated, presents a significant healthcare challenge. This issue is particularly acute in remote areas where limited access to healthcare often results in delayed treatment, allowing skin diseases to advance to more critical stages. One of the primary challenges in dia…

    Submitted 16 February, 2024; originally announced February 2024.

  22. arXiv:2402.09585  [pdf, other]

    cs.SD eess.AS

    Domain Adaptation for Contrastive Audio-Language Models

    Authors: Soham Deshmukh, Rita Singh, Bhiksha Raj

    Abstract: Audio-Language Models (ALM) aim to be general-purpose audio models by providing zero-shot capabilities at test time. The zero-shot performance of ALM improves by using suitable text prompts for each domain. The text prompts are usually hand-crafted through an ad-hoc process and lead to a drop in ALM generalization and out-of-distribution performance. Existing approaches to improve domain performan…

    Submitted 14 February, 2024; originally announced February 2024.

  23. arXiv:2402.05398  [pdf, other]

    cs.CV

    On the Effect of Image Resolution on Semantic Segmentation

    Authors: Ritambhara Singh, Abhishek Jain, Pietro Perona, Shivani Agarwal, Junfeng Yang

    Abstract: High-resolution semantic segmentation requires substantial computational resources. Traditional approaches in the field typically downscale the input images before processing and then upscale the low-resolution outputs back to their original dimensions. While this strategy effectively identifies broad regions, it often misses finer details. In this study, we demonstrate that a streamlined model ca…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2209.08667 by other authors

  24. arXiv:2402.01922  [pdf, other]

    cs.LG cs.AI

    A General Framework for Learning from Weak Supervision

    Authors: Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj

    Abstract: Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulati…

    Submitted 5 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 24 pages, 20 tables, 9 figures

  25. arXiv:2402.01292  [pdf, other]

    cs.AI cs.HC

    Towards the New XAI: A Hypothesis-Driven Approach to Decision Support Using Evidence

    Authors: Thao Le, Tim Miller, Liz Sonenberg, Ronal Singh

    Abstract: Prior research on AI-assisted human decision-making has explored several different explainable AI (XAI) approaches. A recent paper has proposed a paradigm shift calling for hypothesis-driven XAI through a conceptual framework called evaluative AI that gives people evidence that supports or refutes hypotheses without necessarily giving a decision-aid recommendation. In this paper, we describe and e…

    Submitted 27 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 26 pages

  26. arXiv:2402.00282  [pdf, other]

    eess.AS cs.SD

    PAM: Prompting Audio-Language Models for Audio Quality Assessment

    Authors: Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

    Abstract: While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an audio input and a text prompt related to quality, an ALM can be used to calcu…

    Submitted 31 January, 2024; originally announced February 2024.

  27. arXiv:2401.15589  [pdf, other]

    cs.HC cs.CY

    OpineBot: Class Feedback Reimagined Using a Conversational LLM

    Authors: Henansh Tanwar, Kunal Shrivastva, Rahul Singh, Dhruv Kumar

    Abstract: Conventional class feedback systems often fall short, relying on static, unengaging surveys offering little incentive for student participation. To address this, we present OpineBot, a novel system employing large language models (LLMs) to conduct personalized, conversational class feedback via chatbot interface. We assessed OpineBot's effectiveness in a user study with 20 students from an Indian…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: Under review

  28. arXiv:2401.12863  [pdf, other]

    cs.CL cs.AI

    KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

    Authors: Debjyoti Mondal, Suraj Modi, Subhadarshi Panda, Rituraj Singh, Godawari Sudhakar Rao

    Abstract: Large Language Models (LLMs) have demonstrated impressive performance in natural language processing tasks by leveraging chain of thought (CoT) that enables step-by-step thinking. Extending LLMs with multimodal capabilities is the recent interest, but incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT a framework that integrates C…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: AAAI 2024

  29. arXiv:2401.12803  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    Enhancements for 5G NR PRACH Reception: An AI/ML Approach

    Authors: Rohit Singh, Anil Kumar Yerrapragada, Jeeva Keshav S, Radha Krishna Ganti

    Abstract: Random Access is an important step in enabling the initial attachment of a User Equipment (UE) to a Base Station (gNB). The UE identifies itself by embedding a Preamble Index (RAPID) in the phase rotation of a known base sequence, which it transmits on the Physical Random Access Channel (PRACH). The signal on the PRACH also enables the estimation of propagation delay, often known as Timing Advance…

    Submitted 12 January, 2024; originally announced January 2024.

  30. arXiv:2401.09620  [pdf, other]

    cs.NI cs.PF

    Cost-effective and performant virtual WANs with CORNIFER

    Authors: Anjali, Rachee Singh, Michael M. Swift

    Abstract: Virtual wide-area networks (WANs) are WAN-as-a-service cloud offerings that aim to bring the performance benefits of dedicated wide-area interconnects to enterprise customers. In this work, we show that the topology of a virtual WAN can render it both performance and cost inefficient. We develop Cornifer, a tool that designs virtual WAN topologies by deciding the number of virtual WAN nodes and th…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 21 pages

    ACM Class: C.2.1; C.4

  31. arXiv:2401.00547   

    cs.LG math.OC

    On Learning for Ambiguous Chance Constrained Problems

    Authors: A Ch Madhusudanarao, Rahul Singh

    Abstract: We study chance constrained optimization problems $\min_x f(x)$ s.t. $P(\left\{ θ: g(x,θ)\le 0 \right\})\ge 1-ε$ where $ε\in (0,1)$ is the violation probability, when the distribution $P$ is not known to the decision maker (DM). When the DM has access to a set of distributions $\mathcal{U}$ such that $P$ is contained in $\mathcal{U}$, then the problem is known as the ambiguous chance-constrained p…

    Submitted 11 February, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: We have "not considered the uniform bound" for violation probabilities corresponding to the set of distributions in the ambiguity set

  32. arXiv:2312.11561  [pdf, other]

    cs.LG cs.AI

    COPD-FlowNet: Elevating Non-invasive COPD Diagnosis with CFD Simulations

    Authors: Aryan Tyagi, Aryaman Rao, Shubhanshu Rao, Raj Kumar Singh

    Abstract: Chronic Obstructive Pulmonary Disorder (COPD) is a prevalent respiratory disease that significantly impacts the quality of life of affected individuals. This paper presents COPDFlowNet, a novel deep-learning framework that leverages a custom Generative Adversarial Network (GAN) to generate synthetic Computational Fluid Dynamics (CFD) velocity flow field images specific to the trachea of COPD patie…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 2 pages 2 tables 3 figures

  33. arXiv:2312.10127  [pdf, other]

    cs.PF cs.DC cs.LG

    How Does It Function? Characterizing Long-term Trends in Production Serverless Workloads

    Authors: Artjom Joosen, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Luke Darlow, Jianfeng Wang, Adam Barker

    Abstract: This paper releases and analyzes two new Huawei cloud serverless traces. The traces span a period of over 7 months with over 1.4 trillion function invocations combined. The first trace is derived from Huawei's internal workloads and contains detailed per-second statistics for 200 functions running across multiple Huawei cloud data centers. The second trace is a representative workload from Huawei'…

    Submitted 15 December, 2023; originally announced December 2023.

    ACM Class: D.4.7; I.5.1; C.4

    Journal ref: SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, October 2023, Pages 443-458

  34. arXiv:2312.06699  [pdf, other]

    cs.CV cs.LG

    Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning

    Authors: Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Rahul Pratap Singh, Bishmoy Paul, Ali Dabouei, Min Xu

    Abstract: A thorough comprehension of textual data is a fundamental element in multi-modal video analysis tasks. However, recent works have shown that the current models do not achieve a comprehensive understanding of the textual data during the training for the target downstream tasks. Orthogonal to the previous approaches to this limitation, we postulate that understanding the significance of the sentence…

    Submitted 9 December, 2023; originally announced December 2023.

  35. arXiv:2312.04231  [pdf, other]

    cs.CV cs.AI

    Adventures of Trustworthy Vision-Language Models: A Survey

    Authors: Mayank Vatsa, Anubhooti Jain, Richa Singh

    Abstract: Recently, transformers have become incredibly popular in computer vision and vision-language tasks. This notable rise in their usage can be primarily attributed to the capabilities offered by attention mechanisms and the outstanding ability of transformers to adapt and apply themselves to a variety of tasks and domains. Their versatility and state-of-the-art performance have established them as in…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted in AAAI 2024

  36. Token Prediction as Implicit Classification to Identify LLM-Generated Text

    Authors: Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

    Abstract: This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation. Instead of adding an additional classification layer to a base LM, we reframe the classification task as a next-token prediction task and directly fine-tune the base LM to perform it. We utilize the Text-to-Text Transfer Transformer (T5) model as the backbone for our experi…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023, Main Conference

  37. arXiv:2310.15848  [pdf, other]

    cs.LG cs.CV

    On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms

    Authors: Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner

    Abstract: Artificial Intelligence (AI) has made its way into various scientific fields, providing astonishing improvements over existing algorithms for a wide variety of tasks. In recent years, there have been severe concerns over the trustworthiness of AI technologies. The scientific community has focused on the development of trustworthy AI algorithms. However, machine and deep learning algorithms, popula…

    Submitted 24 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: corrected typos

  38. arXiv:2310.09449  [pdf, other]

    cs.CV cs.LG

    Pairwise Similarity Learning is SimPLE

    Authors: Yandong Wen, Weiyang Liu, Yao Feng, Bhiksha Raj, Rita Singh, Adrian Weller, Michael J. Black, Bernhard Schölkopf

    Abstract: In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). PSL subsumes a wide range of important applications, such as open-set face recognition, speaker verification, image retrieval and person re-identification. The goal of PSL is to learn a pairwise similarity function assigning a higher similarity score to positive pairs (i.e., a pair of samples w…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Published in ICCV 2023 (Project page: https://simple.is.tue.mpg.de/)

  39. arXiv:2310.07209  [pdf, other]

    cs.CV

    Multi-task Explainable Skin Lesion Classification

    Authors: Mahapara Khurshid, Mayank Vatsa, Richa Singh

    Abstract: Skin cancer is one of the deadliest diseases and has a high mortality rate if left untreated. The diagnosis generally starts with visual screening and is followed by a biopsy or histopathological examination. Early detection can aid in lowering mortality rates. Visual screening can be limited by the experience of the doctor. Due to the long tail distribution of dermatological datasets and signific…

    Submitted 11 October, 2023; originally announced October 2023.

  40. arXiv:2310.04445  [pdf, other]

    cs.CL cs.AI cs.LG

    LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

    Authors: Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh

    Abstract: It has been shown that Large Language Model (LLM) alignments can be circumvented by appending specially crafted attack suffixes with harmful queries to elicit harmful responses. To conduct attacks against private target models whose characterization is unknown, public models can be used as proxies to fashion the attack, with successful attacks being transferred from public proxies to private targe…

    Submitted 21 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  41. arXiv:2310.02298  [pdf, other]

    cs.SD cs.AI eess.AS

    Prompting Audios Using Acoustic Properties For Emotion Representation

    Authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

    Abstract: Emotions lie on a continuum, but current models treat emotions as a finite valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emoti…

    Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.07737

  42. arXiv:2310.00808  [pdf, other]

    cs.CV

    Completing Visual Objects via Bridging Generation and Segmentation

    Authors: Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

    Abstract: This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components. Our method, named MaskComp, delineates the completion process through iterative stages of generation and segmentation. In each iteration, the object mask is provided as an additional condition to boost image generation, and, in return, the gene…

    Submitted 2 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  43. arXiv:2310.00706  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

    Authors: Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

    Abstract: Modern speech synthesis systems have improved significantly, with synthetic speech being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech still remains a significant challenge. Human evaluation using Mean Opinion Score (MOS) is ideal, but inefficient due to high costs. Therefore, researchers have developed auxiliary automatic metrics like Word Erro…

    Submitted 1 October, 2023; originally announced October 2023.

  44. arXiv:2310.00132  [pdf, other]

    cs.CV

    QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

    Authors: Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj

    Abstract: Audiovisual segmentation (AVS) is a challenging task that aims to segment visual objects in videos according to their associated acoustic cues. With multiple sound sources and background disturbances involved, establishing robust correspondences between audio and visual contents poses unique challenges due to (1) complex entanglement across sound sources and (2) frequent changes in the occurrence…

    Submitted 19 April, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

  45. arXiv:2309.13544  [pdf]

    cs.IR cs.AI cs.LG cs.SD eess.AS

    Related Rhythms: Recommendation System To Discover Music You May Like

    Authors: Rahul Singh, Pranav Kanuparthi

    Abstract: Machine Learning models are being utilized extensively to drive recommender systems, which is a widely explored topic today. This is especially true of the music industry, where we are witnessing a surge in growth. Besides a large chunk of active users, these systems are fueled by massive amounts of data. These large-scale systems yield applications that aim to provide a better user experience and…

    Submitted 24 September, 2023; originally announced September 2023.

    ACM Class: I.2.6; H.3.3

  46. arXiv:2309.13542  [pdf, other]

    cs.IT

    Integrated Sensing and Communications for IoT: Synergies with Key 6G Technology Enablers

    Authors: Aryan Kaushik, Rohit Singh, Ming Li, Honghao Luo, Shalanika Dayarathna, Rajitha Senanayake, Xueli An, Richard A. Stirling-Gallacher, Wonjae Shin, Marco Di Renzo

    Abstract: The Internet of Things (IoT) and wireless generations have been evolving simultaneously for the past few decades. Built upon wireless communication and sensing technologies, IoT networks are usually evaluated based on metrics that measure the device ability to sense information and effectively share it with the network, which makes Integrated Sensing and Communication (ISAC) a pivotal candidate fo…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: 7 pages, 6 figures

  47. arXiv:2309.13227  [pdf, other]

    cs.LG cs.SD eess.AS

    Importance of negative sampling in weak label learning

    Authors: Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

    Abstract: Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known. The pool of negative instances is usually larger than positive instances, thus making selecting the most informative negative instance critical for performance. Such a selection strategy for negative instances from each bag is an open prob…

    Submitted 22 September, 2023; originally announced September 2023.

  48. arXiv:2309.07372  [pdf, other]

    eess.AS cs.SD

    Training Audio Captioning Models without Audio

    Authors: Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang

    Abstract: Automated Audio Captioning (AAC) is the task of generating natural language descriptions given an audio stream. A typical AAC system requires manually curated training data of audio segments and corresponding text caption annotations. The creation of these audio-caption pairs is costly, resulting in general data scarcity for the task. In this work, we address this major limitation and propose an a…

    Submitted 13 September, 2023; originally announced September 2023.

  49. arXiv:2308.14190  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Score-Based Generative Models for PET Image Reconstruction

    Authors: Imraj RD Singh, Alexander Denker, Riccardo Barbano, Željko Kereta, Bangti Jin, Kris Thielemans, Peter Maass, Simon Arridge

    Abstract: Score-based generative models have demonstrated highly promising results for medical image reconstruction tasks in magnetic resonance imaging or computed tomography. However, their application to Positron Emission Tomography (PET) is still largely unexplored. PET image reconstruction involves a variety of challenges, including Poisson noise with high variance and a wide dynamic range. To address t…

    Submitted 23 January, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:001

    MSC Class: 15A29; 45Q05 ACM Class: I.4.9; J.2; I.2.1

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

  50. Modeling Digital Penetration of the Industrialized Society and its Ensuing Transfiguration

    Authors: Johannes Vrana, Ripudaman Singh

    Abstract: The Fourth Industrial Revolution, ushered in by the deeper integration of digital technologies into professional and social spaces, provides an opportunity to meaningfully serve society. Humans have tremendous capability to innovatively improve social well-being when the situation is clear, which was not the case during the first three revolutions. Thus, society has been accepting lifestyle changes w…

    Submitted 21 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 22 pages, 6 figures, 2 tables

    Journal ref: DISO 2, 54 (2023)