Showing 1–50 of 205 results for author: Sarthak

Searching in archive cs.
  1. arXiv:2407.06125  [pdf, other]

    cs.HC cs.AI

    Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

    Authors: Avinash Anand, Chayan Tank, Sarthak Pol, Vinayak Katoch, Shaina Mehta, Rajiv Ratn Shah

    Abstract: Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to suicide. Generally, diagnosing depression or any other mental disorder involves conducting semi-structured interviews alongside supplementary questionna…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 12 pages, 9 figures, 9 tables

  2. arXiv:2406.19543  [pdf, other]

    cs.CL cs.SI

    Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

    Authors: Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee

    Abstract: Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcat…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.10448  [pdf, other]

    eess.AS cs.SD

    AVR: Synergizing Foundation Models for Audio-Visual Humor Detection

    Authors: Sarthak Sharma, Orchid Chetia Phukan, Drishti Singh, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we present the AVR application for audio-visual humor detection. While humor detection has traditionally centered around textual analysis, recent advancements have spotlighted multimodal approaches. However, these methods lean on textual cues as a modality, necessitating the use of ASR systems for transcribing the audio data. This heavy reliance on ASR accuracy can pose challenges in re…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

  4. arXiv:2406.06798  [pdf, other]

    eess.AS cs.SD

    The Reasonable Effectiveness of Speaker Embeddings for Violence Detection

    Authors: Sarthak Jain, Orchid Chetia Phukan, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this paper, we focus on audio violence detection (AVD). AVD is necessary for several reasons, especially in the context of maintaining safety, preventing harm, and ensuring security in various environments. This calls for accurate AVD systems. As in many related applications in audio processing, the most common approach to improving performance would be to leverage self-supervised (SSL)…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

  5. arXiv:2406.06781  [pdf, other]

    eess.AS cs.SD

    PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation

    Authors: Devyani Koshal, Orchid Chetia Phukan, Sarthak Jain, Arun Balaji Buduru, Rajesh Sharma

    Abstract: Emotion Recognition (ER), Gender Recognition (GR), and Age Estimation (AE) constitute paralinguistic tasks that rely not on the spoken content but primarily on speech characteristics such as pitch and tone. While previous research has made significant strides in developing models for each task individually, there has been comparatively less emphasis on concurrently learning these tasks, despite th…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

  6. arXiv:2406.06774  [pdf, other]

    eess.AS cs.SD

    ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection

    Authors: Orchid Chetia Phukan, Sarthak Jain, Shubham Singh, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we focus on the detection of depression through speech analysis. Previous research has widely explored features extracted from pre-trained models (PTMs) primarily trained for paralinguistic tasks. Although these features have led to sufficient advances in speech-based depression detection, their performance declines in real-world settings. To address this, in this paper, we introduce…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

  7. arXiv:2406.02178  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations

    Authors: Sarthak Yadav, Zheng-Hua Tan

    Abstract: Despite its widespread adoption as the prominent neural architecture, the Transformer has spurred several independent lines of work to address its limitations. One such approach is selective state space models, which have demonstrated promising results for language modelling. However, their feasibility for learning self-supervised, general-purpose audio representations is yet to be investigated. T…

    Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  8. arXiv:2406.00869  [pdf, other]

    cs.RO

    Using 3-D LiDAR Data for Safe Physical Human-Robot Interaction

    Authors: Sarthak Arora, Karthik Subramanian, Odysseus Adamides, Ferat Sahin

    Abstract: This paper explores the use of 3D lidar in a physical Human-Robot Interaction (pHRI) scenario. To this end, experiments were conducted to mimic a modern shop-floor environment. Data was collected from a pool of seventeen participants while performing pre-determined tasks in a shared workspace with the robot. To demonstrate an end-to-end case, a perception pipeline was developed t…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE-CASE 2024. Under Review

  9. arXiv:2406.00314  [pdf, other]

    cs.CL cs.AI cs.LG

    CASE: Efficient Curricular Data Pre-training for Building Assistive Psychology Expert Models

    Authors: Sarthak Harne, Monjoy Narayan Choudhury, Madhav Rao, TK Srikanth, Seema Mehrotra, Apoorva Vashisht, Aarushi Basu, Manjit Sodhi

    Abstract: The limited availability of psychologists necessitates efficient identification of individuals requiring urgent mental healthcare. This study explores the use of Natural Language Processing (NLP) pipelines to analyze text data from online mental health forums used for consultations. By analyzing forum posts, these pipelines can flag users who may require immediate professional attention. A crucial…

    Submitted 16 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  10. arXiv:2405.20971  [pdf, other]

    cs.LG cs.CV

    Amortizing intractable inference in diffusion models for vision, language, and control

    Authors: Siddarth Venkatraman, Moksh Jain, Luca Scimeca, Minsu Kim, Marcin Sendera, Mohsin Hasan, Luke Rowe, Sarthak Mittal, Pablo Lemos, Emmanuel Bengio, Alexandre Adam, Jarrid Rector-Brooks, Yoshua Bengio, Glen Berseth, Nikolay Malkin

    Abstract: Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generat…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/GFNOrg/diffusion-finetuning

  11. arXiv:2405.19162  [pdf, other]

    cs.LG cs.AI

    Does learning the right latent variables necessarily improve in-context learning?

    Authors: Sarthak Mittal, Eric Elmoznino, Leo Gagnon, Sangnie Bhardwaj, Dhanya Sridhar, Guillaume Lajoie

    Abstract: Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by in…

    Submitted 29 May, 2024; originally announced May 2024.

  12. arXiv:2405.18383  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG

    Brain Tumor Segmentation (BraTS) Challenge 2024: Meningioma Radiotherapy Planning Automated Segmentation

    Authors: Dominic LaBella, Katherine Schumacher, Michael Mix, Kevin Leu, Shan McBurney-Lin, Pierre Nedelec, Javier Villanueva-Meyer, Jonathan Shapey, Tom Vercauteren, Kazumi Chia, Omar Al-Salihi, Justin Leu, Lia Halasz, Yury Velichko, Chunhao Wang, John Kirkpatrick, Scott Floyd, Zachary J. Reitman, Trey Mullikin, Ulas Bagci, Sean Sachdev, Jona A. Hattangadi-Gluth, Tyler Seibert, Nikdokht Farid, Connor Puett, et al. (45 additional authors not shown)

    Abstract: The 2024 Brain Tumor Segmentation Meningioma Radiotherapy (BraTS-MEN-RT) challenge aims to advance automated segmentation algorithms using the largest known multi-institutional dataset of radiotherapy planning brain MRIs with expert-annotated target labels for patients with intact or post-operative meningioma that underwent either conventional external beam radiotherapy or stereotactic radiosurger…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 13 pages, 9 figures, 1 table

  13. arXiv:2405.12167  [pdf, other]

    cs.CY

    Open-Source Assessments of AI Capabilities: The Proliferation of AI Analysis Tools, Replicating Competitor Models, and the Zhousidun Dataset

    Authors: Ritwik Gupta, Leah Walker, Eli Glickman, Raine Koizumi, Sarthak Bhatnagar, Andrew W. Reddie

    Abstract: The integration of artificial intelligence (AI) into military capabilities has become a norm for major military powers across the globe. Understanding how these AI models operate is essential for maintaining strategic advantages and ensuring security. This paper demonstrates an open-source methodology for analyzing military AI models through a detailed examination of the Zhousidun dataset, a Chines…

    Submitted 24 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  14. arXiv:2405.10871  [pdf, other]

    cs.CV

    BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions

    Authors: Spyridon Bakas, Siddhesh P. Thakur, Shahriar Faghani, Mana Moassefi, Ujjwal Baid, Verena Chung, Sarthak Pati, Shubham Innani, Bhakti Baheti, Jake Albrecht, Alexandros Karargyris, Hasan Kassem, MacLean P. Nasrallah, Jared T. Ahrendsen, Valeria Barresi, Maria A. Gubbiotti, Giselle Y. López, Calixto-Hope G. Lucas, Michael L. Miller, Lee A. D. Cooper, Jason T. Huse, William R. Bell

    Abstract: Glioblastoma is the most common primary adult brain tumor, with a grim prognosis - median survival of 12-18 months following treatment, and 4 months otherwise. Glioblastoma is widely infiltrative in the cerebral hemispheres and well-defined by heterogeneous molecular and micro-environmental histopathologic profiles, which pose a major obstacle in treatment. Correctly diagnosing these tumors and as…

    Submitted 17 May, 2024; originally announced May 2024.

  15. arXiv:2405.09787  [pdf, other]

    eess.IV cs.CV cs.LG

    Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

    Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie, et al. (96 additional authors not shown)

    Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 tables, 10 figures, MICCAI

  16. arXiv:2405.03113  [pdf, other]

    cs.RO cs.AI

    Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

    Authors: Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum

    Abstract: Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks ranging from easy tasks like…

    Submitted 5 May, 2024; originally announced May 2024.

  17. arXiv:2404.16478  [pdf, other]

    cs.CL cs.AI

    Evaluating Consistency and Reasoning Capabilities of Large Language Models

    Authors: Yash Saxena, Sarthak Chopra, Arunendra Mani Tripathi

    Abstract: Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance, for tasks such as text generation, summarization, and translation. Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate. This behavior can be attributed to several factors, with consi…

    Submitted 25 April, 2024; originally announced April 2024.

  18. arXiv:2404.08855  [pdf, other]

    cs.RO cs.LG

    WROOM: An Autonomous Driving Approach for Off-Road Navigation

    Authors: Dvij Kalaria, Shreya Sharma, Sarthak Bhagat, Haoru Xue, John M. Dolan

    Abstract: Off-road navigation is a challenging problem both at the planning level to get a smooth trajectory and at the control level to avoid flipping over, hitting obstacles, or getting stuck at a rough patch. There have been several recent works using classical approaches involving depth map prediction followed by smooth trajectory planning and using a controller to track it. We design an end-to-end rein…

    Submitted 12 April, 2024; originally announced April 2024.

  19. arXiv:2404.05366  [pdf, other]

    cs.CV

    CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery

    Authors: Sai Bhargav Rongali, Sarthak Mehrotra, Ankit Jha, Mohamad Hassan N C, Shirsha Bose, Tanisha Gupta, Mainak Singha, Biplab Banerjee

    Abstract: In Generalized Category Discovery (GCD), we cluster unlabeled samples of known and novel classes, leveraging a training dataset of known classes. A salient challenge arises due to domain shifts between these datasets. To address this, we present a novel setting: Across Domain Generalized Category Discovery (AD-GCD) and bring forth CDAD-NET (Class Discoverer Across Domains) as a remedy. CDAD-NET is…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted in L3D-IVU, CVPR Workshop, 2024

  20. arXiv:2403.18062  [pdf, other]

    cs.RO cs.AI

    ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition

    Authors: Samuel Li, Sarthak Bhagat, Joseph Campbell, Yaqi Xie, Woojun Kim, Katia Sycara, Simon Stepputtis

    Abstract: Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments. Inspired by the human capability to grasp such objects through intuition about their shape and structure, we present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple, convex shapes that we represent in a graph structure,…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 8 pages

  21. arXiv:2403.15582  [pdf, other]

    cond-mat.quant-gas cs.DC eess.SP physics.atom-ph

    Fast real-time arbitrary waveform generation using graphic processing units

    Authors: Juntian Tu, Sarthak Subhankar

    Abstract: Real-time Arbitrary Waveform Generation (AWG) is essential in various engineering and research applications, and often requires complex bespoke hardware and software. This paper introduces an AWG framework using an NVIDIA Graphics Processing Unit (GPU) and a commercially available high-speed Digital-to-Analog Converter (DAC) card, both running on a desktop personal computer (PC). The GPU accelerat…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 13 pages, 10 figures

  22. arXiv:2403.12419  [pdf, ps, other]

    cs.IT

    Sparsity-Constrained Community-Based Group Testing

    Authors: Sarthak Jain, Martina Cardone, Soheil Mohajer

    Abstract: In this work, we consider the sparsity-constrained community-based group testing problem, where the population follows a community structure. In particular, the community consists of $F$ families, each with $M$ members. A number $k_f$ out of the $F$ families are infected, and a family is said to be infected if $k_m$ out of its $M$ members are infected. Furthermore, the sparsity constraint allows a…

    Submitted 19 March, 2024; originally announced March 2024.

  23. arXiv:2403.10663  [pdf, other]

    cs.CR cs.CV cs.LG

    Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data

    Authors: Yuxuan Li, Sarthak Kumar Maharana, Yunhui Guo

    Abstract: With the increasing prevalence of Machine Learning as a Service (MLaaS) platforms, there is a growing focus on deep neural network (DNN) watermarking techniques. These methods are used to facilitate the verification of ownership for a target DNN model to protect intellectual property. One of the most widely employed watermarking techniques involves embedding a trigger set into the source model. Un…

    Submitted 15 March, 2024; originally announced March 2024.

  24. arXiv:2403.10650  [pdf, other]

    cs.CV cs.LG

    PALM: Pushing Adaptive Learning Rate Mechanisms for Continual Test-Time Adaptation

    Authors: Sarthak Kumar Maharana, Baoming Zhang, Yunhui Guo

    Abstract: Real-world vision models in dynamic environments face rapid shifts in domain distributions, leading to decreased recognition performance. Continual test-time adaptation (CTTA) directly adjusts a pre-trained source discriminative model to these changing domains using test data. A highly effective CTTA method involves applying layer-wise adaptive learning rates, and selectively adapting pre-trained…

    Submitted 15 March, 2024; originally announced March 2024.

  25. arXiv:2403.06326  [pdf, other]

    cs.CL cs.AI cs.LG

    From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification

    Authors: Fei Wang, Chao Shang, Sarthak Jain, Shuai Wang, Qiang Ning, Bonan Min, Vittorio Castelli, Yassine Benajiba, Dan Roth

    Abstract: User alignment is crucial for adapting general-purpose language models (LMs) to downstream tasks, but human annotations are often not available for all types of instructions, especially those with customized constraints. We observe that user instructions typically contain constraints. While assessing response quality in terms of the whole instruction is often costly, efficiently evaluating the sat…

    Submitted 10 March, 2024; originally announced March 2024.

  26. arXiv:2402.19469  [pdf, other]

    cs.RO cs.CV cs.LG

    Humanoid Locomotion as Next Token Prediction

    Authors: Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik

    Abstract: We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for the multi-modal nature of the data, we perform prediction in a modality-aligned way, and for each input token predict the next token from the same modality. This gen…

    Submitted 29 February, 2024; originally announced February 2024.

  27. arXiv:2402.10783  [pdf, other]

    cs.DS cs.CC

    On Permutation Selectors and their Applications in Ad-Hoc Radio Networks Protocols

    Authors: Jordan Kuschner, Yugarshi Shashwat, Sarthak Yadav, Marek Chrobak

    Abstract: Selective families of sets, or selectors, are combinatorial tools used to "isolate" individual members of sets from some set family. Given a set $X$ and an element $x\in X$, to isolate $x$ from $X$, at least one of the sets in the selector must intersect $X$ on exactly $x$. We study $(k,N)$-permutation selectors which have the property that they can isolate each element of each $k$-element subset of…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 9 pages, 2 figures

  28. arXiv:2402.10202  [pdf, other]

    cs.LG

    Bridging Associative Memory and Probabilistic Modeling

    Authors: Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

    Abstract: Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence. The first studies recurrent neural networks designed to denoise, complete and retrieve data, whereas the second studies learning and sampling from probability distributions. Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log like…

    Submitted 13 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  29. arXiv:2402.07640  [pdf, other]

    cs.MM cs.AI

    CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis

    Authors: Puneet Kumar, Sarthak Malik, Balasubramanian Raman, Xiaobai Li

    Abstract: The Controllable Multimodal Feedback Synthesis (CMFeed) dataset enables the generation of sentiment-controlled feedback from multimodal inputs. It contains images, text, human comments, comments' metadata and sentiment labels. Existing datasets for related tasks such as multimodal summarization, visual question answering, visual dialogue, and sentiment-aware text generation do not incorporate trai…

    Submitted 5 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  30. arXiv:2402.07498  [pdf, other]

    cs.LG

    Accelerated Smoothing: A Scalable Approach to Randomized Smoothing

    Authors: Devansh Bhardwaj, Kshitiz Kaushik, Sarthak Gupta

    Abstract: Randomized smoothing has emerged as a potent certifiable defense against adversarial attacks by employing smoothing noises from specific distributions to ensure the robustness of a smoothed classifier. However, the utilization of Monte Carlo sampling in this process introduces a compute-intensive element, which constrains the practicality of randomized smoothing on a larger scale. To address this…

    Submitted 12 February, 2024; originally announced February 2024.

  31. arXiv:2402.06121  [pdf, other]

    cs.LG stat.ML

    Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

    Authors: Tara Akhound-Sadegh, Jarrid Rector-Brooks, Avishek Joey Bose, Sarthak Mittal, Pablo Lemos, Cheng-Hao Liu, Marcin Sendera, Siamak Ravanbakhsh, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Alexander Tong

    Abstract: Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and…

    Submitted 26 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Published at ICML 2024. Code for iDEM is available at https://github.com/jarridrb/dem

  32. arXiv:2402.06030  [pdf, other]

    cs.LG cs.AI

    Game-theoretic Counterfactual Explanation for Graph Neural Networks

    Authors: Chirag Chhablani, Sarthak Jain, Akshay Channesh, Ian A. Kash, Sourav Medya

    Abstract: Graph Neural Networks (GNNs) have been a powerful tool for node classification tasks in complex networks. However, their decision-making processes remain a black-box to users, making it challenging to understand the reasoning behind their predictions. Counterfactual explanations (CFE) have shown promise in enhancing the interpretability of machine learning models. Prior approaches to compute CFE f…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to WWW 2024

  33. arXiv:2402.05098  [pdf, other]

    cs.LG stat.ML

    Improved off-policy training of diffusion samplers

    Authors: Marcin Sendera, Minsu Kim, Sarthak Mittal, Pablo Lemos, Luca Scimeca, Jarrid Rector-Brooks, Alexandre Adam, Yoshua Bengio, Nikolay Malkin

    Abstract: We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into…

    Submitted 26 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 24 pages; changed title from v2; code: https://github.com/GFNOrg/gfn-diffusion

  34. arXiv:2312.15010  [pdf, other]

    cs.CV

    SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology

    Authors: Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R. Gupta, Prateek Prasanna

    Abstract: Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selectio…

    Submitted 18 May, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  35. arXiv:2312.14461  [pdf, other]

    cs.CR cs.AI cs.LG

    Attacking Byzantine Robust Aggregation in High Dimensions

    Authors: Sarthak Choudhary, Aashish Kolluri, Prateek Saxena

    Abstract: Training modern neural networks or models typically requires averaging over a sample of high-dimensional vectors. Poisoning attacks can skew or bias the average vectors used to train the model, forcing the model to learn specific patterns or avoid learning anything useful. Byzantine robust aggregation is a principled algorithmic defense against such biasing. Robust aggregators can bound the maximu…

    Submitted 19 April, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  36. arXiv:2312.12608  [pdf, other]

    cs.LG cs.CR stat.ML

    Trust, But Verify: A Survey of Randomized Smoothing Techniques

    Authors: Anupriya Kumari, Devansh Bhardwaj, Sukrit Jindal, Sarthak Gupta

    Abstract: Machine learning models have demonstrated remarkable success across diverse domains but remain vulnerable to adversarial attacks. Empirical defence mechanisms often fall short, as new attacks constantly emerge, rendering existing defences obsolete. A paradigm shift from empirical defences to certification-based defences has been observed in response. Randomized smoothing has emerged as a promising…

    Submitted 19 December, 2023; originally announced December 2023.

  37. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  38. arXiv:2312.07330  [pdf, other]

    cs.CV

    Learned representation-guided diffusion models for large-image generation

    Authors: Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, Dimitris Samaras

    Abstract: To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, it is impractical to procure the painstaking patch-level annotation effort required in specialized domains like histopathology and satellite imagery; it is often performed by domain experts and involves hundreds of millions of patches. Modern-day self-supervised learning…

    Submitted 28 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  39. arXiv:2312.04429  [pdf, other]

    cs.CV

    Approximate Caching for Efficiently Serving Diffusion Models

    Authors: Shubham Agarwal, Subrata Mitra, Sarthak Chakraborty, Srikrishna Karanam, Koyel Mukherjee, Shiv Saini

    Abstract: Text-to-image generation using diffusion models has seen explosive popularity owing to their ability to produce high-quality images adhering to text prompts. However, production-grade diffusion model serving is a resource-intensive task that not only requires high-end GPUs, which are expensive, but also incurs considerable latency. In this paper, we introduce a technique called approximate-caching…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted at NSDI'24

  40. arXiv:2312.02608  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Panoptica -- instance-wise evaluation of 3D semantic and instance segmentation maps

    Authors: Florian Kofler, Hendrik Möller, Josef A. Buchner, Ezequiel de la Rosa, Ivan Ezhov, Marcel Rosier, Isra Mekki, Suprosanna Shit, Moritz Negwer, Rami Al-Maskari, Ali Ertürk, Shankeeth Vinayahalingam, Fabian Isensee, Sarthak Pati, Daniel Rueckert, Jan S. Kirschke, Stefan K. Ehrlich, Annika Reinke, Bjoern Menze, Benedikt Wiestler, Marie Piraud

    Abstract: This paper introduces panoptica, a versatile and performance-optimized package designed for computing instance-wise segmentation quality metrics from 2D and 3D segmentation maps. panoptica addresses the limitations of existing metrics and provides a modular framework that complements the original intersection over union-based panoptic quality with other metrics, such as the distance metric Average…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 15 pages, 6 figures, 3 tables

  41. arXiv:2312.01238  [pdf, other]

    cs.LG stat.AP stat.CO stat.ME stat.ML

    A deep learning pipeline for cross-sectional and longitudinal multiview data integration

    Authors: Sarthak Jain, Sandra E. Safo

    Abstract: Biomedical research now commonly integrates diverse data types or views from the same individuals to better understand the pathobiology of complex diseases, but the challenge lies in meaningfully integrating these diverse views. Existing methods often require the same type of data from all views (cross-sectional data only or longitudinal data only) or do not consider any class outcome in the integ…

    Submitted 2 December, 2023; originally announced December 2023.

  42. arXiv:2311.16052  [pdf, other]

    cs.CV

    Exploring Attribute Variations in Style-based GANs using Diffusion Models

    Authors: Rishubh Parihar, Prasanna Balaji, Raghav Magazine, Sarthak Vora, Tejan Karmali, Varun Jampani, R. Venkatesh Babu

    Abstract: Existing attribute editing methods treat semantic attributes as binary, resulting in a single edit per attribute. However, attributes such as eyeglasses, smiles, or hairstyles exhibit a vast range of diversity. In this work, we formulate the task of \textit{diverse attribute editing} by modeling the multidimensional nature of attribute edits. This enables users to generate multiple plausible edits…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: NeurIPS Workshop on Diffusion Models 2023

  43. arXiv:2310.16370  [pdf, other]

    cs.DC

    PartRePer-MPI: Combining Fault Tolerance and Performance for MPI Applications

    Authors: Sarthak Joshi, Sathish Vadhiyar

    Abstract: As we have entered Exascale computing, the faults in high-performance systems are expected to increase considerably. To compensate for a higher failure rate, the standard checkpoint/restart technique would need to create checkpoints at a much higher frequency, resulting in an excessive amount of overhead which would not be sustainable for many scientific applications. Replication allows for fast re…

    Submitted 25 October, 2023; originally announced October 2023.

  44. arXiv:2310.12860  [pdf, other]

    cs.CL cs.CY

    Probing LLMs for hate speech detection: strengths and vulnerabilities

    Authors: Sarthak Roy, Ashish Harshavardhan, Animesh Mukherjee, Punyajoy Saha

    Abstract: Recently, efforts have been made by social media platforms as well as researchers to detect hateful or toxic language using large language models. However, none of these works aim to use explanation, additional context and victim community information in the detection process. We utilise different prompt variations and input information, and evaluate large language models in a zero-shot setting (without a…

    Submitted 28 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 13 pages, 9 figures, 7 tables, accepted to findings of EMNLP 2023

  45. arXiv:2310.01991  [pdf, other]

    cs.CL cs.AI cs.LG

    Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems

    Authors: Aniruddha Deb, Neeva Oza, Sarthak Singla, Dinesh Khandelwal, Dinesh Garg, Parag Singla

    Abstract: While forward reasoning (i.e., find the answer given the question) has been explored extensively in recent literature, backward reasoning is relatively unexplored. We examine the backward reasoning capabilities of LLMs on Math Word Problems (MWPs): given a mathematical question and its answer, with some details omitted from the question, can LLMs effectively retrieve the missing information? On mo…

    Submitted 7 July, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: 10 pages, 4 figures

    ACM Class: I.2.3

  46. Outage-Watch: Early Prediction of Outages using Extreme Event Regularizer

    Authors: Shubham Agarwal, Sarthak Chakraborty, Shaddy Garg, Sumit Bisht, Chahat Jain, Ashritha Gonuguntla, Shiv Saini

    Abstract: Cloud services are omnipresent, and critical cloud service failure is a fact of life. In order to retain customers and prevent revenue loss, it is important to provide high reliability guarantees for these services. One way to do this is by predicting outages in advance, which can help in reducing the severity as well as time to recovery. It is difficult to forecast critical failures due to the rar…

    Submitted 10 November, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted to ESEC/FSE 2023

  47. arXiv:2309.07230  [pdf, other]

    cs.SE

    ESRO: Experience Assisted Service Reliability against Outages

    Authors: Sarthak Chakraborty, Shubham Agarwal, Shaddy Garg, Abhimanyu Sethia, Udit Narayan Pandey, Videh Aggarwal, Shiv Saini

    Abstract: Modern cloud services are prone to failures due to their complex architecture, making diagnosis a critical process. Site Reliability Engineers (SREs) spend hours leveraging multiple sources of data, including the alerts, error logs, and domain expertise through past experiences to locate the root cause(s). These experiences are documented as natural language text in outage reports for previous out…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted to 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

  48. arXiv:2309.06439  [pdf, other]

    cs.CV

    Attention De-sparsification Matters: Inducing Diversity in Digital Pathology Representation Learning

    Authors: Saarthak Kapse, Srijan Das, Jingwei Zhang, Rajarsi R. Gupta, Joel Saltz, Dimitris Samaras, Prateek Prasanna

    Abstract: We propose DiRL, a Diversity-inducing Representation Learning technique for histopathology imaging. Self-supervised learning techniques, such as contrastive and non-contrastive approaches, have been shown to learn rich and effective representations of digitized tissue samples with limited pathologist supervision. Our analysis of vanilla SSL-pretrained models' attention distribution reveals an insi…

    Submitted 12 September, 2023; originally announced September 2023.

  49. arXiv:2309.05943  [pdf, other]

    cs.CV cs.AI

    Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos

    Authors: Sarthak Bhagat, Simon Stepputtis, Joseph Campbell, Katia Sycara

    Abstract: This work focuses on anticipating long-term human actions, particularly using short video segments, which can speed up editing workflows through improved suggestions while fostering creativity by suggesting narratives. To this end, we imbue a transformer network with a symbolic knowledge graph for action anticipation in video segments by boosting certain aspects of the transformer's attention mech…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: ICCV 2023 Workshop on AI for Creative Video Editing and Understanding

  50. arXiv:2309.05668  [pdf, other]

    cs.CL cs.AI

    Studying the impacts of pre-training using ChatGPT-generated text on downstream tasks

    Authors: Sarthak Anand

    Abstract: In recent times, significant advancements have been witnessed in the field of language models, particularly with the emergence of Large Language Models (LLMs) that are trained on vast amounts of data extracted from internet archives. These LLMs, such as ChatGPT, have become widely accessible, allowing users to generate text for various purposes including articles, essays, jokes, and poetry. Given…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: Master's thesis