Showing 1–18 of 18 results for author: Phukan, O C

  1. arXiv:2406.10448

    eess.AS cs.SD

    AVR: Synergizing Foundation Models for Audio-Visual Humor Detection

    Authors: Sarthak Sharma, Orchid Chetia Phukan, Drishti Singh, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we present AVR, an application for audio-visual humor detection. While humor detection has traditionally centered around textual analysis, recent advancements have spotlighted multimodal approaches. However, these methods lean on textual cues as a modality, necessitating the use of ASR systems for transcribing the audio data. This heavy reliance on ASR accuracy can pose challenges in re…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

  2. arXiv:2406.09156

    cs.LG cs.CV cs.MM cs.SD eess.AS

    Towards Multilingual Audio-Visual Question Answering

    Authors: Orchid Chetia Phukan, Priyabrata Mallick, Swarup Ranjan Behera, Aalekhya Satya Narayani, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this paper, we work towards extending Audio-Visual Question Answering (AVQA) to multilingual settings. Existing AVQA research has predominantly revolved around English, and replicating it to address AVQA in other languages requires a substantial allocation of resources. As a scalable solution, we leverage machine translation and present two multilingual AVQA datasets for eight languages crea…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

    MSC Class: 68T45

  3. arXiv:2406.06798

    eess.AS cs.SD

    The Reasonable Effectiveness of Speaker Embeddings for Violence Detection

    Authors: Sarthak Jain, Orchid Chetia Phukan, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this paper, we focus on audio violence detection (AVD). AVD is necessary for several reasons, especially in the context of maintaining safety, preventing harm, and ensuring security in various environments. This calls for accurate AVD systems. As in many related applications in audio processing, the most common approach for improving performance would be to leverage self-supervised (SSL)…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 24 Show & Tell Demonstrations

  4. arXiv:2406.06781

    eess.AS cs.SD

    PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation

    Authors: Devyani Koshal, Orchid Chetia Phukan, Sarthak Jain, Arun Balaji Buduru, Rajesh Sharma

    Abstract: Emotion Recognition (ER), Gender Recognition (GR), and Age Estimation (AE) constitute paralinguistic tasks that rely not on the spoken content but primarily on speech characteristics such as pitch and tone. While previous research has made significant strides in developing models for each task individually, there has been comparatively less emphasis on concurrently learning these tasks, despite th…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

  5. arXiv:2406.06774

    eess.AS cs.SD

    ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection

    Authors: Orchid Chetia Phukan, Sarthak Jain, Shubham Singh, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we focus on the detection of depression through speech analysis. Previous research has widely explored features extracted from pre-trained models (PTMs) primarily trained for paralinguistic tasks. Although these features have led to sufficient advances in speech-based depression detection, their performance declines in real-world settings. To address this, in this paper, we introduce…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

  6. arXiv:2406.03514

    eess.AS

    NeuRO: An Application for Code-Switched Autism Detection in Children

    Authors: Mohd Mujtaba Akhtar, Girish, Orchid Chetia Phukan, Muskaan Singh

    Abstract: Code-switching is a common communication phenomenon where individuals alternate between two or more languages or linguistic styles within a single conversation. Autism Spectrum Disorder (ASD) is a developmental disorder posing challenges in social interaction, communication, and repetitive behaviors. Detecting ASD in individuals in code-switching scenarios presents unique challenges. In this paper,…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 24 Show & Tell Demonstrations

  7. arXiv:2406.03205

    eess.AS

    CoLLAB: A Collaborative Approach for Multilingual Abuse Detection

    Authors: Orchid Chetia Phukan, Yashasvi Chaurasia, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this study, we investigate representations from a paralinguistic pre-trained model (PTM) for Audio Abuse Detection (AAD), which have not previously been explored for this task. Our results demonstrate their superiority compared to other PTM representations on the ADIMA benchmark. Furthermore, combining PTM representations enhances AAD performance. Despite these improvements, challenges with cross-lingual generalizabi…

    Submitted 5 June, 2024; originally announced June 2024.

  8. arXiv:2404.00827

    eess.SP

    SONIC: Synergizing VisiON Foundation Models for Stress RecogNItion from ECG signals

    Authors: Orchid Chetia Phukan, Ankita Das, Arun Balaji Buduru, Rajesh Sharma

    Abstract: Stress recognition through physiological signals such as Electrocardiogram (ECG) signals has garnered significant attention. Traditionally, research in this field predominantly focused on utilizing handcrafted features or raw signals as inputs for learning algorithms. However, there is now a burgeoning interest within the community in leveraging large-scale vision foundation models (VFMs) like Res…

    Submitted 31 March, 2024; originally announced April 2024.

  9. arXiv:2404.00809

    eess.AS

    Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake

    Authors: Orchid Chetia Phukan, Gautam Siddharth Kashyap, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we investigate multilingual speech pre-trained models (PTMs) for audio deepfake detection (ADD). We hypothesize that multilingual PTMs trained on large-scale diverse multilingual data gain knowledge about diverse pitches, accents, and tones during their pre-training phase, making them more robust to variations. As a result, they will be more effective for detecting audio deepfake…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted to NAACL (Findings) 2024

  10. arXiv:2402.01579

    eess.AS cs.CL cs.SD

    Are Paralinguistic Representations all that is needed for Speech Emotion Recognition?

    Authors: Orchid Chetia Phukan, Gautam Siddharth Kashyap, Arun Balaji Buduru, Rajesh Sharma

    Abstract: The availability of representations from pre-trained models (PTMs) has facilitated substantial progress in speech emotion recognition (SER). In particular, representations from PTMs trained for paralinguistic speech processing have shown state-of-the-art (SOTA) performance for SER. However, such paralinguistic PTM representations haven't been evaluated for SER in linguistic environments other than Engl…

    Submitted 11 July, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to INTERSPEECH 24

  11. arXiv:2401.05968

    cs.CV

    A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting

    Authors: Yashwardhan Chaudhuri, Ankit Kumar, Orchid Chetia Phukan, Arun Balaji Buduru

    Abstract: Crowd counting finds direct applications in real-world situations, making computational efficiency and performance crucial. However, most of the previous methods rely on a heavy backbone and a complex downstream architecture that restricts the deployment. To address this challenge and enhance the versatility of crowd-counting models, we introduce two lightweight models. These models maintain the s…

    Submitted 11 January, 2024; originally announced January 2024.

  12. arXiv:2311.16958

    cs.NE

    From Simulations to Reality: Enhancing Multi-Robot Exploration for Urban Search and Rescue

    Authors: Gautam Siddharth Kashyap, Deepkashi Mahajan, Orchid Chetia Phukan, Ankit Kumar, Alexander E. I. Brownlee, Jiechao Gao

    Abstract: In this study, we present a novel hybrid algorithm, combining Levy Flight (LF) and Particle Swarm Optimization (PSO) (LF-PSO), tailored for efficient multi-robot exploration in unknown environments with limited communication and no global positioning information. The research addresses the growing interest in employing multiple autonomous robots for exploration tasks, particularly in scenarios suc…

    Submitted 28 November, 2023; originally announced November 2023.

  13. arXiv:2310.07613

    cs.AI cs.CY

    Reinforcement Learning-based Knowledge Graph Reasoning for Explainable Fact-checking

    Authors: Gustav Nikopensius, Mohit Mayank, Orchid Chetia Phukan, Rajesh Sharma

    Abstract: Fact-checking is a crucial task as it ensures the prevention of misinformation. However, manual fact-checking cannot keep up with the rate at which false information is generated and disseminated online. Automated fact-checking by machines is significantly quicker than by humans. But for better trust and transparency of these automated systems, explainability in the fact-checking process is necess…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted to ASONAM 2023

  14. arXiv:2306.10338

    cs.CY

    Trauma lurking in the shadows: A Reddit case study of mental health issues in online posts about Childhood Sexual Abuse

    Authors: Orchid Chetia Phukan, Rajesh Sharma, Arun Balaji Buduru

    Abstract: Childhood Sexual Abuse (CSA) is a menace to society and has long-lasting effects on the mental health of the survivors. From time to time CSA survivors are haunted by various mental health issues in their lifetime. Proper care and attention towards CSA survivors facing mental health issues can drastically improve the mental health conditions of CSA survivors. Previous works leveraging online socia…

    Submitted 17 June, 2023; originally announced June 2023.

  15. arXiv:2306.02308

    cs.NE

    Roulette-Wheel Selection-Based PSO Algorithm for Solving the Vehicle Routing Problem with Time Windows

    Authors: Gautam Siddharth Kashyap, Alexander E. I. Brownlee, Orchid Chetia Phukan, Karan Malik, Samar Wazir

    Abstract: The well-known Vehicle Routing Problem with Time Windows (VRPTW) aims to reduce the cost of moving goods between several destinations while accommodating constraints like set time windows for certain locations and vehicle capacity. Applications of the VRPTW problem in the real world include Supply Chain Management (SCM) and logistic dispatching, both of which are crucial to the economy and are exp…

    Submitted 4 June, 2023; originally announced June 2023.

  16. arXiv:2305.18640

    eess.AS

    Transforming the Embeddings: A Lightweight Technique for Speech Emotion Recognition Tasks

    Authors: Orchid Chetia Phukan, Arun Balaji Buduru, Rajesh Sharma

    Abstract: Speech emotion recognition (SER) is a field that has drawn a lot of attention due to its applications in diverse fields. A current trend in methods used for SER is to leverage embeddings from pre-trained models (PTMs) as input features to downstream models. However, the use of embeddings from speaker recognition PTMs hasn't garnered much focus in comparison to other PTM embeddings. To fill this ga…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023

  17. arXiv:2304.11472

    eess.AS cs.AI cs.LG

    A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition

    Authors: Orchid Chetia Phukan, Arun Balaji Buduru, Rajesh Sharma

    Abstract: Pre-trained models (PTMs) have shown great promise in the speech and audio domain. Embeddings leveraged from these models serve as inputs for learning algorithms with applications in various downstream tasks. One such crucial task is Speech Emotion Recognition (SER) which has a wide range of applications, including dynamic analysis of customer calls, mental health assessment, and personalized lang…

    Submitted 22 April, 2023; originally announced April 2023.

  18. arXiv:2304.10512

    cs.LG cs.CL cs.SI

    "Can We Detect Substance Use Disorder?": Knowledge and Time Aware Classification on Social Media from Darkweb

    Authors: Usha Lokala, Orchid Chetia Phukan, Triyasha Ghosh Dastidar, Francois Lamy, Raminta Daniulaityte, Amit Sheth

    Abstract: Opioid and substance misuse is rampant in the United States today, with the phenomenon known as the "opioid crisis". The relationship between substance use and mental health has been extensively studied, with one possible relationship being: substance misuse causes poor mental health. However, the lack of evidence on the relationship has resulted in opioids being largely inaccessible through legal…

    Submitted 20 April, 2023; originally announced April 2023.