Showing 1–6 of 6 results for author: Mundnich, K

Searching in archive eess.
  1. arXiv:2405.08317

    cs.CL cs.SD eess.AS

    SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

    Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 9+6 pages, Submitted to ACL 2024

  2. arXiv:2405.08295

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single column, 13 pages

  3. arXiv:2102.05811

    cs.CV eess.IV

    Audiovisual Highlight Detection in Videos

    Authors: Karel Mundnich, Alexandra Fenster, Aparna Khare, Shiva Sundaram

    Abstract: In this paper, we test the hypothesis that interesting events in unstructured videos are inherently audiovisual. We combine deep image representations for object recognition and scene understanding with representations from an audiovisual affect recognition model. To this set, we include content agnostic audio-visual synchrony representations and mel-frequency cepstral coefficients to capture othe…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, conference paper

  4. arXiv:2003.08474

    eess.SP cs.CY cs.HC stat.AP

    TILES-2018, a longitudinal physiologic and behavioral data set of hospital workers

    Authors: Karel Mundnich, Brandon M. Booth, Michelle L'Hommedieu, Tiantian Feng, Benjamin Girault, Justin L'Hommedieu, Mackenzie Wildman, Sophia Skaaden, Amrutha Nadarajan, Jennifer L. Villatte, Tiago H. Falk, Kristina Lerman, Emilio Ferrara, Shrikanth Narayanan

    Abstract: We present a novel longitudinal multimodal corpus of physiological and behavioral data collected from direct clinical providers in a hospital workplace. We designed the study to investigate the use of off-the-shelf wearable and environmental sensors to understand individual-specific constructs such as job performance, interpersonal interaction, and well-being of hospital workers over time in their…

    Submitted 18 December, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: 57 pages, 9 figures, journal paper

    Journal ref: Sci Data 7, 354 (2020)

  5. arXiv:2003.05897

    eess.AS cs.SD

    Bringing in the outliers: A sparse subspace clustering approach to learn a dictionary of mouse ultrasonic vocalizations

    Authors: Jiaxi Wang, Karel Mundnich, Allison T. Knoll, Pat Levitt, Shrikanth Narayanan

    Abstract: Mice vocalize in the ultrasonic range during social interactions. These vocalizations are used in neuroscience and clinical studies to tap into complex behaviors and states. The analysis of these ultrasonic vocalizations (USVs) has been traditionally a manual process, which is prone to errors and human bias, and is not scalable to large scale analysis. We propose a new method to automatically crea…

    Submitted 12 March, 2020; originally announced March 2020.

    Comments: 5 pages, 4 figures, conference paper, accepted in ICASSP 2020

  6. arXiv:1911.03843

    eess.AS cs.LG cs.SD

    Characterizing dynamically varying acoustic scenes from egocentric audio recordings in workplace setting

    Authors: Arindam Jati, Amrutha Nadarajan, Karel Mundnich, Shrikanth Narayanan

    Abstract: Devices capable of detecting and categorizing acoustic scenes have numerous applications such as providing context-aware user experiences. In this paper, we address the task of characterizing acoustic scenes in a workplace setting from audio recordings collected with wearable microphones. The acoustic scenes, tracked with Bluetooth transceivers, vary dynamically with time from the egocentric persp…

    Submitted 9 November, 2019; originally announced November 2019.

    Comments: Submitted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020