Search | arXiv e-print repository

Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

Authors: Eklavya Sarkar, Mathew Magimai. -Doss

Abstract: Self-supervised learning (SSL) models use only the intrinsic structure of a given signal, independent of its acoustic domain, to extract essential information from the input to an embedding space. This implies that the utility of such representations is not limited to modeling human speech alone. Building on this understanding, this paper explores the cross-transferability of SSL neural representa… ▽ More Self-supervised learning (SSL) models use only the intrinsic structure of a given signal, independent of its acoustic domain, to extract essential information from the input to an embedding space. This implies that the utility of such representations is not limited to modeling human speech alone. Building on this understanding, this paper explores the cross-transferability of SSL neural representations learned from human speech to analyze bio-acoustic signals. We conduct a caller discrimination analysis and a caller detection study on Marmoset vocalizations using eleven SSL models pre-trained with various pretext tasks. The results show that the embedding spaces carry meaningful caller information and can successfully distinguish the individual identities of Marmoset callers without fine-tuning. This demonstrates that representations pre-trained on human speech can be effectively applied to the bio-acoustics domain, providing valuable insights for future investigations in this field. △ Less

Submitted 8 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted at Interspeech 2023

arXiv:2206.13420 [pdf, other]

doi 10.21437/Interspeech.2022-10535

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Authors: Eklavya Sarkar, RaviShankar Prasad, Mathew Magimai. -Doss

Abstract: Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving segment boundaries of audio signals which contain voicing information. In recent years, it has been shown that voice source and vocal tract system information can be extracted using zero-frequency filtering (ZFF) without making any explicit model assumptions about th… ▽ More Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving segment boundaries of audio signals which contain voicing information. In recent years, it has been shown that voice source and vocal tract system information can be extracted using zero-frequency filtering (ZFF) without making any explicit model assumptions about the speech signal. This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD. The first approach demarcates voiced regions using a composite signal composed of different zero-frequency filtered signals. The second approach feeds the composite signal as input to the rVAD algorithm. These approaches are compared with other supervised and unsupervised VAD methods in the literature, and are evaluated on the Aurora-2 database, across a range of SNRs (20 to -5 dB). Our studies show that the proposed ZFF-based methods perform comparable to state-of-art VAD methods and are more invariant to added degradation and different channel characteristics. △ Less

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted at Interspeech 2022

arXiv:2101.00008 [pdf, other]

Explainability Matters: Backdoor Attacks on Medical Imaging

Authors: Munachiso Nwadike, Takumi Miyawaki, Esha Sarkar, Michail Maniatakos, Farah Shamout

Abstract: Deep neural networks have been shown to be vulnerable to backdoor attacks, which could be easily introduced to the training set prior to model training. Recent work has focused on investigating backdoor attacks on natural images or toy datasets. Consequently, the exact impact of backdoors is not yet fully understood in complex real-world applications, such as in medical imaging where misdiagnosis… ▽ More Deep neural networks have been shown to be vulnerable to backdoor attacks, which could be easily introduced to the training set prior to model training. Recent work has focused on investigating backdoor attacks on natural images or toy datasets. Consequently, the exact impact of backdoors is not yet fully understood in complex real-world applications, such as in medical imaging where misdiagnosis can be very costly. In this paper, we explore the impact of backdoor attacks on a multi-label disease classification task using chest radiography, with the assumption that the attacker can manipulate the training dataset to execute the attack. Extensive evaluation of a state-of-the-art architecture demonstrates that by introducing images with few-pixel perturbations into the training set, an attacker can execute the backdoor successfully without having to be involved with the training procedure. A simple 3$\times$3 pixel trigger can achieve up to 1.00 Area Under the Receiver Operating Characteristic (AUROC) curve on the set of infected images. In the set of clean images, the backdoored neural network could still achieve up to 0.85 AUROC, highlighting the stealthiness of the attack. As the use of deep learning based diagnostic systems proliferates in clinical practice, we also show how explainability is indispensable in this context, as it can identify spatially localized backdoors in inference time. △ Less

Submitted 30 December, 2020; originally announced January 2021.

arXiv:2003.07859 [pdf, other]

doi 10.1109/TIFS.2021.3114024

Stop-and-Go: Exploring Backdoor Attacks on Deep Reinforcement Learning-based Traffic Congestion Control Systems

Authors: Yue Wang, Esha Sarkar, Wenqing Li, Michail Maniatakos, Saif Eddin Jabari

Abstract: Recent work has shown that the introduction of autonomous vehicles (AVs) in traffic could help reduce traffic jams. Deep reinforcement learning methods demonstrate good performance in complex control problems, including autonomous vehicle control, and have been used in state-of-the-art AV controllers. However, deep neural networks (DNNs) render automated driving vulnerable to machine learning-base… ▽ More Recent work has shown that the introduction of autonomous vehicles (AVs) in traffic could help reduce traffic jams. Deep reinforcement learning methods demonstrate good performance in complex control problems, including autonomous vehicle control, and have been used in state-of-the-art AV controllers. However, deep neural networks (DNNs) render automated driving vulnerable to machine learning-based attacks. In this work, we explore the backdooring/trojanning of DRL-based AV controllers. We develop a trigger design methodology that is based on well-established principles of traffic physics. The malicious actions include vehicle deceleration and acceleration to cause stop-and-go traffic waves to emerge (congestion attacks) or AV acceleration resulting in the AV crashing into the vehicle in front (insurance attack). We test our attack on single-lane and two-lane circuits. Our experimental results show that the backdoored model does not compromise normal operation performance, with the maximum decrease in cumulative rewards being 1%. Still, it can be maliciously activated to cause a crash or congestion when the corresponding triggers appear. △ Less

Submitted 26 August, 2021; v1 submitted 17 March, 2020; originally announced March 2020.

Report number: 2021

Journal ref: IEEE Transactions on Information Forensics and Security, 2021

Showing 1–4 of 4 results for author: Sarkar, E