Search | arXiv e-print repository

The Vocal Signature of Social Anxiety: Exploration using Hypothesis-Testing and Machine-Learning Approaches

Authors: Or Alon-Ronen, Yosi Shrem, Yossi Keshet, Eva Gilboa-Schechtman

Abstract: Background - Social anxiety (SA) is a common and debilitating condition, negatively affecting life quality even at sub-diagnostic thresholds. We sought to characterize SA's acoustic signature using hypothesis-testing and machine learning (ML) approaches. Methods - Participants formed spontaneous utterances responding to instructions to refuse or consent to commands of alleged peers. Vocal properties (e.g., intensity and duration) of these utterances were analyzed. Results - Our prediction that, as compared to low-SA (n=31), high-SA (n=32) individuals exhibit a less confident vocal speech signature, especially with respect to refusal utterances, was only partially supported by the classical hypothesis-testing approach. However, the results of the ML analyses and specifically the decision tree classifier were consistent with such speech patterns in SA. Using a Gaussian Process (GP) classifier, we were able to distinguish between high- and low-SA individuals with high (75.6%) accuracy and good (.83 AUC) separability. We also expected and found that vocal properties differentiated between refusal and consent utterances. Conclusions - Our findings provide further support for the usefulness of ML approach for the study of psychopathology, highlighting the utility of developing automatic techniques to create behavioral markers of SAD. Clinically, the simplicity and accessibility of these procedures may encourage people to seek professional help.

Submitted 18 July, 2022; originally announced July 2022.
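The abstract above reports classifier quality as accuracy and AUC. As a rough illustration of what those two numbers measure, the sketch below computes both from binary labels and classifier scores. The function names, the 0.5 threshold, and the toy labels and scores are all hypothetical; they are not the study's data or its evaluation code.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of items whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = high-SA, 0 = low-SA (not from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(accuracy(toy_labels, toy_scores))
print(roc_auc(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold, while AUC summarizes how well the scores rank the two groups across all thresholds, which is why the two are usually reported together.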

arXiv:2206.11632 [pdf, other]

Formant Estimation and Tracking using Probabilistic Heat-Maps

Authors: Yosi Shrem, Felix Kreuk, Joseph Keshet

Abstract: Formants are the spectral maxima that result from acoustic resonances of the human vocal tract, and their accurate estimation is among the most fundamental speech processing problems. Recent work has shown that those frequencies can accurately be estimated using deep learning techniques. However, when presented with speech from a domain different from the one on which they were trained, these methods exhibit a decline in performance, limiting their usage as generic tools. The contribution of this paper is to propose a new network architecture that performs well on a variety of different speaker and speech domains. Our proposed model is composed of a shared encoder that gets as input a spectrogram and outputs a domain-invariant representation. Then, multiple decoders further process this representation, each responsible for predicting a different formant while considering the lower formant predictions. An advantage of our model is that it is based on heatmaps that generate a probability distribution over formant predictions. Results suggest that our proposed model better represents the signal over various domains and leads to better formant frequency tracking and estimation.

Submitted 23 June, 2022; originally announced June 2022.

Comments: Interspeech 2022
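The abstract above mentions heatmaps that yield a probability distribution over formant predictions. The sketch below shows one generic way such a distribution can be read out for a single frame: normalize per-bin scores with a softmax, then take either the most probable bin or the probability-weighted mean frequency. This is an assumption for illustration only; the abstract does not describe how the paper builds or decodes its heatmaps, and the function names, the uniform bin spacing `bin_hz`, and the toy scores are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw per-bin scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def formant_from_heatmap(logits, bin_hz):
    """Read a frequency estimate from one frame of per-bin scores.

    Assumes bin i corresponds to the frequency i * bin_hz (hypothetical layout).
    Returns (most probable frequency, expected frequency, probabilities).
    """
    probs = softmax(logits)
    best_bin = max(range(len(probs)), key=probs.__getitem__)
    expected_hz = sum(p * i * bin_hz for i, p in enumerate(probs))
    return best_bin * bin_hz, expected_hz, probs


# Hypothetical scores for five frequency bins spaced 100 Hz apart.
mode_hz, mean_hz, probs = formant_from_heatmap([0.0, 1.0, 4.0, 1.0, 0.0], 100.0)
print(mode_hz, mean_hz)
```

Keeping the full distribution rather than a single number lets downstream code inspect how peaked the prediction is, which is one plausible reason to prefer a heatmap output over direct regression.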

arXiv:1910.13255 [pdf, other]

Dr.VOT : Measuring Positive and Negative Voice Onset Time in the Wild

Authors: Yosi Shrem, Matthew Goldrick, Joseph Keshet

Abstract: Voice Onset Time (VOT), a key measurement of speech for basic research and applied medical studies, is the time between the onset of a stop burst and the onset of voicing. When the voicing onset precedes burst onset the VOT is negative; if voicing onset follows the burst, it is positive. In this work, we present a deep-learning model for accurate and reliable measurement of VOT in naturalistic speech. The proposed system addresses two critical issues: it can measure positive and negative VOT equally well, and it is trained to be robust to variation across annotations. Our approach is based on the structured prediction framework, where the feature functions are defined to be RNNs. These learn to capture segmental variation in the signal. Results suggest that our method substantially improves over the current state-of-the-art. In contrast to previous work, our Deep and Robust VOT annotator, Dr.VOT, can successfully estimate negative VOTs while maintaining state-of-the-art performance on positive VOTs. This high level of performance generalizes to new corpora without further retraining. Index Terms: structured prediction, multi-task learning, adversarial training, recurrent neural networks, sequence segmentation.

Submitted 27 October, 2019; originally announced October 2019.

Comments: Interspeech 2019

Journal ref: Interspeech 2019
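The sign convention for VOT stated in the abstract above can be written out as a small worked example: VOT is the voicing onset time minus the burst onset time, so it is negative when voicing starts first. The function names and the onset times below are hypothetical, and this only covers the arithmetic on already-known onsets, not the Dr.VOT model that locates them.

```python
def voice_onset_time_ms(burst_onset_s, voicing_onset_s):
    """VOT in milliseconds: voicing onset minus burst onset (both in seconds)."""
    return (voicing_onset_s - burst_onset_s) * 1000.0


def vot_kind(vot_ms):
    """Label a VOT value by its sign."""
    if vot_ms < 0:
        return "negative"  # voicing begins before the burst
    if vot_ms > 0:
        return "positive"  # voicing begins after the burst
    return "zero"


# Hypothetical onsets: voicing follows the burst by 60 ms.
lag = voice_onset_time_ms(burst_onset_s=0.100, voicing_onset_s=0.160)
# Hypothetical onsets: voicing leads the burst by 80 ms.
lead = voice_onset_time_ms(burst_onset_s=0.100, voicing_onset_s=0.020)

print(lag, vot_kind(lag))
print(lead, vot_kind(lead))
```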

Showing 1–3 of 3 results for author: Shrem, Y