Search | arXiv e-print repository

Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

Authors: Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

Abstract: The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse acoustic and linguistic features embedded in spontaneous speech, both the Whisper speech model and textual large language models (LLMs) are used for suicide risk detection. Both all-parameter finetuning and parameter-efficient finetuning approaches are used to adapt the pre-trained models for suicide risk detection, and multiple audio-text fusion approaches are evaluated to combine the representations of Whisper and the LLM. The proposed system achieves a detection accuracy of 0.807 and an F1-score of 0.846 on the test set with 119 subjects, indicating promising potential for real suicide risk detection applications.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2304.10295 [pdf, other]

Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation

Authors: Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li

Abstract: Existing techniques often attempt to make knowledge transfer from a powerful machine translation (MT) to speech translation (ST) model with some elaborate techniques, which often requires transcription as extra input during training. However, transcriptions are not always available, and how to improve the ST model performance without transcription, i.e., data efficiency, has rarely been studied in the literature. In this paper, we propose Decoupled Non-parametric Knowledge Distillation (DNKD) from data perspective to improve the data efficiency. Our method follows the knowledge distillation paradigm. However, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor (kNN) retrieval, which removes the dependence on transcription and MT model. Then we decouple the classic knowledge distillation loss into target and non-target distillation to enhance the effect of the knowledge among non-target logits, which is the prominent "dark knowledge". Experiments on MuST-C corpus show that, the proposed method can achieve consistent improvement over the strong baseline without requiring any transcription.

Submitted 20 April, 2023; originally announced April 2023.

Comments: Accepted by ICASSP 2023

arXiv:2006.04314 [pdf, ps, other]

Machine Learning Enabled Preamble Collision Resolution in Distributed Massive MIMO

Authors: Jie Ding, Daiming Qu, Pei Liu, Jinho Choi

Abstract: Preamble collision is a bottleneck that impairs the performance of random access (RA) user equipment (UE) in grant-free RA (GFRA). In this paper, by leveraging distributed massive multiple input multiple output (mMIMO) together with machine learning, a novel machine learning based framework solution is proposed to address the preamble collision problem in GFRA. The key idea is to identify and employ the neighboring access points (APs) of a collided RA UE for its data decoding rather than all the APs, so that the mutual interference among collided RA UEs can be effectively mitigated. To this end, we first design a tailored deep neural network (DNN) to enable the preamble multiplicity estimation in GFRA, where an energy detection (ED) method is also proposed for performance comparison. With the estimated preamble multiplicity, we then propose a K-means AP clustering algorithm to cluster the neighboring APs of collided RA UEs and organize each AP cluster to decode the received data individually. Simulation results show that a decent performance of preamble multiplicity estimation in terms of accuracy and reliability can be achieved by the proposed DNN, and confirm that the proposed schemes are effective in preamble collision resolution in GFRA, which are able to achieve a near-optimal performance in terms of uplink achievable rate per collided RA UE, and offer significant performance improvement over traditional schemes.

Submitted 27 December, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: 31 pages, 7 figures

arXiv:1905.00005 [pdf, ps, other]

Optimal Preamble Length for Spectral Efficiency in Grant-Free RA with Massive MIMO

Authors: Jie Ding, Daiming Qu, Hao Jiang

Abstract: Grant-free random access (RA) with massive MIMO is a promising RA technique for massive access with low signaling overhead. In the grant-free RA with massive MIMO, preamble length has a critical impact on the performance of the system. In this paper, the optimal preamble length is investigated to maximize spectral efficiency (SE) of the grant-free RA with massive MIMO, where effects of the preamble length on the preamble collision and preamble overhead as well as channel estimation accuracy are taken into account. Simulation results agree well with our analyses and confirm the existence of optimal preamble length for SE maximization in the grant-free RA with massive MIMO. Moreover, properties of the optimal preamble length with respect to system parameters are revealed. Compared to the granted access, it is shown that longer preamble length is required for SE maximization in the grant-free RA.

Submitted 29 April, 2019; originally announced May 2019.

Comments: Accepted By IEEE ICEIC 2019. arXiv admin note: text overlap with arXiv:1805.08345

arXiv:1806.06994 [pdf, ps, other]

Smoothed SVD-based Beamforming for FBMC/OQAM Systems Based on Frequency Spreading

Authors: Yu Qiu, Daiming Qu, Da Chen, Tao Jiang

Abstract: The combination of singular value decomposition (SVD)-based beamforming and filter bank multicarrier with offset quadrature amplitude modulation (FBMC/OQAM) has not been successful to date. The difficulty of this combination is that, the beamformers may experience significant changes between adjacent subchannels, therefore destroy the orthogonality among FBMC/OQAM real-valued symbols, even under channels with moderate frequency selectivity. In this paper, we address this problem from two aspects: i) an SVD-FS-FBMC architecture is adopted to support beamforming with finer granularity in frequency domain, based on the frequency spreading FBMC (FS-FBMC) structure, i.e., beamforming on FS-FBMC tones rather than on subchannels; ii) criterion and methods are proposed to smooth the beamformers from tone to tone. The proposed finer beamforming and smoothing greatly improve the smoothness of beamformers, therefore effectively suppress the leaked ICI/ISI. Simulations are conducted under the scenario of IEEE 802.11n wireless LAN. Results show that the proposed SVD-FS-FBMC system shares close BER performance with its orthogonal frequency division multiplexing (OFDM) counterpart under the frequency selective channels.

Submitted 18 June, 2018; originally announced June 2018.

Showing 1–5 of 5 results for author: Qu, D