-
Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models
Authors:
Ziyun Cui,
Chang Lei,
Wen Wu,
Yinan Duan,
Diyang Qu,
Ji Wu,
Runsen Chen,
Chao Zhang
Abstract:
The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse acoustic and linguistic features embedded in spontaneous speech, both the Whisper speech model and textual large language models (LLMs) are used for suicide risk detection. Both all-parameter finetuning and parameter-efficient finetuning approaches are used to adapt the pre-trained models for suicide risk detection, and multiple audio-text fusion approaches are evaluated to combine the representations of Whisper and the LLM. The proposed system achieves a detection accuracy of 0.807 and an F1-score of 0.846 on the test set with 119 subjects, indicating promising potential for real suicide risk detection applications.
Submitted 6 June, 2024;
originally announced June 2024.
-
Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation
Authors:
Hao Zhang,
Nianwen Si,
Yaqi Chen,
Wenlin Zhang,
Xukui Yang,
Dan Qu,
Zhen Li
Abstract:
Existing techniques often transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model through elaborate methods that often require transcriptions as extra input during training. However, transcriptions are not always available, and how to improve ST model performance without transcriptions, i.e., data efficiency, has rarely been studied in the literature. In this paper, we propose Decoupled Non-parametric Knowledge Distillation (DNKD), which approaches the problem from a data perspective to improve data efficiency. Our method follows the knowledge distillation paradigm. However, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor (kNN) retrieval, which removes the dependence on transcriptions and on an MT model. We then decouple the classic knowledge distillation loss into target and non-target distillation to enhance the effect of the knowledge among non-target logits, which is the prominent "dark knowledge". Experiments on the MuST-C corpus show that the proposed method can achieve consistent improvement over a strong baseline without requiring any transcription.
Submitted 20 April, 2023;
originally announced April 2023.
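Illustrative sketch (not from the paper): the abstract above describes building a teacher distribution from kNN retrieval and splitting the distillation loss into target and non-target parts. The snippet below shows one way these two ideas could look in plain Python. The exponential distance kernel, the temperature, the weights `alpha` and `beta`, and all function names are assumptions chosen for illustration; the authors' actual formulation may differ.

```python
import math

def knn_teacher(neighbors, vocab_size, temperature=10.0):
    """Build a teacher distribution from retrieved (token_id, distance) pairs."""
    weights = [0.0] * vocab_size
    for token_id, distance in neighbors:
        # Assumed kernel: closer neighbors receive exponentially more weight.
        weights[token_id] += math.exp(-distance / temperature)
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q), skipping entries where p is zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def decoupled_kd(teacher, student, target, alpha=1.0, beta=1.0, eps=1e-12):
    """Return (weighted loss, target term, non-target term)."""
    pt, ps = teacher[target], student[target]
    # Target term: binary KL between "target" and "everything else".
    target_term = kl([pt, 1.0 - pt], [ps, 1.0 - ps])
    if 1.0 - pt < eps:
        # The teacher puts all mass on the target; nothing to distill elsewhere.
        non_target_term = 0.0
    else:
        # Non-target term: KL between distributions renormalized over non-target tokens.
        t_rest = [p / (1.0 - pt) for i, p in enumerate(teacher) if i != target]
        s_rest = [p / max(1.0 - ps, eps) for i, p in enumerate(student) if i != target]
        non_target_term = kl(t_rest, s_rest)
    return alpha * target_term + beta * non_target_term, target_term, non_target_term

# Hypothetical retrieval result: two neighbors vote for token 2, one for token 0.
teacher = knn_teacher([(2, 1.0), (2, 2.0), (0, 3.0)], vocab_size=4)
student = softmax([0.1, 0.2, 1.5, -0.3])
loss, target_term, non_target_term = decoupled_kd(teacher, student, target=2, beta=2.0)
```

In the classic (coupled) loss, the non-target term is implicitly scaled by one minus the teacher's target probability, so it shrinks when the teacher is confident. Giving it its own weight `beta` is what lets the non-target knowledge be emphasized independently.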
-
Machine Learning Enabled Preamble Collision Resolution in Distributed Massive MIMO
Authors:
Jie Ding,
Daiming Qu,
Pei Liu,
Jinho Choi
Abstract:
Preamble collision is a bottleneck that impairs the performance of random access (RA) user equipment (UE) in grant-free RA (GFRA). In this paper, by leveraging distributed massive multiple input multiple output (mMIMO) together with machine learning, a novel machine learning based framework solution is proposed to address the preamble collision problem in GFRA. The key idea is to identify and employ the neighboring access points (APs) of a collided RA UE for its data decoding rather than all the APs, so that the mutual interference among collided RA UEs can be effectively mitigated. To this end, we first design a tailored deep neural network (DNN) to enable the preamble multiplicity estimation in GFRA, where an energy detection (ED) method is also proposed for performance comparison. With the estimated preamble multiplicity, we then propose a K-means AP clustering algorithm to cluster the neighboring APs of collided RA UEs and organize each AP cluster to decode the received data individually. Simulation results show that a decent performance of preamble multiplicity estimation in terms of accuracy and reliability can be achieved by the proposed DNN, and confirm that the proposed schemes are effective in preamble collision resolution in GFRA, which are able to achieve a near-optimal performance in terms of uplink achievable rate per collided RA UE, and offer significant performance improvement over traditional schemes.
Submitted 27 December, 2020; v1 submitted 7 June, 2020;
originally announced June 2020.
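Illustrative sketch (not from the paper): the abstract above mentions clustering the neighboring APs of collided UEs with K-means once the preamble multiplicity has been estimated. The snippet below is a generic Lloyd-style K-means in plain Python, with the number of clusters set to the estimated multiplicity. The use of 2-D AP coordinates as features, the farthest-point initialization, and the names `kmeans` and `ap_positions` are assumptions for illustration only; the authors' clustering features and procedure may differ.

```python
import math

def kmeans(points, k, max_iters=50):
    """Cluster 2-D points into k groups; returns (labels, centroids)."""
    # Deterministic farthest-point initialization, starting from the first point.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical positions of APs that detected the same collided preamble.
ap_positions = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
estimated_multiplicity = 2  # assumed output of the multiplicity estimator
labels, centroids = kmeans(ap_positions, estimated_multiplicity)
```

Each resulting cluster would then decode the data of one collided UE separately, which is the interference-mitigation idea the abstract describes.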
-
Optimal Preamble Length for Spectral Efficiency in Grant-Free RA with Massive MIMO
Authors:
Jie Ding,
Daiming Qu,
Hao Jiang
Abstract:
Grant-free random access (RA) with massive MIMO is a promising RA technique for massive access with low signaling overhead. In the grant-free RA with massive MIMO, preamble length has a critical impact on the performance of the system. In this paper, the optimal preamble length is investigated to maximize spectral efficiency (SE) of the grant-free RA with massive MIMO, where effects of the preamble length on the preamble collision and preamble overhead as well as channel estimation accuracy are taken into account. Simulation results agree well with our analyses and confirm the existence of optimal preamble length for SE maximization in the grant-free RA with massive MIMO. Moreover, properties of the optimal preamble length with respect to system parameters are revealed. Compared to the granted access, it is shown that longer preamble length is required for SE maximization in the grant-free RA.
Submitted 29 April, 2019;
originally announced May 2019.
-
Smoothed SVD-based Beamforming for FBMC/OQAM Systems Based on Frequency Spreading
Authors:
Yu Qiu,
Daiming Qu,
Da Chen,
Tao Jiang
Abstract:
The combination of singular value decomposition (SVD)-based beamforming and filter bank multicarrier with offset quadrature amplitude modulation (FBMC/OQAM) has not been successful to date. The difficulty is that the beamformers may change significantly between adjacent subchannels and therefore destroy the orthogonality among FBMC/OQAM real-valued symbols, even under channels with moderate frequency selectivity. In this paper, we address this problem from two aspects: i) an SVD-FS-FBMC architecture is adopted to support beamforming with finer granularity in the frequency domain, based on the frequency spreading FBMC (FS-FBMC) structure, i.e., beamforming on FS-FBMC tones rather than on subchannels; ii) a criterion and methods are proposed to smooth the beamformers from tone to tone. The proposed finer beamforming and smoothing greatly improve the smoothness of the beamformers and therefore effectively suppress the leaked ICI/ISI. Simulations are conducted under an IEEE 802.11n wireless LAN scenario. Results show that the proposed SVD-FS-FBMC system achieves BER performance close to that of its orthogonal frequency division multiplexing (OFDM) counterpart under frequency selective channels.
Submitted 18 June, 2018;
originally announced June 2018.