-
Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models
Authors:
Ziyun Cui,
Chang Lei,
Wen Wu,
Yinan Duan,
Diyang Qu,
Ji Wu,
Runsen Chen,
Chao Zhang
Abstract:
The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse acoustic and linguistic features embedded in spontaneous speech, both the Whisper speech model and textual large language models (LLMs) are used for suicide risk detection. Both all-parameter finetuning and parameter-efficient finetuning approaches are used to adapt the pre-trained models for suicide risk detection, and multiple audio-text fusion approaches are evaluated to combine the representations of Whisper and the LLM. The proposed system achieves a detection accuracy of 0.807 and an F1-score of 0.846 on the test set with 119 subjects, indicating promising potential for real suicide risk detection applications.
Submitted 6 June, 2024;
originally announced June 2024.
-
Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts
Authors:
Dominik Scheuble,
Chenyang Lei,
Seung-Hwan Baek,
Mario Bijelic,
Felix Heide
Abstract:
Lidar has become a cornerstone sensing modality for 3D vision, especially for large outdoor scenarios and autonomous driving. Conventional lidar sensors are capable of providing centimeter-accurate distance information by emitting laser pulses into a scene and measuring the time-of-flight (ToF) of the reflection. However, the polarization of the received light that depends on the surface orientation and material properties is usually not considered. As such, the polarization modality has the potential to improve scene reconstruction beyond distance measurements. In this work, we introduce a novel long-range polarization wavefront lidar sensor (PolLidar) that modulates the polarization of the emitted and received light. Departing from conventional lidar sensors, PolLidar allows access to the raw time-resolved polarimetric wavefronts. We leverage polarimetric wavefronts to estimate normals, distance, and material properties in outdoor scenarios with a novel learned reconstruction method. To train and evaluate the method, we introduce a simulated and real-world long-range dataset with paired raw lidar data, ground truth distance, and normal maps. We find that the proposed method improves normal and distance reconstruction by 53\% mean angular error and 41\% mean absolute error compared to existing shape-from-polarization (SfP) and ToF methods. Code and data are open-sourced at https://light.princeton.edu/pollidar.
Submitted 11 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Edge Information Hub: Orchestrating Satellites, UAVs, MEC, Sensing and Communications for 6G Closed-Loop Controls
Authors:
Chengleyang Lei,
Wei Feng,
Peng Wei,
Yunfei Chen,
Ning Ge,
Shiwen Mao
Abstract:
An increasing number of field robots would be used for mission-critical tasks in remote or post-disaster areas. Due to usually-limited individual abilities, these robots require an edge information hub (EIH), which is capable of not only communications but also sensing and computing. Such EIH could be deployed on a flexibly-dispatched unmanned aerial vehicle (UAV). Different from traditional aerial base stations or mobile edge computing (MEC), the EIH would direct the operations of robots via sensing-communication-computing-control ($\textbf{SC}^3$) closed-loop orchestration. This paper aims to optimize the closed-loop control performance of multiple $\textbf{SC}^3$ loops, under the constraints of satellite-backhaul rate, computing capability, and on-board energy. Specifically, the linear quadratic regulator (LQR) control cost is used to measure the closed-loop utility, and a sum LQR cost minimization problem is formulated to jointly optimize the splitting of sensor data and allocation of communication and computing resources. We first derive the optimal splitting ratio of sensor data, and then recast the problem to a more tractable form. An iterative algorithm is finally proposed to provide a sub-optimal solution. Simulation results demonstrate the superiority of the proposed algorithm. We also uncover the influence of $\textbf{SC}^3$ parameters on closed-loop controls, highlighting more systematic understanding.
Submitted 11 March, 2024;
originally announced March 2024.
-
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition
Authors:
Chengxi Lei,
Satwinder Singh,
Feng Hou,
Xiaoyun Jia,
Ruili Wang
Abstract:
Most of the current speech data augmentation methods operate on either the raw waveform or the amplitude spectrum of speech. In this paper, we propose a novel speech data augmentation method called PhasePerturbation that operates dynamically on the phase spectrum of speech. Instead of statically rotating a phase by a constant degree, PhasePerturbation utilizes three dynamic phase spectrum operations, i.e., a randomization operation, a frequency masking operation, and a temporal masking operation, to enhance the diversity of speech data. We conduct experiments on wav2vec2.0 pre-trained ASR models by fine-tuning them with the PhasePerturbation augmented TIMIT corpus. The experimental results demonstrate 10.9\% relative reduction in the word error rate (WER) compared with the baseline model fine-tuned without any augmentation operation. Furthermore, the proposed method achieves additional improvements (12.9\% and 15.9\%) in WER by complementing the Vocal Tract Length Perturbation (VTLP) and the SpecAug, which are both amplitude spectrum-based augmentation methods. The results highlight the capability of PhasePerturbation to improve the current amplitude spectrum-based augmentation methods.
Submitted 13 December, 2023;
originally announced December 2023.
-
A Fuzzy Cascaded Proportional-Derivative Controller for Under-actuated Flexible Joint Manipulators Using Bayesian Optimization
Authors:
Changyi Lei,
Quanmin Zhu
Abstract:
This paper proposes a novel fuzzy cascaded Proportional-Derivative (PD) controller for under-actuated single-link flexible joint manipulators. The original flexible joint system is considered as two coupled $2^{nd}$-order sub-systems. The proposed controller is composed of two cascaded PD controllers and two fuzzy logic regulators (FLRs). The first (virtual) PD controller is used to generate desired control input that stabilizes the first $2^{nd}$-order sub-system. Solving the equation by considering the coupling terms as design variables, the reference signal is generated for the second sub-system. Then through simple compensation design, together with the second PD controller, the cascaded PD controller is derived. In order to further improve the performance, two FLRs are implemented that adaptively tune the parameters of PD controllers. Under natural assumptions, the cascaded fuzzy PD controller is proved to possess locally asymptotic stability. All the offline tuning processes are completed data-efficiently by Bayesian Optimization. The results in simulation illustrate the stability and validity of our proposed method. Besides, the idea of cascaded PD controller presented here may be extended as a novel control method for other under-actuated systems, and the stability analysis renders a new perspective towards the stability proof of all other fuzzy-enhanced PID controllers.
Submitted 14 September, 2023;
originally announced September 2023.
-
Control-Oriented Power Allocation for Integrated Satellite-UAV Networks
Authors:
Chengleyang Lei,
Wei Feng,
Jue Wang,
Shi Jin,
Ning Ge
Abstract:
This letter presents a sensing-communication-computing-control (SC3) integrated satellite unmanned aerial vehicle (UAV) network, where the UAV is equipped with on-board sensors, mobile edge computing (MEC) servers, base stations and a satellite communication module. Like the nervous system, this integrated network is capable of organizing multiple field robots in remote areas, so as to perform mission-critical tasks which are dangerous for humans. Aiming at activating this nervous system with multiple SC3 loops, we present a control-oriented optimization problem. Different from traditional studies which mainly focused on communication metrics, we address the power allocation issue to minimize the sum linear quadratic regulator (LQR) control cost of all SC3 loops. Specifically, we show the convexity of the formulated problem and reveal the relationship between optimal transmit power and intrinsic entropy rate of different SC3 loops. For the assure-to-be-stable case, we derive a closed-form solution for ease of practical applications. After demonstrating the superiority of the control-oriented power allocation, we further highlight its difference from the classic capacity-oriented water-filling method.
Submitted 31 August, 2022;
originally announced August 2022.
-
Physical Layer Security for UAV Communications in 5G and Beyond Networks
Authors:
Jue Wang,
Xuanxuan Wang,
Ruifeng Gao,
Chengleyang Lei,
Wei Feng,
Ning Ge,
Shi Jin,
Tony Q. S. Quek
Abstract:
Due to its high mobility and flexible deployment, unmanned aerial vehicle (UAV) is drawing unprecedented interest in both military and civil applications to enable agile wireless communications and provide ubiquitous connectivity. Mainly operating in an open environment, UAV communications can benefit from dominant line-of-sight links; however, it on the other hand renders the UAVs more vulnerable to malicious eavesdropping or jamming attacks. Recently, physical layer security (PLS), which exploits the inherent randomness of the wireless channels for secure communications, has been introduced to UAV systems as an important complement to the conventional cryptography-based approaches. In this paper, a comprehensive survey on the current achievements of the UAV-aided wireless communications is conducted from the PLS perspective. We first introduce the basic concepts of UAV communications including the typical static/mobile deployment scenarios, the unique characteristics of air-to-ground channels, as well as various roles that a UAV may act when PLS is concerned. Then, we introduce the widely used secrecy performance metrics and start by reviewing the secrecy performance analysis and enhancing techniques for statically deployed UAV systems, and extend the discussion to a more general scenario where the UAVs' mobility is further exploited. For both cases, respectively, we summarize the commonly adopted methodologies in the corresponding analysis and design, then describe important works in the literature in detail. Finally, potential research directions and challenges are discussed to provide an outlook for future works in the area of UAV-PLS in 5G and beyond networks.
Submitted 24 May, 2021;
originally announced May 2021.
-
Neural Camera Simulators
Authors:
Hao Ouyang,
Zifan Shi,
Chenyang Lei,
Ka Lung Law,
Qifeng Chen
Abstract:
We present a controllable camera simulator based on deep neural networks to synthesize raw image data under different camera settings, including exposure time, ISO, and aperture. The proposed simulator includes an exposure module that utilizes the principle of modern lens designs for correcting the luminance level. It also contains a noise module using the noise level function and an aperture module with adaptive attention to simulate the side effects on noise and defocus blur. To facilitate the learning of a simulator model, we collect a dataset of 10,000 raw images of 450 scenes with different exposure settings. Quantitative experiments and qualitative comparisons show that our approach outperforms relevant baselines in raw data synthesis on multiple cameras. Furthermore, the camera simulator enables various applications, including large-aperture enhancement, HDR, auto exposure, and data augmentation for training local feature detectors. Our work represents the first attempt to simulate a camera sensor's behavior leveraging both the advantage of traditional raw sensor features and the power of data-driven deep learning.
Submitted 9 August, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results
Authors:
Abdelrahman Abdelhamed,
Mahmoud Afifi,
Radu Timofte,
Michael S. Brown,
Yue Cao,
Zhilu Zhang,
Wangmeng Zuo,
Xiaoling Zhang,
Jiye Liu,
Wendong Chen,
Changyuan Wen,
Meng Liu,
Shuailin Lv,
Yunchao Zhang,
Zhihong Pan,
Baopu Li,
Teng Xi,
Yanwen Fan,
Xiyu Yu,
Gang Zhang,
Jingtuo Liu,
Junyu Han,
Errui Ding,
Songhyun Yu,
Bumjun Park
, et al. (65 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2020 challenge on real image denoising with focus on the newly introduced dataset, the proposed methods and their results. The challenge is a new version of the previous NTIRE 2019 challenge on real image denoising that was based on the SIDD benchmark. This challenge is based on newly collected validation and testing image datasets and is hence named SIDD+. This challenge has two tracks for quantitatively evaluating image denoising performance in (1) the Bayer-pattern rawRGB and (2) the standard RGB (sRGB) color spaces. Each track had ~250 registered participants. A total of 22 teams, proposing 24 methods, competed in the final phase of the challenge. The proposed methods by the participating teams represent the current state-of-the-art performance in image denoising targeting real noisy images. The newly collected SIDD+ datasets are publicly available at: https://bit.ly/siddplus_data.
Submitted 8 May, 2020;
originally announced May 2020.
-
A Preliminary Study on Data Augmentation of Deep Learning for Image Classification
Authors:
Benlin Hu,
Cheng Lei,
Dong Wang,
Shu Zhang,
Zhenyu Chen
Abstract:
Deep learning models have a large number of free parameters that need to be calculated by effective training of the models on a great deal of training data to improve their generalization performance. However, obtaining and labeling data is expensive in practice. Data augmentation is one of the methods to alleviate this problem. In this paper, we conduct a preliminary study on how three variables (augmentation method, augmentation rate and size of basic dataset per label) can affect the accuracy of deep learning for image classification. The study provides some guidelines: (1) it is better to use transformations that alter the geometry of the images rather than those that alter just lighting and color; (2) a 2-3 times augmentation rate is good enough for training; (3) the smaller the amount of data, the more obvious the contribution of augmentation.
Submitted 9 June, 2019;
originally announced June 2019.