Search | arXiv e-print repository

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, which, however, has received little attention. We propose an effective audio captioning method based on the contrastive language-audio pre-training (CLAP) model to address these issues. Our proposed method requires only textual data for training, enabling the model to generate text from the textual feature in the cross-modal semantic space. In the inference stage, the model generates the descriptive text for the given audio from the audio feature by leveraging the audio-text alignment from CLAP. We devise two strategies to mitigate the discrepancy between text and audio embeddings: a mixed-augmentation-based soft prompt and a retrieval-based acoustic-aware hard prompt. These approaches are designed to enhance the generalization performance of our proposed model, helping the model generate captions more robustly and accurately. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

arXiv:2406.02233 [pdf, other]

Towards Out-of-Distribution Detection in Vocoder Recognition via Latent Feature Reconstruction

Authors: Renmingyue Du, Jixun Yao, Qiuqiang Kong, Yin Cao

Abstract: Advancements in synthesized speech have created a growing threat of impersonation, making it crucial to develop deepfake algorithm recognition. One significant aspect is out-of-distribution (OOD) detection, which has gained notable attention due to its important role in deepfake algorithm recognition. However, most of the current approaches for detecting OOD in deepfake algorithm recognition rely on probability-score or classified-distance, which may limit accuracy for samples near the decision threshold. In this study, we propose a reconstruction-based detection approach that employs an autoencoder architecture to compress and reconstruct the acoustic feature extracted from a pre-trained WavLM model. Each acoustic feature belonging to a specific vocoder class is only aptly reconstructed by its corresponding decoder. When none of the decoders can satisfactorily reconstruct a feature, it is classified as an OOD sample. To enhance the distinctiveness of the reconstructed features by each decoder, we incorporate contrastive learning and an auxiliary classifier to further constrain the reconstructed feature. Experiments demonstrate that our proposed approach surpasses baseline systems by a relative margin of 10% on the evaluation dataset. Ablation studies further validate the effectiveness of both the contrastive constraint and the auxiliary classifier within our proposed approach.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures
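
The decision rule described in the abstract above (assign a feature to the class whose decoder reconstructs it best, and flag it as OOD when no decoder reconstructs it well) can be illustrated with a minimal sketch. This is not the authors' implementation: their autoencoder, WavLM features, contrastive loss, and auxiliary classifier are all replaced here by a hypothetical one-direction linear codec per class (the first principal direction, found by power iteration), and the names `fit_linear_codec`, `reconstruction_error`, `classify`, and the `threshold` parameter are assumptions made only for illustration.

```python
import math

def fit_linear_codec(features, iterations=50):
    """Fit a stand-in 'autoencoder' for one class: the class mean plus the
    first principal direction, estimated by power iteration (assumption:
    a linear codec replaces the paper's learned encoder/decoder)."""
    dim = len(features[0])
    mean = [sum(row[i] for row in features) / len(features) for i in range(dim)]
    centered = [[row[i] - mean[i] for i in range(dim)] for row in features]
    direction = [1.0] * dim
    for _ in range(iterations):
        # Apply the unnormalized covariance X^T (X v) to the current direction.
        scores = [sum(r[i] * direction[i] for i in range(dim)) for r in centered]
        direction = [sum(s * r[i] for s, r in zip(scores, centered)) for i in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm == 0.0:
            break  # degenerate class with no variance: reconstruct with the mean only
        direction = [d / norm for d in direction]
    return mean, direction

def reconstruction_error(x, codec):
    """Squared error after encoding x to one coefficient and decoding it back."""
    mean, direction = codec
    centered = [xi - mi for xi, mi in zip(x, mean)]
    score = sum(c * d for c, d in zip(centered, direction))
    residual = [c - score * d for c, d in zip(centered, direction)]
    return sum(r * r for r in residual)

def classify(x, codecs, threshold):
    """Return the index of the best-reconstructing class, or -1 for OOD."""
    errors = [reconstruction_error(x, codec) for codec in codecs]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best if errors[best] <= threshold else -1
```

In this sketch the threshold is a free parameter; how it would be chosen in practice (for example, from held-out in-distribution errors) is not specified by the abstract.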

arXiv:2405.17100 [pdf, other]

Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

Abstract: The integration of Voice Control Systems (VCS) into smart devices and their growing presence in daily life accentuate the importance of their security. Current research has uncovered numerous vulnerabilities in VCS, presenting significant risks to user privacy and security. However, a cohesive and systematic examination of these vulnerabilities and the corresponding solutions is still absent. This lack of comprehensive analysis presents a challenge for VCS designers in fully understanding and mitigating the security issues within these systems. Addressing this gap, our study introduces a hierarchical model structure for VCS, providing a novel lens for categorizing and analyzing existing literature in a systematic manner. We classify attacks based on their technical principles and thoroughly evaluate various attributes, such as their methods, targets, vectors, and behaviors. Furthermore, we consolidate and assess the defense mechanisms proposed in current research, offering actionable recommendations for enhancing VCS security. Our work makes a significant contribution by simplifying the complexity inherent in VCS security, aiding designers in effectively identifying and countering potential threats, and setting a foundation for future advancements in VCS security research.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.15854 [pdf, other]

CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

Abstract: The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).

Submitted 24 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE TDSC

arXiv:2402.09636 [pdf, other]

Spatiotemporal Disentanglement of Arteriovenous Malformations in Digital Subtraction Angiography

Authors: Kathleen Baur, Xin Xiong, Erickson Torio, Rose Du, Parikshit Juvekar, Reuben Dorent, Alexandra Golby, Sarah Frisken, Nazim Haouchine

Abstract: Although Digital Subtraction Angiography (DSA) is the most important imaging for visualizing cerebrovascular anatomy, its interpretation by clinicians remains difficult. This is particularly true when treating arteriovenous malformations (AVMs), where entangled vasculature connecting arteries and veins needs to be carefully identified. The presented method aims to enhance DSA image series by highlighting critical information via automatic classification of vessels using a combination of two learning models: an unsupervised machine learning method based on Independent Component Analysis that decomposes the phases of flow and a convolutional neural network that automatically delineates the vessels in image space. The proposed method was tested on clinical DSA image series and demonstrated efficient differentiation between arteries and veins that provides a viable solution to enhance visualizations for clinical use.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Paper accepted for publication at SPIE Medical Imaging 2024

arXiv:2402.05887 [pdf, other]

Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers

Authors: Onur G. Guleryuz, Philip A. Chou, Berivan Isik, Hugues Hoppe, Danhang Tang, Ruofei Du, Jonathan Taylor, Philip Davidson, Sean Fanello

Abstract: We propose sandwiching standard image and video codecs between pre- and post-processing neural networks. The networks are jointly trained through a differentiable codec proxy to minimize a given rate-distortion loss. This sandwich architecture not only improves the standard codec's performance on its intended content, it can effectively adapt the codec to other types of image/video content and to other distortion measures. Essentially, the sandwich learns to transmit "neural code images" that optimize overall rate-distortion performance even when the overall problem is well outside the scope of the codec's design. Through a variety of examples, we apply the sandwich architecture to sources with different numbers of channels, higher resolution, higher dynamic range, and perceptual distortion measures. The results demonstrate substantial improvements (up to 9 dB gains or up to 30% bitrate reductions) compared to alternative adaptations. We derive VQ equivalents for the sandwich, establish optimality properties, and design differentiable codec proxies approximating current standard codecs. We further analyze model complexity, visual quality under perceptual metrics, as well as sandwich configurations that offer interesting potentials in image/video compression and streaming.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2310.17661 [pdf, other]

An Overview on IEEE 802.11bf: WLAN Sensing

Authors: Rui Du, Haocheng Hua, Hailiang Xie, Xianxin Song, Zhonghao Lyu, Mengshi Hu, Narengerile, Yan Xin, Stephen McCann, Michael Montemurro, Tony Xiao Han, Jie Xu

Abstract: With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, WLAN standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent requirements for emerging sensing applications. To resolve this issue, a new Task Group (TG), namely IEEE 802.11bf, has been established by the IEEE 802.11 working group, with the objective of creating a new amendment to the WLAN standard to meet advanced sensing requirements while minimizing the effect on communications. This paper provides a comprehensive overview of the up-to-date efforts in the IEEE 802.11bf TG. First, we introduce the definition of the 802.11bf amendment and its formation and standardization timeline. Next, we discuss the WLAN sensing use cases with the corresponding key performance indicator (KPI) requirements. After reviewing previous WLAN sensing research based on communication-oriented WLAN standards, we identify their limitations and underscore the practical need for the new sensing-oriented amendment in 802.11bf. Furthermore, we discuss the WLAN sensing framework and procedure used for measurement acquisition, considering both sensing at sub-7 GHz and directional multi-gigabit (DMG) sensing at 60 GHz, and address their shared features, similarities, and differences. In addition, we present various candidate technical features for IEEE 802.11bf, including waveform/sequence design, feedback types, as well as quantization and compression techniques. We also describe the methodologies and the channel modeling used by the IEEE 802.11bf TG for evaluation. Finally, we discuss in detail the challenges and future research directions to motivate more research in this field.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 31 pages, 25 figures, this is a significant updated version of arXiv:2207.04859

arXiv:2305.06777 [pdf]

Generating high-quality 3DMPCs by adaptive data acquisition and NeREF-based radiometric calibration with UGV plant phenotyping system

Authors: Pengyao Xie, Zhihong Ma, Ruiming Du, Xin Yang, Haiyan Cen

Abstract: Fusion of 3D and MS imaging data has great potential for high-throughput plant phenotyping of structural, biochemical, and physiological traits simultaneously, which is important for decision support in agriculture and for crop breeders in selecting the best genotypes. However, the lack of 3D data integrity for various plant canopy structures and the low quality of MS images caused by complex illumination effects pose a great challenge, especially at the proximal imaging scale. Therefore, this study proposed a novel approach for adaptive data acquisition and radiometric calibration to generate high-quality 3DMPCs of plants. An efficient NBV planning method based on a UGV plant phenotyping system with a multi-sensor-equipped robotic arm was proposed to achieve adaptive data acquisition. The NeREF was employed to predict the DN values of the hemispherical reference for radiometric calibration. For NBV planning, the average total time for a single plant at a joint speed of 1.55 rad/s was about 62.8 s, an average reduction of 18.0% compared to the unplanned case. The integrity of the whole-plant data was improved by an average of 23.6% compared to the fixed viewpoints alone. Compared with the ASD measurements, the RMSE of the reflectance spectra obtained from 3DMPCs at different regions of interest was 0.08, an average decrease of 58.93% compared to the results obtained from single-frame MS images without 3D radiometric calibration. The 3D-calibrated plant 3DMPCs improved the predictive accuracy of PLSR for chlorophyll content, with an average increase of 0.07 in R² and an average decrease of 21.25% in RMSE. Our approach introduced a fresh perspective on generating high-quality 3DMPCs of plants under natural light conditions, enabling more precise analysis of plant morphological and physiological parameters.

Submitted 1 December, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2210.11684 [pdf, other]

Change Point Detection Approach for Online Control of Unknown Time Varying Dynamical Systems

Authors: Deepan Muthirayan, Ruijie Du, Yanning Shen, Pramod P. Khargonekar

Abstract: We propose a novel change point detection approach for online learning control with full information feedback (state, disturbance, and cost feedback) for unknown time-varying dynamical systems. We show that our algorithm can achieve a sub-linear regret with respect to the class of Disturbance Action Control (DAC) policies, which are a widely studied class of policies for online control of dynamical systems, for any sub-linear number of changes and a very general class of systems: (i) matched disturbance system with general convex cost functions, (ii) general system with linear cost functions. Specifically, a (dynamic) regret of $Γ_T^{1/5}T^{4/5}$ can be achieved for these classes of systems, where $Γ_T$ is the number of changes of the underlying system and $T$ is the duration of the control episode. That is, the change point detection approach achieves a sub-linear regret for any sub-linear number of changes, which other previous algorithms such as in \cite{minasyan2021online} cannot. Numerically, we demonstrate that the change point detection approach is superior to a standard restart approach \cite{minasyan2021online} and to standard online learning approaches for time-invariant dynamical systems. Our work presents the first regret guarantee for unknown time-varying dynamical systems in terms of a stronger notion of variability like the number of changes in the underlying system. The extension of our work to state and output feedback controllers is a subject of future work.

Submitted 24 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.
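
The sub-linearity claim in the abstract above can be checked directly from the stated bound. The power-law growth $Γ_T = T^α$ in the first line is an illustrative assumption, not something the abstract specifies; the second line uses only the abstract's condition that the number of changes is sub-linear in $T$.

```latex
\[
\Gamma_T = T^{\alpha},\ 0 \le \alpha < 1
\quad\Longrightarrow\quad
\Gamma_T^{1/5}\, T^{4/5} = T^{\alpha/5}\, T^{4/5} = T^{(4+\alpha)/5},
\qquad \frac{4+\alpha}{5} < 1 .
\]
\[
\frac{\Gamma_T^{1/5}\, T^{4/5}}{T}
= \left(\frac{\Gamma_T}{T}\right)^{1/5} \longrightarrow 0
\quad \text{whenever } \Gamma_T = o(T).
\]
```

For example, under the illustrative assumption $α = 1/2$ the bound scales as $T^{9/10}$, which is still sub-linear.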

arXiv:2207.10306 [pdf, ps, other]

Fundamental Limits and Optimization of Multiband Sensing

Authors: Yubo Wan, An Liu, Rui Du, Tony Xiao Han

Abstract: Multiband sensing is a promising technology that utilizes multiple non-contiguous frequency bands to achieve high-resolution target sensing. In this paper, we investigate the fundamental limits and optimization of multiband sensing, focusing on the fundamental limits associated with time delay. We first derive a Fisher information matrix (FIM) with a compact form using the Dirichlet kernel and then derive a closed-form expression of the Cramer-Rao bound (CRB) for the delay separation in a simplified case to reveal useful insights. Then, a metric called the statistical resolution limit (SRL) that provides a resolution limit is employed to investigate the fundamental limits of delay resolution. The fundamental limits of delay estimation are also investigated based on the CRB and Ziv-Zakai bound (ZZB). Based on the above derived fundamental limits, numerical results are presented to analyze the effect of frequency band apertures and phase distortions on the performance limits of the multiband sensing systems. We formulate an optimization problem to find the optimal system configuration in multiband sensing systems with the objective of minimizing the delay SRL. To solve this non-convex constrained problem, we propose an efficient alternating optimization (AO) algorithm which iteratively optimizes the variables using successive convex approximation (SCA) and one-dimensional search. Simulation results demonstrate the effectiveness of the proposed algorithm.

Submitted 31 January, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

arXiv:2207.04859 [pdf, ps, other]

An Overview on IEEE 802.11bf: WLAN Sensing

Authors: Rui Du, Hailiang Xie, Mengshi Hu, Narengerile, Yan Xin, Stephen McCann, Michael Montemurro, Tony Xiao Han, Jie Xu

Abstract: With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, WLAN standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent sensing requirements in emerging applications. To resolve this issue, a new Task Group (TG), namely IEEE 802.11bf, has been established by the IEEE 802.11 working group, with the objective of creating a new amendment to the WLAN standard to meet advanced sensing requirements while minimizing the effect on communications. This paper provides a comprehensive overview of the up-to-date efforts in the IEEE 802.11bf TG. First, we introduce the definition of the 802.11bf amendment and its standardization timeline. Then, we discuss the WLAN sensing procedure and framework used for measurement acquisition, considering both conventional sensing at sub-7 GHz and directional multi-gigabit (DMG) sensing at 60 GHz. Next, we present various candidate technical features for IEEE 802.11bf, including waveform/sequence design, feedback types, quantization, as well as security and privacy. Finally, we describe the methodologies used by the IEEE 802.11bf TG to evaluate the alternative performance. We hope that this overview provides useful insights on IEEE 802.11 WLAN sensing to interested readers and promotes wide deployment of the IEEE 802.11bf standard.

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2206.00493 [pdf, ps, other]

Networked Sensing in 6G Cellular Networks: Opportunities and Challenges

Authors: Liang Liu, Shuowen Zhang, Rui Du, Tong Xiao Han, Shuguang Cui

Abstract: Radar and wireless communication are widely acknowledged as the two most successful applications of radio technology over the past decades. Recently, there has been a trend in both academia and industry to achieve integrated sensing and communication (ISAC) in one system by utilizing a common radio spectrum and the same hardware platform. This article discusses the possibility of exploiting the future sixth-generation (6G) cellular network to realize ISAC. Our vision is that the cellular base stations (BSs) deployed all over the world can be transformed into a powerful sensor to provide high-resolution localization services. Specifically, motivated by the joint encoding/decoding gain in multi-cell coordinated communication, we advocate the adoption of the networked sensing technique in the 6G network to achieve the above goal, where the BSs can share the sensing information with each other for jointly estimating the locations and velocities of the targets. Several opportunities and challenges to realize networked sensing in the 6G era are revealed in this article. Moreover, the future research directions for this promising trend are outlined as well.

Submitted 1 June, 2022; originally announced June 2022.

arXiv:2204.08409 [pdf, other]

Caption Feature Space Regularization for Audio Captioning

Authors: Yiming Zhang, Hong Yu, Ruoyi Du, Zhanyu Ma, Yuan Dong

Abstract: Audio captioning aims at describing the content of audio clips with human language. Due to the ambiguity of audio, different people may perceive the same audio differently, resulting in caption disparities (i.e., one audio may correlate to several captions with diverse semantics). For that, general audio captioning models achieve the one-to-many training by randomly selecting a correlated caption as the ground truth for each audio. However, it leads to a significant variation in the optimization directions and weakens the model stability. To eliminate this negative effect, in this paper, we propose a two-stage framework for audio captioning: (i) in the first stage, via the contrastive learning, we construct a proxy feature space to reduce the distances between captions correlated to the same audio, and (ii) in the second stage, the proxy feature space is utilized as additional supervision to encourage the model to be optimized in the direction that benefits all the correlated captions. We conducted extensive experiments on two datasets using four commonly used encoder and decoder architectures. Experimental results demonstrate the effectiveness of the proposed method. The code is available at https://github.com/PRIS-CV/Caption-Feature-Space-Regularization.

Submitted 18 April, 2022; originally announced April 2022.

arXiv:2202.11134 [pdf]

doi 10.1145/3491102.3502020

ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users

Authors: Dhruv Jain, Khoa Huynh Anh Nguyen, Steven Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, Jon E. Froehlich

Abstract: Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fine-grained categories. ProtoSound is motivated by prior work examining sound awareness needs of DHH people and by a survey we conducted with 472 DHH participants. To evaluate ProtoSound, we characterized performance on two real-world sound datasets, showing significant improvement over state-of-the-art (e.g., +9.7% accuracy on the first dataset). We then deployed ProtoSound's end-user training and real-time recognition through a mobile application and recruited 19 hearing participants who listened to the real-world sounds and rated the accuracy across 56 locations (e.g., homes, restaurants, parks). Results show that ProtoSound personalized the model on-device in real-time and accurately learned sounds across diverse acoustic contexts. We close by discussing open challenges in personalizable sound recognition, including the need for better recording interfaces and algorithmic improvements.

Submitted 22 February, 2022; originally announced February 2022.

Comments: Published at the ACM CHI Conference on Human Factors in Computing Systems (CHI) 2022

arXiv:2104.09954 [pdf, other]

A Survey on Fundamental Limits of Integrated Sensing and Communication

Authors: An Liu, Zhe Huang, Min Li, Yubo Wan, Wenrui Li, Tony Xiao Han, Chenchen Liu, Rui Du, Danny Tan Kai Pin, Jianmin Lu, Yuan Shen, Fabiola Colone, Kevin Chetty

Abstract: Integrated sensing and communication (ISAC), in which sensing and communication share the same frequency band and hardware, has emerged as a key technology in future wireless systems. Early works on ISAC have focused on the design, analysis and optimization of practical ISAC technologies for various ISAC systems. While this line of work is necessary, it is equally important to study the fundamental limits of ISAC in order to understand the gap between the current state-of-the-art technologies and the performance limits, and provide useful insights and guidance for the development of better ISAC technologies that can approach the performance limits. In this paper, we aim to provide a comprehensive survey of the current research progress on the fundamental limits of ISAC. Particularly, we first propose a systematic classification method for both traditional radio sensing (such as radar sensing and wireless localization) and ISAC so that they can be naturally incorporated into a unified framework. Then we summarize the major performance metrics and bounds used in sensing, communications and ISAC, respectively. After that, we present the current research progress on fundamental limits of each class of the traditional sensing and ISAC systems. Finally, the open problems and future research directions are discussed.

Submitted 22 April, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

Comments: 32 pages, submitted to IEEE Communications Surveys and Tutorials

arXiv:2010.05440 [pdf]

Using Empirical Trajectory Data to Design Connected Autonomous Vehicle Controllers for Traffic Stabilization

Authors: Yujie Li, Sikai Chen, Runjia Du, Paul Young Joun Ha, Jiqian Dong, Samuel Labi

Abstract: Emerging transportation technologies offer unprecedented opportunities to improve the efficiency of the transportation system from the perspectives of energy consumption, congestion, and emissions. One of these technologies is connected and autonomous vehicles (CAVs). With the prospective duality of operations of CAVs and human-driven vehicles (HDVs) in the same roadway space (also referred to as a mixed stream), CAVs are expected to address a variety of traffic problems, particularly those that are either caused or exacerbated by the heterogeneous nature of human driving. In efforts to realize such specific benefits of CAVs in mixed-stream traffic, it is essential to understand and simulate the behavior of human drivers in such environments, and microscopic traffic flow (MTF) models can be used to carry out this task. By helping to comprehend the fundamental dynamics of traffic flow, MTF models serve as a powerful approach to assess the impacts of such flow in terms of safety, stability, and efficiency. In this paper, we seek to calibrate MTF models based on empirical trajectory data as a basis for not only understanding traffic dynamics such as traffic instabilities, but ultimately using CAVs to mitigate stop-and-go wave propagation. The paper therefore duly considers the heterogeneity and uncertainty associated with human driving behavior in order to calibrate the dynamics of each HDV. Also, the paper designs the CAV controllers based on the microscopic HDV models that are calibrated in real time. The data for the calibration is from the Next Generation SIMulation (NGSIM) trajectory datasets. The results are encouraging, as they indicate the efficacy of the designed controller to significantly improve not only the stability of the mixed traffic stream but also the safety of both CAVs and HDVs in the traffic stream.

Submitted 11 October, 2020; originally announced October 2020.

Comments: TRB 2021 Annual Meeting

arXiv:2010.05439 [pdf]

A Cooperative Control Framework for CAV Lane Change in a Mixed Traffic Environment

Authors: Runjia Du, Sikai Chen, Yujie Li, Jiqian Dong, Paul Young Joun Ha, Samuel Labi

Abstract: In preparing for connected and autonomous vehicles (CAVs), a worrisome aspect is the transition era, which will be characterized by mixed traffic (where CAVs and human-driven vehicles (HDVs) share the roadway). Consistent with expectations that CAVs will improve road safety, on-road CAVs may adopt rather conservative control policies, and this will likely cause HDVs to unduly exploit CAV conservativeness by driving in ways that imperil safety. One context of this situation is lane-changing by the CAV. Without cooperation from other vehicles in the traffic stream, it can be extremely unsafe for the CAV to change lanes under dense, high-speed traffic conditions. The cooperation of neighboring vehicles is indispensable. To address this issue, this paper develops a control framework where connected HDVs (CHDVs) and the CAV can cooperate to facilitate safe and efficient lane changing by the CAV. Throughout the lane-change process, the safety of not only the CAV but also of all neighboring vehicles is ensured through a collision avoidance mechanism in the control framework. The overall traffic flow efficiency is analyzed in terms of the ambient level of CHDV-CAV cooperation. The analysis outcomes include the CAV's lane-change feasibility and the overall duration of the lane change. Lane change is a major source of traffic disturbance on multi-lane highways that impairs their traffic flow efficiency. In providing a control framework for lane change in mixed traffic, this study shows how CHDV-CAV cooperation could help enhance system efficiency.

Submitted 11 October, 2020; originally announced October 2020.

Comments: TRB 2021 Annual Meeting

arXiv:2010.05436 [pdf]

Leveraging the Capabilities of Connected and Autonomous Vehicles and Multi-Agent Reinforcement Learning to Mitigate Highway Bottleneck Congestion

Authors: Paul Young Joun Ha, Sikai Chen, Jiqian Dong, Runjia Du, Yujie Li, Samuel Labi

Abstract: Active Traffic Management strategies are often adopted in real-time to address such sudden flow breakdowns. When queuing is imminent, Speed Harmonization (SH), which adjusts speeds in upstream traffic to mitigate traffic shockwaves downstream, can be applied. However, because SH depends on driver awareness and compliance, it may not always be effective in mitigating congestion. The use of multiagent reinforcement learning for collaborative learning is a promising solution to this challenge. By incorporating this technique in the control algorithms of connected and autonomous vehicles (CAVs), it may be possible to train the CAVs to make joint decisions that can mitigate highway bottleneck congestion without human driver compliance to altered speed limits. In this regard, we present an RL-based multi-agent CAV control model to operate in mixed traffic (both CAVs and human-driven vehicles (HDVs)). The results suggest that even at a CAV percent share of corridor traffic as low as 10%, CAVs can significantly mitigate bottlenecks in highway traffic. Another objective was to assess the efficacy of the RL-based controller vis-à-vis that of the rule-based controller. In addressing this objective, we duly recognize that one of the main challenges of RL-based CAV controllers is the variety and complexity of inputs that exist in the real world, such as the information provided to the CAV by other connected entities and sensed information. These translate into dynamic-length inputs, which are difficult to process and learn from. For this reason, we propose the use of Graphical Convolution Networks (GCN), a specific RL technique, to preserve information network topology and corresponding dynamic-length inputs. We then use this, combined with Deep Deterministic Policy Gradient (DDPG), to carry out multi-agent training for congestion mitigation using the CAV controllers.

Submitted 11 October, 2020; originally announced October 2020.

Comments: TRB 2021 Annual Meeting

arXiv:2009.14665 [pdf]

Facilitating Connected Autonomous Vehicle Operations Using Space-weighted Information Fusion and Deep Reinforcement Learning Based Control

Authors: Jiqian Dong, Sikai Chen, Yujie Li, Runjia Du, Aaron Steinfeld, Samuel Labi

Abstract: The connectivity aspect of connected autonomous vehicles (CAVs) is beneficial because it facilitates dissemination of traffic-related information to vehicles through Vehicle-to-External (V2X) communication. Onboard sensing equipment, including LiDAR and cameras, can reasonably characterize the traffic environment in the immediate locality of the CAV. However, its performance is limited by the sensor range (SR). On the other hand, longer-range information is helpful for characterizing imminent conditions downstream. By contemporaneously coalescing the short- and long-range information, the CAV can comprehensively construct its surrounding environment and thereby facilitate informed, safe, and effective movement planning in the short term (local decisions including lane change) and long term (route choice). In this paper, we describe a Deep Reinforcement Learning based approach that integrates the data collected through sensing and connectivity capabilities from other vehicles located in the proximity of the CAV and from those located further downstream, and we use the fused data to guide lane changing, a specific context of CAV operations. In addition, recognizing the importance of the connectivity range (CR) to the performance of not only the algorithm but also of the vehicle in the actual driving environment, the paper carries out a case study. The case study demonstrates the application of the proposed algorithm and duly identifies the appropriate CR for each level of prevailing traffic density. It is expected that implementation of the algorithm in CAVs can enhance the safety and mobility associated with CAV driving operations. From a general perspective, its implementation can provide guidance to connectivity equipment manufacturers and CAV operators regarding the default CR settings for CAVs or the recommended CR setting in a given traffic environment.

Submitted 30 September, 2020; originally announced September 2020.

arXiv:2001.08847 [pdf, ps, other]

Wirelessly-powered Sensor Networks Power Allocation for Channel Estimation and Energy Beamforming

Authors: Rong Du, Hossein Shokri Ghadikolaei, Carlo Fischione

Abstract: Wirelessly-powered sensor networks (WPSNs) are becoming increasingly important in different monitoring applications. We consider a WPSN where a multiple-antenna base station, which is dedicated for energy transmission, sends pilot signals to estimate the channel state information and consequently shapes the energy beams toward the sensor nodes. Given a fixed energy budget at the base station, in this paper, we investigate the novel problem of optimally allocating the power for the channel estimation and for the energy transmission. We formulate this non-convex optimization problem for general channel estimation and beamforming schemes that satisfy some qualification conditions. We provide a new solution approach and a performance analysis in terms of optimality and complexity. We also present a closed-form solution for the case where the channels are estimated based on a least square channel estimation and a maximum ratio transmit beamforming scheme. The analysis and simulations indicate a significant gain in terms of the network sensing rate, compared to the fixed power allocation, and the importance of improving the channel estimation efficiency.

Submitted 23 January, 2020; originally announced January 2020.

Comments: The paper has been accepted in IEEE Transactions on Wireless Communications on Jan. 19th, 2020. 7 figures, 35 pages

arXiv:1907.11861 [pdf, ps, other]

Deep convolution neural network model for automatic risk assessment of patients with non-metastatic nasopharyngeal carcinoma

Authors: Richard Du, Peng Cao, Lujun Han, Qiyong Ai, Ann D. King, Varut Vardhanabhuti

Abstract: Nasopharyngeal Carcinoma (NPC) is a cancer endemic to South-East Asia. With the advent of intensity-modulated radiotherapy, excellent locoregional control is being achieved. Consequently, this has led to pretreatment clinical staging classification being less prognostic of outcomes such as recurrence after treatment. Alternative pretreatment strategies for prognosis of NPC after treatment are needed to provide better risk stratification for NPC. In this study we propose a deep convolution neural network model based on contrast-enhanced T1 (T1C) and T2-weighted (T2) MRI scans to predict 3-year disease progression of NPC patients after primary treatment. We retrospectively obtained 596 non-metastatic NPC patients from four independent centres in Hong Kong and China. Our model first performs a segmentation of the primary NPC tumour to localise the tumour, and then uses the segmentation mask as prior knowledge along with the T1C and T2 scans to classify 3-year disease progression. For segmentation, we adapted and modified a VNet to encode both T1C and T2 scans and also to classify T stage and overall stage. Our modified network performed better than the baseline VNet with T1C and the network with no T and overall stage classification. The classification result for 3-year disease progression achieved an AUC of 0.828 in the validation set but did not generalise well to the test set, which consists of 146 patients from a different centre to the training data (AUC = 0.69). Our preliminary results show that deep learning may offer prognostication of disease progression of NPC patients after treatment. One advantage of our model is that it does not require manual segmentation of the region of interest, hence reducing the clinician's burden. Further development in generalising to multicentre data sets is needed before clinical application of deep learning models in assessment of NPC.

Submitted 27 July, 2019; originally announced July 2019.

Comments: Medical Imaging with Deep Learning 2019 - Extended Abstract. MIDL 2019 [arXiv:1907.08612]

Report number: MIDL/2019/ExtendedAbstract/S1xEkdTpYN

arXiv:1704.08050 [pdf, other]

doi 10.1109/TCNS.2017.2696363

On Maximizing Sensor Network Lifetime by Energy Balancing

Authors: Rong Du, Lazaros Gkatzikis, Carlo Fischione, Ming Xiao

Abstract: Many physical systems, such as water/electricity distribution networks, are monitored by battery-powered Wireless Sensor Networks (WSNs). Since battery replacement of sensor nodes is generally difficult, long-term monitoring can only be achieved if the operation of the WSN nodes contributes to a long WSN lifetime. Two prominent techniques for achieving a long WSN lifetime are i) optimal sensor activation and ii) efficient data gathering and forwarding based on compressive sensing. These techniques are feasible only if the activated sensor nodes establish a connected communication network (connectivity constraint), and satisfy a compressive sensing decoding constraint (cardinality constraint). These two constraints make the problem of maximizing network lifetime via sensor node activation and compressive sensing NP-hard. To overcome this difficulty, an alternative approach that iteratively solves energy balancing problems is proposed. However, understanding whether maximizing network lifetime and energy balancing problems are aligned objectives is a fundamental open issue. The analysis reveals that the two optimization problems give different solutions, but the difference between the lifetime achieved by the energy balancing approach and the maximum lifetime is small when the initial energy at sensor nodes is significantly larger than the energy consumed for a single transmission. The lifetime achieved by energy balancing is asymptotically optimal, and the achievable network lifetime is at least 50% of the optimum. Analysis and numerical simulations quantify the efficiency of the proposed energy balancing approach.

Submitted 26 April, 2017; originally announced April 2017.

Comments: 14 pages, 4 figures, extended version of the one accepted by IEEE Transactions on Control of Network Systems

Showing 1–22 of 22 results for author: Du, R