Search | arXiv e-print repository

Sampling Strategies in Bayesian Inversion: A Study of RTO and Langevin Methods

Authors: Remi Laumont, Yiqiu Dong, Martin Skovgaard Andersen

Abstract: This paper studies two classes of sampling methods for the solution of inverse problems, namely Randomize-Then-Optimize (RTO), which is rooted in sensitivity analysis, and Langevin methods, which are rooted in the Bayesian framework. The two classes of methods correspond to different assumptions and yield samples from different target distributions. We highlight the main conceptual and theoretical… ▽ More This paper studies two classes of sampling methods for the solution of inverse problems, namely Randomize-Then-Optimize (RTO), which is rooted in sensitivity analysis, and Langevin methods, which are rooted in the Bayesian framework. The two classes of methods correspond to different assumptions and yield samples from different target distributions. We highlight the main conceptual and theoretical differences between the two approaches and compare them from a practical point of view by tackling two classical inverse problems in imaging: deblurring and inpainting. We show that the choice of the sampling method has a significant impact on the quality of the reconstruction and that the RTO method is more robust to the choice of the parameters. △ Less

Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

MSC Class: 65K10; 65K05; 65D18; 62F15; 62C10; 68Q25; 68U10; 90C25; 65C05

arXiv:2406.15160 [pdf, other]

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich c… ▽ More This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich collection of audio data with multiple data augmentation techniques, to an audio-visual student model trained with only a limited set of multi-modal data. Next, we propose a two-stage audio-visual fusion strategy, consisting of an early feature fusion and a late video-guided decision fusion to exploit synergies between audio and video modalities. Finally, we introduce an innovative video pixel swap** (VPS) technique to extend an audio channel swap** (ACS) method to an audio-visual joint augmentation. Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge data set demonstrate significant improvements in SELD performances. Furthermore, our submission to the SELD task of the DCASE 2023 Challenge ranks first place by effectively integrating the proposed techniques into a model ensemble. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: accepted by icme2024

arXiv:2406.06295 [pdf, other]

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these model… ▽ More In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, which, however, has received little attention. We propose an effective audio captioning method based on the contrastive language-audio pre-training (CLAP) model to address these issues. Our proposed method requires only textual data for training, enabling the model to generate text from the textual feature in the cross-modal semantic space.In the inference stage, the model generates the descriptive text for the given audio from the audio feature by leveraging the audio-text alignment from CLAP.We devise two strategies to mitigate the discrepancy between text and audio embeddings: a mixed-augmentation-based soft prompt and a retrieval-based acoustic-aware hard prompt. These approaches are designed to enhance the generalization performance of our proposed model, facilitating the model to generate captions more robustly and accurately. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

arXiv:2405.19373 [pdf, other]

Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

Authors: Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang

Abstract: Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attem… ▽ More Emotion recognition based on Electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of individuals leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and model framework unity. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a Pre-trained model based Multimodal Mood Reader for cross-subject emotion recognition that utilizes masked brain signal modeling and interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on large scale dataset, and employs Interlinked spatial-temporal attention mechanism to process Differential Entropy(DE) features extracted from EEG data. Subsequently, a multi-level fusion layer is proposed to integrate the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from attention perspective, providing qualitative analysis of emotion-related brain areas, offering valuable insights for affective research in neural signal processing. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted by International Conference on Neural Computing for Advanced Applications, 2024

arXiv:2405.13996 [pdf, other]

Detecting Gait Abnormalities in Foot-Floor Contacts During Walking Through FootstepInduced Structural Vibrations

Authors: Yiwen Dong, Yuyan Wu, Hae Young Noh

Abstract: Gait abnormality detection is critical for the early discovery and progressive tracking of musculoskeletal and neurological disorders, such as Parkinson's and Cerebral Palsy. Especially, analyzing the foot-floor contacts during walking provides important insights into gait patterns, such as contact area, contact force, and contact time, enabling gait abnormality detection through these measurement… ▽ More Gait abnormality detection is critical for the early discovery and progressive tracking of musculoskeletal and neurological disorders, such as Parkinson's and Cerebral Palsy. Especially, analyzing the foot-floor contacts during walking provides important insights into gait patterns, such as contact area, contact force, and contact time, enabling gait abnormality detection through these measurements. Existing studies use various sensing devices to capture such information, including cameras, wearables, and force plates. However, the former two lack force-related information, making it difficult to identify the causes of gait health issues, while the latter has limited coverage of the walking path. In this study, we leverage footstep-induced structural vibrations to infer foot-floor contact profiles and detect gait abnormalities. The main challenge lies in modeling the complex force transfer mechanism between the foot and the floor surfaces, leading to difficulty in reconstructing the force and contact profile during foot-floor interaction using structural vibrations. To overcome the challenge, we first characterize the floor vibration for each contact type (e.g., heel, midfoot, and toe contact) to understand how contact forces and areas affect the induced floor vibration. Then, we leverage the time-frequency response spectrum resulting from those contacts to develop features that are representative of each contact type. Finally, gait abnormalities are detected by comparing the predicted foot-floor contact force and motion with the healthy gait. To evaluate our approach, we conducted a real-world walking experiment with 8 subjects. Our approach achieves 91.6% and 96.7% accuracy in predicting contact type and time, respectively, leading to 91.9% accuracy in detecting various types of gait abnormalities, including asymmetry, dragging, and midfoot/toe contacts. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: The 14th International Workshop on Structural Health Monitoring (IWSHM)

arXiv:2405.01119 [pdf]

Towards Understanding Worldwide Cross-cultural Differences in Implicit Driving Cues: Review, Comparative Analysis, and Research Roadmap

Authors: Yongqi Dong, Chang Liu, Yiyun Wang, Zhe Fu

Abstract: Recognizing and understanding implicit driving cues across diverse cultures is imperative for fostering safe and efficient global transportation systems, particularly when training new immigrants holding driving licenses from culturally disparate countries. Additionally, it is essential to consider cross-cultural differences in the development of Automated Driving features tailored to different co… ▽ More Recognizing and understanding implicit driving cues across diverse cultures is imperative for fostering safe and efficient global transportation systems, particularly when training new immigrants holding driving licenses from culturally disparate countries. Additionally, it is essential to consider cross-cultural differences in the development of Automated Driving features tailored to different countries. Previous piloting studies have compared and analyzed cross-cultural differences in selected implicit driving cues, but they typically examine only limited countries. However, a comprehensive worldwide comparison and analysis are lacking. This study conducts a thorough review of existing literature, online blogs, and expert insights from diverse countries to investigate cross-cultural disparities in driving behaviors, specifically focusing on implicit cues such as non-verbal communication (e.g., hand gestures, signal lighting, honking), norms, and social expectations. Through comparative analysis, variations in driving cues are illuminated across different cultural contexts. Based on the findings and identified gaps, a research roadmap is proposed for future research to further explore and address these differences, aiming to enhance intercultural communication, improve road safety, and increase transportation efficiency on a global scale. This paper presents the pioneering work towards a comprehensive understanding of the implicit driving cues across cultures. Moreover, this understanding will inform the development of automated driving systems tailored to different countries considering cross-cultural differences. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: 7 pages, 1 figure, under review by the 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)

arXiv:2403.19943 [pdf, other]

TDANet: A Novel Temporal Denoise Convolutional Neural Network With Attention for Fault Diagnosis

Authors: Zhongzhi Li, Rong Fan, **gqi Tu, **yi Ma, Jianliang Ai, Yiqun Dong

Abstract: Fault diagnosis plays a crucial role in maintaining the operational integrity of mechanical systems, preventing significant losses due to unexpected failures. As intelligent manufacturing and data-driven approaches evolve, Deep Learning (DL) has emerged as a pivotal technique in fault diagnosis research, recognized for its ability to autonomously extract complex features. However, the practical ap… ▽ More Fault diagnosis plays a crucial role in maintaining the operational integrity of mechanical systems, preventing significant losses due to unexpected failures. As intelligent manufacturing and data-driven approaches evolve, Deep Learning (DL) has emerged as a pivotal technique in fault diagnosis research, recognized for its ability to autonomously extract complex features. However, the practical application of current fault diagnosis methods is challenged by the complexity of industrial environments. This paper proposed the Temporal Denoise Convolutional Neural Network With Attention (TDANet), designed to improve fault diagnosis performance in noise environments. This model transforms one-dimensional signals into two-dimensional tensors based on their periodic properties, employing multi-scale 2D convolution kernels to extract signal information both within and across periods. This method enables effective identification of signal characteristics that vary over multiple time scales. The TDANet incorporates a Temporal Variable Denoise (TVD) module with residual connections and a Multi-head Attention Fusion (MAF) module, enhancing the saliency of information within noisy data and maintaining effective fault diagnosis performance. Evaluation on two datasets, CWRU (single sensor) and Real aircraft sensor fault (multiple sensors), demonstrates that the TDANet model significantly outperforms existing deep learning approaches in terms of diagnostic accuracy under noisy environments. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.07988 [pdf, other]

Configuration and EMT Simulation of the 240-bus MiniWECC System Integrating Offshore Wind Farms (OWFs)

Authors: Buxin She, Hisham Mahmood, Marcelo Elizondo, Veronica Adetola, Yuqing Dong

Abstract: As offshore wind farms (OWFs) become increasingly prevalent in Northern California and Southern Oregon, they introduce faster dynamics into the Western Electricity Coordinating Council (WECC) system, resha** its dynamic behavior. Accordingly, electromagnetic transient (EMT) simulation is essential to assess high frequency dynamics of the WECC system with integrated OWFs. Against this background,… ▽ More As offshore wind farms (OWFs) become increasingly prevalent in Northern California and Southern Oregon, they introduce faster dynamics into the Western Electricity Coordinating Council (WECC) system, resha** its dynamic behavior. Accordingly, electromagnetic transient (EMT) simulation is essential to assess high frequency dynamics of the WECC system with integrated OWFs. Against this background, this paper presents the integration of detailed dynamic models of OWFs into a 240-bus miniWECC system in PSCAD software. The sequential initialization technique is employed to facilitate the smooth initiation of a large-scale system in an EMT simulation. The performance of the configured model is assessed under wind speed variations and grounded faults, demonstrating the effectiveness of the miniWECC system with OWFs. This system serves as a valuable basic use case for validating the fast dynamic performance of future WECC systems with high penetration of wind energy. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 5 pages

arXiv:2403.07317 [pdf, other]

GMPC: Geometric Model Predictive Control for Wheeled Mobile Robot Trajectory Tracking

Authors: Jiawei Tang, Shuang Wu, Bo Lan, Yahui Dong, Yuqiang **, Guangjian Tian, Wen-An Zhang, Ling Shi

Abstract: The configuration of most robotic systems lies in continuous transformation groups. However, in mobile robot trajectory tracking, many recent works still naively utilize optimization methods for elements in vector space without considering the manifold constraint of the robot configuration. In this letter, we propose a geometric model predictive control (MPC) framework for wheeled mobile robot tra… ▽ More The configuration of most robotic systems lies in continuous transformation groups. However, in mobile robot trajectory tracking, many recent works still naively utilize optimization methods for elements in vector space without considering the manifold constraint of the robot configuration. In this letter, we propose a geometric model predictive control (MPC) framework for wheeled mobile robot trajectory tracking. We first derive the error dynamics of the wheeled mobile robot trajectory tracking by considering its manifold constraint and kinematic constraint simultaneously. After that, we utilize the relationship between the Lie group and Lie algebra to convexify the tracking control problem, which enables us to solve the problem efficiently. Thanks to the Lie group formulation, our method tracks the trajectory more smoothly than existing nonlinear MPC. Simulations and physical experiments verify the effectiveness of our proposed methods. Our pure Python-based simulation platform is publicly available to benefit further research in the community. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.01546 [pdf, other]

doi 10.1109/JIOT.2024.3362587

Privacy-Preserving Distributed Learning for Residential Short-Term Load Forecasting

Authors: Yi Dong, Yingjie Wang, Mariana Gama, Mustafa A. Mustafa, Geert Deconinck, Xiaowei Huang

Abstract: In the realm of power systems, the increasing involvement of residential users in load forecasting applications has heightened concerns about data privacy. Specifically, the load data can inadvertently reveal the daily routines of residential users, thereby posing a risk to their property security. While federated learning (FL) has been employed to safeguard user privacy by enabling model training… ▽ More In the realm of power systems, the increasing involvement of residential users in load forecasting applications has heightened concerns about data privacy. Specifically, the load data can inadvertently reveal the daily routines of residential users, thereby posing a risk to their property security. While federated learning (FL) has been employed to safeguard user privacy by enabling model training without the exchange of raw data, these FL models have shown vulnerabilities to emerging attack techniques, such as Deep Leakage from Gradients and poisoning attacks. To counteract these, we initially employ a Secure-Aggregation (SecAgg) algorithm that leverages multiparty computation cryptographic techniques to mitigate the risk of gradient leakage. However, the introduction of SecAgg necessitates the deployment of additional sub-center servers for executing the multiparty computation protocol, thereby escalating computational complexity and reducing system robustness, especially in scenarios where one or more sub-centers are unavailable. To address these challenges, we introduce a Markovian Switching-based distributed training framework, the convergence of which is substantiated through rigorous theoretical analysis. The Distributed Markovian Switching (DMS) topology shows strong robustness towards the poisoning attacks as well. Case studies employing real-world power system load data validate the efficacy of our proposed algorithm. It not only significantly minimizes communication complexity but also maintains accuracy levels comparable to traditional FL methods, thereby enhancing the scalability of our load forecasting algorithm. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.17593 [pdf, other]

Head and Neck Tumor Segmentation from [18F]F-FDG PET/CT Images Based on 3D Diffusion Model

Authors: Yafei Dong, Kuang Gong

Abstract: Head and neck (H&N) cancers are among the most prevalent types of cancer worldwide, and [18F]F-FDG PET/CT is widely used for H&N cancer management. Recently, the diffusion model has demonstrated remarkable performance in various image-generation tasks. In this work, we proposed a 3D diffusion model to accurately perform H&N tumor segmentation from 3D PET and CT volumes. The 3D diffusion model was… ▽ More Head and neck (H&N) cancers are among the most prevalent types of cancer worldwide, and [18F]F-FDG PET/CT is widely used for H&N cancer management. Recently, the diffusion model has demonstrated remarkable performance in various image-generation tasks. In this work, we proposed a 3D diffusion model to accurately perform H&N tumor segmentation from 3D PET and CT volumes. The 3D diffusion model was developed considering the 3D nature of PET and CT images acquired. During the reverse process, the model utilized a 3D UNet structure and took the concatenation of PET, CT, and Gaussian noise volumes as the network input to generate the tumor mask. Experiments based on the HECKTOR challenge dataset were conducted to evaluate the effectiveness of the proposed diffusion model. Several state-of-the-art techniques based on U-Net and Transformer structures were adopted as the reference methods. Benefits of employing both PET and CT as the network input as well as further extending the diffusion model from 2D to 3D were investigated based on various quantitative metrics and the uncertainty maps generated. Results showed that the proposed 3D diffusion model could generate more accurate segmentation results compared with other methods. Compared to the diffusion model in 2D format, the proposed 3D model yielded superior results. Our experiments also highlighted the advantage of utilizing dual-modality PET and CT data over only single-modality data for H&N tumor segmentation. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: 28 pages, 5 figures

arXiv:2401.15321 [pdf]

doi 10.1016/j.apenergy.2024.122736

Localization of Dummy Data Injection Attacks in Power Systems Considering Incomplete Topological Information: A Spatio-Temporal Graph Wavelet Convolutional Neural Network Approach

Authors: Zhaoyang Qu, Yunchang Dong, Yang Li, Siqi Song, Tao Jiang, Min Li, Qiming Wang, Lei Wang, Xiaoyong Bo, Jiye Zang, Qi Xu

Abstract: The emergence of novel the dummy data injection attack (DDIA) poses a severe threat to the secure and stable operation of power systems. These attacks are particularly perilous due to the minimal Euclidean spatial separation between the injected malicious data and legitimate data, rendering their precise detection challenging using conventional distance-based methods. Furthermore, existing researc… ▽ More The emergence of novel the dummy data injection attack (DDIA) poses a severe threat to the secure and stable operation of power systems. These attacks are particularly perilous due to the minimal Euclidean spatial separation between the injected malicious data and legitimate data, rendering their precise detection challenging using conventional distance-based methods. Furthermore, existing research predominantly focuses on various machine learning techniques, often analyzing the temporal data sequences post-attack or relying solely on Euclidean spatial characteristics. Unfortunately, this approach tends to overlook the inherent topological correlations within the non-Euclidean spatial attributes of power grid data, consequently leading to diminished accuracy in attack localization. To address this issue, this study takes a comprehensive approach. Initially, it examines the underlying principles of these new DDIAs on power systems. Here, an intricate mathematical model of the DDIA is designed, accounting for incomplete topological knowledge and alternating current (AC) state estimation from an attacker's perspective. Subsequently, by integrating a priori knowledge of grid topology and considering the temporal correlations within measurement data and the topology-dependent attributes of the power grid, this study introduces temporal and spatial attention matrices. These matrices adaptively capture the spatio-temporal correlations within the attacks. Leveraging gated stacked causal convolution and graph wavelet sparse convolution, the study jointly extracts spatio-temporal DDIA features. Finally, the research proposes a DDIA localization method based on spatio-temporal graph neural networks. The accuracy and effectiveness of the DDIA model are rigorously demonstrated through comprehensive analytical cases. △ Less

Submitted 27 January, 2024; originally announced January 2024.

Comments: Accepted by Applied Energy

Journal ref: Applied Energy 360 (2024) 122736

arXiv:2401.05083 [pdf, other]

Discrete-Time Stress Matrix-Based Formation Control of General Linear Multi-Agent Systems

Authors: Okechi Onuoha, Suleiman Kurawa, Zezhi Tang, Yi Dong

Abstract: This paper considers the distributed leader-follower stress-matrix-based affine formation control problem of discrete-time linear multi-agent systems with static and dynamic leaders. In leader-follower multi-agent formation control, the aim is to drive a set of agents comprising leaders and followers to form any desired geometric pattern and simultaneously execute any required manoeuvre by control… ▽ More This paper considers the distributed leader-follower stress-matrix-based affine formation control problem of discrete-time linear multi-agent systems with static and dynamic leaders. In leader-follower multi-agent formation control, the aim is to drive a set of agents comprising leaders and followers to form any desired geometric pattern and simultaneously execute any required manoeuvre by controlling only a few agents denoted as leaders. Existing works in literature are mostly limited to the cases where the agents' inter-agent communications are either in the continuous-time settings or the sampled-data cases where the leaders are constrained to constant (or zero) velocities or accelerations. Here, we relax these constraints and study the discrete-time cases where the leaders can have stationary or time-varying velocities. We propose control laws in the study of different situations and provide some sufficient conditions to guarantee the overall system stability. Simulation study is used to demonstrate the efficacy of our proposed control laws. △ Less

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.01496 [pdf, other]

From Pixel to Slide image: Polarization Modality-based Pathological Diagnosis Using Representation Learning

Authors: Jia Dong, Yao Yao, Yang Dong, Hui Ma

Abstract: Thyroid cancer is the most common endocrine malignancy, and accurately distinguishing between benign and malignant thyroid tumors is crucial for develo** effective treatment plans in clinical practice. Pathologically, thyroid tumors pose diagnostic challenges due to improper specimen sampling. In this study, we have designed a three-stage model using representation learning to integrate pixel-le… ▽ More Thyroid cancer is the most common endocrine malignancy, and accurately distinguishing between benign and malignant thyroid tumors is crucial for develo** effective treatment plans in clinical practice. Pathologically, thyroid tumors pose diagnostic challenges due to improper specimen sampling. In this study, we have designed a three-stage model using representation learning to integrate pixel-level and slice-level annotations for distinguishing thyroid tumors. This structure includes a pathology structure recognition method to predict structures related to thyroid tumors, an encoder-decoder network to extract pixel-level annotation information by learning the feature representations of image blocks, and an attention-based learning mechanism for the final classification task. This mechanism learns the importance of different image blocks in a pathological region, globally considering the information from each block. In the third stage, all information from the image blocks in a region is aggregated using attention mechanisms, followed by classification to determine the category of the region. Experimental results demonstrate that our proposed method can predict microscopic structures more accurately. After color-coding, the method achieves results on unstained pathology slides that approximate the quality of Hematoxylin and eosin staining, reducing the need for stained pathology slides. Furthermore, by leveraging the concept of indirect measurement and extracting polarized features from structures correlated with lesions, the proposed method can also classify samples where membrane structures cannot be obtained through sampling, providing a potential objective and highly accurate indirect diagnostic technique for thyroid tumors. △ Less

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.16607 [pdf, other]

A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mu… ▽ More Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mueller matrix images of liver pathological samples with radiomics features derived from corresponding pathological images to classify HCC and ICC. Our fusion network integrates a two-tier fusion approach, comprising early feature-level fusion and late classification-level fusion. By harnessing the strengths of polarization imaging techniques and image feature-based machine learning, our proposed fusion network significantly enhances classification accuracy. Notably, even at reduced imaging resolutions, the fusion network maintains robust performance due to the additional information provided by polarization features, which may not align with human visual perception. Our experimental results underscore the potential of this fusion network as a powerful tool for computer-aided diagnosis of HCC and ICC, showcasing the benefits and prospects of integrating polarization imaging techniques into the current image-intensive digital pathological diagnosis. We aim to contribute this innovative approach to top-tier journals, offering fresh insights and valuable tools in the fields of medical imaging and cancer diagnosis. By introducing polarization imaging into liver cancer classification, we demonstrate its interdisciplinary potential in addressing challenges in medical image analysis, promising advancements in medical imaging and cancer diagnosis. △ Less

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.04610 [pdf]

Data-driven Semi-supervised Machine Learning with Surrogate Safety Measures for Abnormal Driving Behavior Detection

Authors: Yongqi Dong, Lanxin Zhang, Haneen Farah, Arkady Zgonnikov, Bart van Arem

Abstract: Detecting abnormal driving behavior is critical for road traffic safety and the evaluation of drivers' behavior. With the advancement of machine learning (ML) algorithms and the accumulation of naturalistic driving data, many ML models have been adopted for abnormal driving behavior detection. Most existing ML-based detectors rely on (fully) supervised ML methods, which require substantial labeled… ▽ More Detecting abnormal driving behavior is critical for road traffic safety and the evaluation of drivers' behavior. With the advancement of machine learning (ML) algorithms and the accumulation of naturalistic driving data, many ML models have been adopted for abnormal driving behavior detection. Most existing ML-based detectors rely on (fully) supervised ML methods, which require substantial labeled data. However, ground truth labels are not always available in the real world, and labeling large amounts of data is tedious. Thus, there is a need to explore unsupervised or semi-supervised methods to make the anomaly detection process more feasible and efficient. To fill this research gap, this study analyzes large-scale real-world data revealing several abnormal driving behaviors (e.g., sudden acceleration, rapid lane-changing) and develops a Hierarchical Extreme Learning Machines (HELM) based semi-supervised ML method using partly labeled data to accurately detect the identified abnormal driving behaviors. Moreover, previous ML-based approaches predominantly utilize basic vehicle motion features (such as velocity and acceleration) to label and detect abnormal driving behaviors, while this study seeks to introduce Surrogate Safety Measures (SSMs) as the input features for ML models to improve the detection performance. Results from extensive experiments demonstrate the effectiveness of the proposed semi-supervised ML model with the introduced SSMs serving as important features. The proposed semi-supervised ML method outperforms other baseline semi-supervised or unsupervised methods regarding various metrics, e.g., delivering the best accuracy at 99.58% and the best F-1 measure at 0.9913. The ablation study further highlights the significance of SSMs for advancing detection performance. △ Less

Submitted 24 May, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 22 pages, 10 figures, accepted by the 103rd Transportation Research Board (TRB) Annual Meeting, under third round review by Transportation Research Record: Journal of the Transportation Research Board

arXiv:2312.04398 [pdf]

Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning

Authors: Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah

Abstract: The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, t… ▽ More The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing to tackle it leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance the detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a heightened accuracy at 94.77% and an improved Area Under The Curve (AUC) score of 0.9743 compared with the pure Swin Transformer without pre-training (Swin-Trans) with an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems. △ Less

Submitted 29 May, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 22 pages, 6 figures, accepted by the 103rd Transportation Research Board (TRB) Annual Meeting, under review by Transportation Research Record: Journal of the Transportation Research Board

arXiv:2311.15168 [pdf, other]

A Data-Driven Approach for High-Impedance Fault Localization in Distribution Systems

Authors: Yuqi Zhou, Yuqing Dong, Rui Yang

Abstract: Accurate and quick identification of high-impedance faults is critical for the reliable operation of distribution systems. Unlike other faults in power grids, HIFs are very difficult to detect by conventional overcurrent relays due to the low fault current. Although HIFs can be affected by various factors, the voltage current characteristics can substantially imply how the system responds to the d… ▽ More Accurate and quick identification of high-impedance faults is critical for the reliable operation of distribution systems. Unlike other faults in power grids, HIFs are very difficult to detect by conventional overcurrent relays due to the low fault current. Although HIFs can be affected by various factors, the voltage current characteristics can substantially imply how the system responds to the disturbance and thus provides opportunities to effectively localize HIFs. In this work, we propose a data-driven approach for the identification of HIF events. To tackle the nonlinearity of the voltage current trajectory, first, we formulate optimization problems to approximate the trajectory with piecewise functions. Then we collect the function features of all segments as inputs and use the support vector machine approach to efficiently identify HIFs at different locations. Numerical studies on the IEEE 123-node test feeder demonstrate the validity and accuracy of the proposed approach for real-time HIF identification. △ Less

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.13361 [pdf, other]

Applying Large Language Models to Power Systems: Potential Security Threats

Authors: Jiaqi Ruan, Gaoqi Liang, Huan Zhao, Guolong Liu, Xianzhuo Sun, **g Qiu, Zhao Xu, Fushuan Wen, Zhao Yang Dong

Abstract: Applying large language models (LLMs) to modern power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this article analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and d… ▽ More Applying large language models (LLMs) to modern power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this article analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and development of countermeasures. △ Less

Submitted 24 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.08816 [pdf, other]

Target-oriented Domain Adaptation for Infrared Image Super-Resolution

Authors: Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Yafei Dong, Shinichiro Omachi

Abstract: Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. To address these challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN), an innovative framework sp… ▽ More Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. To address these challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN), an innovative framework specifically engineered for robust IR super-resolution model adaptation. DASRGAN operates on the synergy of two key components: 1) Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and 2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer. Specifically, TOA uniquely integrates a specialized discriminator, incorporating a prior extraction branch, and employs a Sobel-guided adversarial loss to align texture distributions effectively. Concurrently, NOA utilizes a noise adversarial loss to distinctly separate the generative and Gaussian noise pattern distributions during adversarial training. Our extensive experiments confirm DASRGAN's superiority. Comparative analyses against leading methods across multiple benchmarks and upsampling factors reveal that DASRGAN sets new state-of-the-art performance standards. Code are available at \url{https://github.com/yongsongH/DASRGAN}. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: 11 pages, 9 figures

arXiv:2311.08808 [pdf, other]

Degradation Estimation Recurrent Neural Network with Local and Non-Local Priors for Compressive Spectral Imaging

Authors: Yubo Dong, Dahua Gao, Yuyan Li, Guangming Shi, Danhua Liu

Abstract: In the Coded Aperture Snapshot Spectral Imaging (CASSI) system, deep unfolding networks (DUNs) have demonstrated excellent performance in recovering 3D hyperspectral images (HSIs) from 2D measurements. However, some noticeable gaps exist between the imaging model used in DUNs and the real CASSI imaging process, such as the sensing error as well as photon and dark current noise, compromising the ac… ▽ More In the Coded Aperture Snapshot Spectral Imaging (CASSI) system, deep unfolding networks (DUNs) have demonstrated excellent performance in recovering 3D hyperspectral images (HSIs) from 2D measurements. However, some noticeable gaps exist between the imaging model used in DUNs and the real CASSI imaging process, such as the sensing error as well as photon and dark current noise, compromising the accuracy of solving the data subproblem and the prior subproblem in DUNs. To address this issue, we propose a Degradation Estimation Network (DEN) to correct the imaging model used in DUNs by simultaneously estimating the sensing error and the noise level, thereby improving the performance of DUNs. Additionally, we propose an efficient Local and Non-local Transformer (LNLT) to solve the prior subproblem, which not only effectively models local and non-local similarities but also reduces the computational cost of the window-based global Multi-head Self-attention (MSA). Furthermore, we transform the DUN into a Recurrent Neural Network (RNN) by sharing parameters of DNNs across stages, which not only allows DNN to be trained more adequately but also significantly reduces the number of parameters. The proposed DERNN-LNLT achieves state-of-the-art (SOTA) performance with fewer parameters on both simulation and real datasets. △ Less

Submitted 14 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.15767 [pdf, ps, other]

Unpaired MRI Super Resolution with Contrastive Learning

Authors: Hao Li, Quanwei Liu, Jianan Liu, Xiling Liu, Yanni Dong, Tao Huang, Zhihan Lv

Abstract: Magnetic resonance imaging (MRI) is crucial for enhancing diagnostic accuracy in clinical settings. However, the inherent long scan time of MRI restricts its widespread applicability. Deep learning-based image super-resolution (SR) methods exhibit promise in improving MRI resolution without additional cost. Due to lacking of aligned high-resolution (HR) and low-resolution (LR) MRI image pairs, uns… ▽ More Magnetic resonance imaging (MRI) is crucial for enhancing diagnostic accuracy in clinical settings. However, the inherent long scan time of MRI restricts its widespread applicability. Deep learning-based image super-resolution (SR) methods exhibit promise in improving MRI resolution without additional cost. Due to lacking of aligned high-resolution (HR) and low-resolution (LR) MRI image pairs, unsupervised approaches are widely adopted for SR reconstruction with unpaired MRI images. However, these methods still require a substantial number of HR MRI images for training, which can be difficult to acquire. To this end, we propose an unpaired MRI SR approach that employs contrastive learning to enhance SR performance with limited HR training data. Empirical results presented in this study underscore significant enhancements in the peak signal-to-noise ratio and structural similarity index, even when a paucity of HR images is available. These findings accentuate the potential of our approach in addressing the challenge of limited HR training data, thereby contributing to the advancement of MRI in clinical applications. △ Less

Submitted 16 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.11690 [pdf, other]

doi 10.1016/j.rser.2023.113913

Deep learning based on Transformer architecture for power system short-term voltage stability assessment with class imbalance

Authors: Yang Li, Jiting Cao, Yan Xu, Lipeng Zhu, Zhao Yang Dong

Abstract: Most existing data-driven power system short-term voltage stability assessment (STVSA) approaches presume class-balanced input data. However, in practical applications, the occurrence of short-term voltage instability following a disturbance is minimal, leading to a significant class imbalance problem and a consequent decline in classifier performance. This work proposes a Transformer-based STVSA… ▽ More Most existing data-driven power system short-term voltage stability assessment (STVSA) approaches presume class-balanced input data. However, in practical applications, the occurrence of short-term voltage instability following a disturbance is minimal, leading to a significant class imbalance problem and a consequent decline in classifier performance. This work proposes a Transformer-based STVSA method to address this challenge. By utilizing the basic Transformer architecture, a stability assessment Transformer (StaaT) is developed {as a classification model to reflect the correlation between the operational states of the system and the resulting stability outcomes}. To combat the negative impact of imbalanced datasets, this work employs a conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) for synthetic data generation, aiding in the creation of a balanced, representative training set for the classifier. Semi-supervised clustering learning is implemented to enhance clustering quality, addressing the lack of a unified quantitative criterion for short-term voltage stability. {Numerical tests on the IEEE 39-bus test system extensively demonstrate that the proposed method exhibits robust performance under class imbalances up to 100:1 and noisy environments, and maintains consistent effectiveness even with an increased penetration of renewable energy}. Comparative results reveal that the CWGAN-GP generates more balanced datasets than traditional oversampling methods and that the StaaT outperforms other deep learning algorithms. This study presents a compelling solution for real-world STVSA applications that often face class imbalance and data noise challenges. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted by Renewable and Sustainable Energy Reviews

Journal ref: Renewable and Sustainable Energy Reviews 189 (2024) 113913

arXiv:2310.10376 [pdf, other]

Research on Train Shunting Impedance Based on Transmission Line Theory

Authors: Yinchao Dong, Linhai Zhao

Abstract: At present, the shunting process of train to track circuit is usually studied by taking the shunting resistance of the first wheel set of train as the equivalent model, which ignores the shunting effect of other wheel sets and cannot study the fault conditions such as "pool shunting". Especially for the jointless track circuit (JTC), the compensation capacitors connected in parallel on the rail li… ▽ More At present, the shunting process of train to track circuit is usually studied by taking the shunting resistance of the first wheel set of train as the equivalent model, which ignores the shunting effect of other wheel sets and cannot study the fault conditions such as "pool shunting". Especially for the jointless track circuit (JTC), the compensation capacitors connected in parallel on the rail line will cause more complex train shunting process. To solve this problem, based on the transmission line theory, starting from the shunting resistance of each wheel set and considering the leakage of track bet, this paper established as six-terminal network model of train shunting impedance, indirectly verified the correctness of the model by using the working principle of track circuit reader (TCR), and focused on the influence of wheel set shunting resistance, compensation capacitors and rail line parameters on train shunting impedance. Finally, based on the calculation of the structural importance of train wheel set, a simplified model of train shunting impedance was constructed, providing theoretical support for the further study of fault conditions such as "poor shunting". △ Less

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.06678 [pdf, other]

Modelling and Performance Analysis of the Over-the-Air Computing in Cellular IoT Networks

Authors: Ying Dong, Haonan Hu, Qiaoshou Liu, Tingwei Lv, Qianbin Chen, Jie Zhang

Abstract: Ultra-fast wireless data aggregation (WDA) of distributed data has emerged as a critical design challenge in the ultra-densely deployed cellular internet of things network (CITN) due to limited spectral resources. Over-the-air computing (AirComp) has been proposed as an effective solution for ultra-fast WDA by exploiting the superposition property of wireless channels. However, the effect of acces… ▽ More Ultra-fast wireless data aggregation (WDA) of distributed data has emerged as a critical design challenge in the ultra-densely deployed cellular internet of things network (CITN) due to limited spectral resources. Over-the-air computing (AirComp) has been proposed as an effective solution for ultra-fast WDA by exploiting the superposition property of wireless channels. However, the effect of access radius of access point (AP) on the AirComp performance has not been investigated yet. Therefore, in this work, the mean square error (MSE) performance of AirComp in the ultra-densely deployed CITN is analyzed with the AP access radius. By modelling the spatial locations of internet of things devices as a Poisson point process, the expression of MSE is derived in an analytical form, which is validated by Monte Carlo simulations. Based on the analytical MSE, we investigate the effect of AP access radius on the MSE of AirComp numerically. The results show that there exists an optimal AP access radius for AirComp, which can decrease the MSE by up to 12.7%. It indicates that the AP access radius should be carefully chosen to improve the AirComp performance in the ultra-densely deployed CITN. △ Less

Submitted 11 August, 2023; originally announced October 2023.

arXiv:2309.16813 [pdf, other]

Wi-Fi 8: Embracing the Millimeter-Wave Era

Authors: Xiaoqian Liu, Tingwei Chen, Yuhan Dong, Zhi Mao, Ming Gan, Xun Yang, Jianmin Lu

Abstract: With the increasing demands in communication, Wi-Fi technology is advancing towards its next generation. Building on the foundation of Wi-Fi 7, millimeter-wave technology is anticipated to converge with Wi-Fi 8 in the near future. In this paper, we look into the millimeter-wave technology and other potential feasible features, providing a comprehensive perspective on the future of Wi-Fi 8. Our sim… ▽ More With the increasing demands in communication, Wi-Fi technology is advancing towards its next generation. Building on the foundation of Wi-Fi 7, millimeter-wave technology is anticipated to converge with Wi-Fi 8 in the near future. In this paper, we look into the millimeter-wave technology and other potential feasible features, providing a comprehensive perspective on the future of Wi-Fi 8. Our simulation results demonstrate that significant performance gains can be achieved, even in the presence of hardware impairments. △ Less

Submitted 28 September, 2023; originally announced September 2023.

Comments: 7 pages, 4 figures

arXiv:2309.15951 [pdf, other]

IEEE 802.11be Wi-Fi 7: Feature Summary and Performance Evaluation

Authors: Xiaoqian Liu, Yuhan Dong, Yiqing Li, Yousi Lin, Xun Yang, Ming Gan

Abstract: While the pace of commercial scale application of Wi-Fi 6 accelerates, the IEEE 802.11 Working Group is about to complete the development of a new amendment standard IEEE 802.11be -- Extremely High Throughput (EHT), also known as Wi-Fi 7, which can be used to meet the demand for the throughput of 4K/8K videos up to tens of Gbps and low-latency video applications such as virtual reality (VR) and au… ▽ More While the pace of commercial scale application of Wi-Fi 6 accelerates, the IEEE 802.11 Working Group is about to complete the development of a new amendment standard IEEE 802.11be -- Extremely High Throughput (EHT), also known as Wi-Fi 7, which can be used to meet the demand for the throughput of 4K/8K videos up to tens of Gbps and low-latency video applications such as virtual reality (VR) and augmented reality (AR). Wi-Fi 7 not only scales Wi-Fi 6 with doubled bandwidth, but also supports real-time applications, which brings revolutionary changes to Wi-Fi. In this article, we start by introducing the main objectives and timeline of Wi-Fi 7 and then list the latest key techniques which promote the performance improvement of Wi-Fi 7. Finally, we validate the most critical objectives of Wi-Fi 7 -- the potential up to 30 Gbps throughput and lower latency. System-level simulation results suggest that by combining the new techniques, Wi-Fi 7 achieves 30 Gbps throughput and lower latency than Wi-Fi 6. △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: 6 pages, 4 figures

arXiv:2309.01958 [pdf, other]

Empowering Low-Light Image Enhancer through Customized Learnable Priors

Authors: Naishan Zheng, Man Zhou, Yanmeng Dong, Xiangyu Rui, Jie Huang, Chongyi Li, Feng Zhao

Abstract: Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end map** networks heuristically, neglecting the intrinsic prior of image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issu… ▽ More Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end map** networks heuristically, neglecting the intrinsic prior of image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issues, they rely on proximal operator networks that deliver ambiguous and implicit priors. In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. Motivated by the powerful feature representation capability of Masked Autoencoder (MAE), we customize MAE-based illumination and noise priors and redevelop them from two perspectives: 1) \textbf{structure flow}: we train the MAE from a normal-light image to its illumination properties and then embed it into the proximal operator design of the unfolding architecture; and m2) \textbf{optimization flow}: we train MAE from a normal-light image to its gradient representation and then employ it as a regularization term to constrain noise in the model output. These designs improve the interpretability and representation capability of the model.Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of our proposed paradigm over state-of-the-art methods. Code is available at https://github.com/zheng980629/CUE. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV 2023

arXiv:2308.06979 [pdf, other]

doi 10.5334/tismir.171

The Sound Demixing Challenge 2023 $\unicode{x2013}$ Music Demixing Track

Authors: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang , et al. (2 additional authors not shown)

Abstract: This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t… ▽ More This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions. △ Less

Submitted 19 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: Published in Transactions of the International Society for Music Information Retrieval (https://transactions.ismir.net/articles/10.5334/tismir.171)

Journal ref: Transactions of the International Society for Music Information Retrieval, 7(1), pp.63-84, 2024

arXiv:2307.09729 [pdf, other]

NTIRE 2023 Quality Assessment of Video Enhancement Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu , et al. (47 additional authors not shown)

Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual… ▽ More This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge has a total of 167 registered participants. 61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions. A total of 176 submissions were submitted by 37 participating teams during the final testing phase. Finally, 19 participating teams submitted their models and fact sheets, and detailed the methods they used. Some methods have achieved better results than baseline methods, and the winning methods have demonstrated superior prediction performance. △ Less

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2306.11984 [pdf, ps, other]

TauPETGen: Text-Conditional Tau PET Image Synthesis Based on Latent Diffusion Models

Authors: Se-In Jang, Cristina Lois, Emma Thibault, J. Alex Becker, Yafei Dong, Marc D. Normandin, Julie C. Price, Keith A. Johnson, Georges El Fakhri, Kuang Gong

Abstract: In this work, we developed a novel text-guided image synthesis technique which could generate realistic tau PET images from textual descriptions and the subject's MR image. The generated tau PET images have the potential to be used in examining relations between different measures and also increasing the public availability of tau PET datasets. The method was based on latent diffusion models. Both… ▽ More In this work, we developed a novel text-guided image synthesis technique which could generate realistic tau PET images from textual descriptions and the subject's MR image. The generated tau PET images have the potential to be used in examining relations between different measures and also increasing the public availability of tau PET datasets. The method was based on latent diffusion models. Both textual descriptions and the subject's MR prior image were utilized as conditions during image generation. The subject's MR image can provide anatomical details, while the text descriptions, such as gender, scan time, cognitive test scores, and amyloid status, can provide further guidance regarding where the tau neurofibrillary tangles might be deposited. Preliminary experimental results based on clinical [18F]MK-6240 datasets demonstrate the feasibility of the proposed method in generating realistic tau PET images at different clinical stages. △ Less

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.11466 [pdf, other]

Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers

Authors: Yongqi Dong, Tobias Datema, Vincent Wassenaar, Joris van de Weg, Cahit Tolga Kopar, Harim Suleman

Abstract: Develo** and testing automated driving models in the real world might be challenging and even dangerous, while simulation can help with this, especially for challenging maneuvers. Deep reinforcement learning (DRL) has the potential to tackle complex decision-making and controlling tasks through learning and interacting with the environment, thus it is suitable for develo** automated driving wh… ▽ More Develo** and testing automated driving models in the real world might be challenging and even dangerous, while simulation can help with this, especially for challenging maneuvers. Deep reinforcement learning (DRL) has the potential to tackle complex decision-making and controlling tasks through learning and interacting with the environment, thus it is suitable for develo** automated driving while not being explored in detail yet. This study carried out a comprehensive study by implementing, evaluating, and comparing the two DRL algorithms, Deep Q-networks (DQN) and Trust Region Policy Optimization (TRPO), for training automated driving on the highway-env simulation platform. Effective and customized reward functions were developed and the implemented algorithms were evaluated in terms of onlane accuracy (how well the car drives on the road within the lane), efficiency (how fast the car drives), safety (how likely the car is to crash into obstacles), and comfort (how much the car makes jerks, e.g., suddenly accelerates or brakes). Results show that the TRPO-based models with modified reward functions delivered the best performance in most cases. Furthermore, to train a uniform driving model that can tackle various driving maneuvers besides the specific ones, this study expanded the highway-env and developed an extra customized training environment, namely, ComplexRoads, integrating various driving maneuvers and multiple road scenarios together. Models trained on the designed ComplexRoads environment can adapt well to other driving maneuvers with promising overall performance. Lastly, several functionalities were added to the highway-env to implement this work. The codes are open on GitHub at https://github.com/alaineman/drlcarsim-paper. △ Less

Submitted 18 August, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: 6 pages, 3 figures, accepted by the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

arXiv:2306.11465 [pdf]

Safe, Efficient, Comfort, and Energy-saving Automated Driving through Roundabout Based on Deep Reinforcement Learning

Authors: Henan Yuan, Penghui Li, Bart van Arem, Liujiang Kang, Yongqi Dong

Abstract: Traffic scenarios in roundabouts pose substantial complexity for automated driving. Manually map** all possible scenarios into a state space is labor-intensive and challenging. Deep reinforcement learning (DRL) with its ability to learn from interacting with the environment emerges as a promising solution for training such automated driving models. This study explores, employs, and implements va… ▽ More Traffic scenarios in roundabouts pose substantial complexity for automated driving. Manually map** all possible scenarios into a state space is labor-intensive and challenging. Deep reinforcement learning (DRL) with its ability to learn from interacting with the environment emerges as a promising solution for training such automated driving models. This study explores, employs, and implements various DRL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO) to instruct automated vehicles' driving through roundabouts. The driving state space, action space, and reward function are designed. The reward function considers safety, efficiency, comfort, and energy consumption to align with real-world requirements. All three tested DRL algorithms succeed in enabling automated vehicles to drive through the roundabout. To holistically evaluate the performance of these algorithms, this study establishes an evaluation methodology considering multiple indicators such as safety, efficiency, and comfort level. A method employing the Analytic Hierarchy Process is also developed to weigh these evaluation indicators. Experimental results on various testing scenarios reveal that the TRPO algorithm outperforms DDPG and PPO in terms of safety and efficiency, and PPO performs best in terms of comfort level. Lastly, to verify the model's adaptability and robustness regarding other driving scenarios, this study also deploys the model trained by TRPO to a range of different testing scenarios, e.g., highway driving and merging. Experimental results demonstrate that the TRPO model trained on only roundabout driving scenarios exhibits a certain degree of proficiency in highway driving and merging scenarios. This study provides a foundation for the application of automated driving with DRL in real traffic environments. △ Less

Submitted 20 June, 2023; originally announced June 2023.

Comments: 6 pages, 3 figures, under review by the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

arXiv:2306.10494 [pdf, other]

Semi-Supervised Learning for Multi-Label Cardiovascular Diseases Prediction:A Multi-Dataset Study

Authors: Rushuang Zhou, Lei Lu, Zijun Liu, Ting Xiang, Zhen Liang, David A. Clifton, Yining Dong, Yuan-Ting Zhang

Abstract: Electrocardiography (ECG) is a non-invasive tool for predicting cardiovascular diseases (CVDs). Current ECG-based diagnosis systems show promising performance owing to the rapid development of deep learning techniques. However, the label scarcity problem, the co-occurrence of multiple CVDs and the poor performance on unseen datasets greatly hinder the widespread application of deep learning-based… ▽ More Electrocardiography (ECG) is a non-invasive tool for predicting cardiovascular diseases (CVDs). Current ECG-based diagnosis systems show promising performance owing to the rapid development of deep learning techniques. However, the label scarcity problem, the co-occurrence of multiple CVDs and the poor performance on unseen datasets greatly hinder the widespread application of deep learning-based models. Addressing them in a unified framework remains a significant challenge. To this end, we propose a multi-label semi-supervised model (ECGMatch) to recognize multiple CVDs simultaneously with limited supervision. In the ECGMatch, an ECGAugment module is developed for weak and strong ECG data augmentation, which generates diverse samples for model training. Subsequently, a hyperparameter-efficient framework with neighbor agreement modeling and knowledge distillation is designed for pseudo-label generation and refinement, which mitigates the label scarcity problem. Finally, a label correlation alignment module is proposed to capture the co-occurrence information of different CVDs within labeled samples and propagate this information to unlabeled samples. Extensive experiments on four datasets and three protocols demonstrate the effectiveness and stability of the proposed model, especially on unseen datasets. As such, this model can pave the way for diagnostic systems that achieve robust performance on multi-label CVDs prediction with limited supervision. △ Less

Submitted 18 June, 2023; originally announced June 2023.

arXiv:2305.18807 [pdf]

Design of the Reverse Logistics System for Medical Waste Recycling Part II: Route Optimization with Case Study under COVID-19 Pandemic

Authors: Chaozhong Xue, Yongqi Dong, Jiaqi Liu, Yijun Liao, Lingbo Li

Abstract: Medical waste recycling and treatment has gradually drawn concerns from the whole society, as the amount of medical waste generated is increasing dramatically, especially during the pandemic of COVID-19. To tackle the emerging challenges, this study designs a reverse logistics system architecture with three modules, i.e., medical waste classification & monitoring module, temporary storage & dispos… ▽ More Medical waste recycling and treatment has gradually drawn concerns from the whole society, as the amount of medical waste generated is increasing dramatically, especially during the pandemic of COVID-19. To tackle the emerging challenges, this study designs a reverse logistics system architecture with three modules, i.e., medical waste classification & monitoring module, temporary storage & disposal site (disposal site for short) selection module, as well as route optimization module. This overall solution design won the Grand Prize of the "YUNFENG CUP" China National Contest on Green Supply and Reverse Logistics Design ranking 1st. This paper focuses on the design of the route optimization module. In this module, a route optimization problem is designed considering transportation costs and multiple risk costs (e.g., environment risk, population risk, property risk, and other accident-related risks). The Analytic Hierarchy Process is employed to determine the weights for each risk element, and a customized genetic algorithm is developed to solve the route optimization problem. A case study under the COVID-19 pandemic is further provided to verify the proposed model. Limited by length, detailed descriptions of the whole system and the other modules can be found at https://shorturl.at/cdY59. △ Less

Submitted 30 May, 2023; originally announced May 2023.

Comments: 6 pages, 4 figures, under review by the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

arXiv:2305.17271 [pdf]

doi 10.1109/TITS.2023.3305015

Robust Lane Detection through Self Pre-training with Masked Sequential Autoencoders and Fine-tuning with Customized PolyLoss

Authors: Ruohan Li, Yongqi Dong

Abstract: Lane detection is crucial for vehicle localization which makes it the foundation for automated driving and many intelligent and advanced driving assistant systems. Available vision-based lane detection methods do not make full use of the valuable features and aggregate contextual information, especially the interrelationships between lane lines and other regions of the images in continuous frames.… ▽ More Lane detection is crucial for vehicle localization which makes it the foundation for automated driving and many intelligent and advanced driving assistant systems. Available vision-based lane detection methods do not make full use of the valuable features and aggregate contextual information, especially the interrelationships between lane lines and other regions of the images in continuous frames. To fill this research gap and upgrade lane detection performance, this paper proposes a pipeline consisting of self pre-training with masked sequential autoencoders and fine-tuning with customized PolyLoss for the end-to-end neural network models using multi-continuous image frames. The masked sequential autoencoders are adopted to pre-train the neural network models with reconstructing the missing pixels from a random masked image as the objective. Then, in the fine-tuning segmentation phase where lane detection segmentation is performed, the continuous image frames are served as the inputs, and the pre-trained model weights are transferred and further updated using the backpropagation mechanism with customized PolyLoss calculating the weighted errors between the output lane detection results and the labeled ground truth. Extensive experiment results demonstrate that, with the proposed pipeline, the lane detection model performance on both normal and challenging scenes can be advanced beyond the state-of-the-art, delivering the best testing accuracy (98.38%), precision (0.937), and F1-measure (0.924) on the normal scene testing set, together with the best overall accuracy (98.36%) and precision (0.844) in the challenging scene test set, while the training time can be substantially shortened. △ Less

Submitted 11 August, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 12 pages, 8 figures, accepted by journal of IEEE Transactions on Intelligent Transportation Systems

arXiv:2305.10666 [pdf, other]

doi 10.1109/ICASSP48485.2024.10447144

A unified front-end framework for English text-to-speech synthesis

Authors: Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, Qiao Tian, Yuanyuan Huo, Yuxuan Wang

Abstract: The front-end is a critical component of English text-to-speech (TTS) systems, responsible for extracting linguistic features that are essential for a text-to-speech model to synthesize speech, such as prosodies and phonemes. The English TTS front-end typically consists of a text normalization (TN) module, a prosody word prosody phrase (PWPP) module, and a grapheme-to-phoneme (G2P) module. However… ▽ More The front-end is a critical component of English text-to-speech (TTS) systems, responsible for extracting linguistic features that are essential for a text-to-speech model to synthesize speech, such as prosodies and phonemes. The English TTS front-end typically consists of a text normalization (TN) module, a prosody word prosody phrase (PWPP) module, and a grapheme-to-phoneme (G2P) module. However, current research on the English TTS front-end focuses solely on individual modules, neglecting the interdependence between them and resulting in sub-optimal performance for each module. Therefore, this paper proposes a unified front-end framework that captures the dependencies among the English TTS front-end modules. Extensive experiments have demonstrated that the proposed method achieves state-of-the-art (SOTA) performance in all modules. △ Less

Submitted 25 March, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted in ICASSP 2024

arXiv:2305.09512 [pdf, other]

Light-VQA: A Multi-Dimensional Quality Assessment Model for Low-Light Video Enhancement

Authors: Yunlong Dong, Xiaohong Liu, Yixuan Gao, Xunchu Zhou, Tao Tan, Guangtao Zhai

Abstract: Recently, Users Generated Content (UGC) videos becomes ubiquitous in our daily lives. However, due to the limitations of photographic equipments and techniques, UGC videos often contain various degradations, in which one of the most visually unfavorable effects is the underexposure. Therefore, corresponding video enhancement algorithms such as Low-Light Video Enhancement (LLVE) have been proposed… ▽ More Recently, Users Generated Content (UGC) videos becomes ubiquitous in our daily lives. However, due to the limitations of photographic equipments and techniques, UGC videos often contain various degradations, in which one of the most visually unfavorable effects is the underexposure. Therefore, corresponding video enhancement algorithms such as Low-Light Video Enhancement (LLVE) have been proposed to deal with the specific degradation. However, different from video enhancement algorithms, almost all existing Video Quality Assessment (VQA) models are built generally rather than specifically, which measure the quality of a video from a comprehensive perspective. To the best of our knowledge, there is no VQA model specially designed for videos enhanced by LLVE algorithms. To this end, we first construct a Low-Light Video Enhancement Quality Assessment (LLVE-QA) dataset in which 254 original low-light videos are collected and then enhanced by leveraging 8 LLVE algorithms to obtain 2,060 videos in total. Moreover, we propose a quality assessment model specialized in LLVE, named Light-VQA. More concretely, since the brightness and noise have the most impact on low-light enhanced VQA, we handcraft corresponding features and integrate them with deep-learning-based semantic features as the overall spatial information. As for temporal information, in addition to deep-learning-based motion features, we also investigate the handcrafted brightness consistency among video frames, and the overall temporal information is their concatenation. Subsequently, spatial and temporal information is fused to obtain the quality-aware representation of a video. Extensive experimental results show that our Light-VQA achieves the best performance against the current State-Of-The-Art (SOTA) on LLVE-QA and public dataset. Dataset and Codes can be found at https://github.com/wenzhouyidu/Light-VQA. △ Less

Submitted 6 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.03172 [pdf, other]

TelecomTM: A Fine-Grained and Ubiquitous Traffic Monitoring System Using Pre-Existing Telecommunication Fiber-Optic Cables as Sensors

Authors: **gxiao Liu, Siyuan Yuan, Yiwen Dong, Biondo Biondi, Hae Young Noh

Abstract: We introduce the TelecomTM system that uses pre-existing telecommunication fiber-optic cables as virtual strain sensors to sense vehicle-induced ground vibrations for fine-grained and ubiquitous traffic monitoring and characterization. Here we call it a virtual sensor because it is a software-based representation of a physical sensor. Due to the extensively installed telecommunication fiber-optic… ▽ More We introduce the TelecomTM system that uses pre-existing telecommunication fiber-optic cables as virtual strain sensors to sense vehicle-induced ground vibrations for fine-grained and ubiquitous traffic monitoring and characterization. Here we call it a virtual sensor because it is a software-based representation of a physical sensor. Due to the extensively installed telecommunication fiber-optic cables at the roadside, our system using redundant dark fibers enables to monitor traffic at low cost with low maintenance. Many existing traffic monitoring approaches use cameras, piezoelectric sensors, and smartphones, but they are limited due to privacy concerns and/or deployment requirements. Previous studies attempted to use telecommunication cables for traffic monitoring, but they were only exploratory and limited to simple tasks at a coarse granularity, e.g., vehicle detection, due to their hardware constraints and real-world challenges. In particular, those challenges are 1) unknown and heterogeneous properties of virtual sensors and 2) large and complex noise conditions. To this end, our TelecomTM system first characterizes the geographic location and analyzes the signal pattern of each virtual sensor through driving tests. We then develop a spatial-domain Bayesian filtering and smoothing algorithm to detect, track, and characterize each vehicle. Our approach uses the spatial dependency of multiple virtual sensors and Newton's laws of motion to combine the distributed sensor data to reduce uncertainties in vehicle detection and tracking. In our real-world evaluation on a two-way traffic road with 1120 virtual sensors, TelecomTM achieved 90.18% vehicle detection accuracy, 27$\times$ and 5$\times$ error reduction for vehicle position and speed tracking compared to a baseline method, and $\pm$3.92% and $\pm$11.98% percent error for vehicle wheelbase and weight estimation, respectively. △ Less

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2305.00179 [pdf, other]

Integrated Sensing and Communications: Recent Advances and Ten Open Challenges

Authors: Shihang Lu, Fan Liu, Yunxin Li, Kecheng Zhang, Hongjia Huang, Jiaqi Zou, Xinyu Li, Yuxiang Dong, Fuwang Dong, Jia Zhu, Yifeng Xiong, Weijie Yuan, Yuanhao Cui, Lajos Hanzo

Abstract: It is anticipated that integrated sensing and communications (ISAC) would be one of the key enablers of next-generation wireless networks (such as beyond 5G (B5G) and 6G) for supporting a variety of emerging applications. In this paper, we provide a comprehensive review of the recent advances in ISAC systems, with a particular focus on their foundations, system design, networking aspects and ISAC… ▽ More It is anticipated that integrated sensing and communications (ISAC) would be one of the key enablers of next-generation wireless networks (such as beyond 5G (B5G) and 6G) for supporting a variety of emerging applications. In this paper, we provide a comprehensive review of the recent advances in ISAC systems, with a particular focus on their foundations, system design, networking aspects and ISAC applications. Furthermore, we discuss the corresponding open questions of the above that emerged in each issue. Hence, we commence with the information theory of sensing and communications (S$\&$C), followed by the information-theoretic limits of ISAC systems by shedding light on the fundamental performance metrics. Next, we discuss their clock synchronization and phase offset problems, the associated Pareto-optimal signaling strategies, as well as the associated super-resolution ISAC system design. Moreover, we envision that ISAC ushers in a paradigm shift for the future cellular networks relying on network sensing, transforming the classic cellular architecture, cross-layer resource management methods, and transmission protocols. In ISAC applications, we further highlight the security and privacy issues of wireless sensing. Finally, we close by studying the recent advances in a representative ISAC use case, namely the multi-object multi-task (MOMT) recognition problem using wireless signals. △ Less

Submitted 17 December, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

Comments: 26 pages, 22 figures, resubmitted to IEEE Journal. Appreciation for the outstanding contributions of coauthors in the paper!

arXiv:2304.06496 [pdf, other]

EEGMatch: Learning with Incomplete Labels for Semi-Supervised EEG-based Cross-Subject Emotion Recognition

Authors: Rushuang Zhou, Weishan Ye, Zhiguo Zhang, Yanyang Luo, Li Zhang, Linling Li, Gan Huang, Yining Dong, Yuan-Ting Zhang, Zhen Liang

Abstract: Electroencephalography (EEG) is an objective tool for emotion recognition and shows promising performance. However, the label scarcity problem is a main challenge in this field, which limits the wide application of EEG-based emotion recognition. In this paper, we propose a novel semi-supervised learning framework (EEGMatch) to leverage both labeled and unlabeled EEG data. First, an EEG-Mixup based… ▽ More Electroencephalography (EEG) is an objective tool for emotion recognition and shows promising performance. However, the label scarcity problem is a main challenge in this field, which limits the wide application of EEG-based emotion recognition. In this paper, we propose a novel semi-supervised learning framework (EEGMatch) to leverage both labeled and unlabeled EEG data. First, an EEG-Mixup based data augmentation method is developed to generate more valid samples for model learning. Second, a semi-supervised two-step pairwise learning method is proposed to bridge prototype-wise and instance-wise pairwise learning, where the prototype-wise pairwise learning measures the global relationship between EEG data and the prototypical representation of each emotion class and the instance-wise pairwise learning captures the local intrinsic relationship among EEG data. Third, a semi-supervised multi-domain adaptation is introduced to align the data representation among multiple domains (labeled source domain, unlabeled source domain, and target domain), where the distribution mismatch is alleviated. Extensive experiments are conducted on two benchmark databases (SEED and SEED-IV) under a cross-subject leave-one-subject-out cross-validation evaluation protocol. The results show the proposed EEGmatch performs better than the state-of-the-art methods under different incomplete label conditions (with 6.89% improvement on SEED and 1.44% improvement on SEED-IV), which demonstrates the effectiveness of the proposed EEGMatch in dealing with the label scarcity problem in emotion recognition using EEG signals. The source code is available at https://github.com/KAZABANA/EEGMatch. △ Less

Submitted 27 March, 2023; originally announced April 2023.

arXiv:2303.09290 [pdf, other]

VDPVE: VQA Dataset for Perceptual Video Enhancement

Authors: Yixuan Gao, Yuqin Cao, Tengchuan Kou, Wei Sun, Yunlong Dong, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai

Abstract: Recently, many video enhancement methods have been proposed to improve video quality from different aspects such as color, brightness, contrast, and stability. Therefore, how to evaluate the quality of the enhanced video in a way consistent with human visual perception is an important research topic. However, most video quality assessment methods mainly calculate video quality by estimating the di… ▽ More Recently, many video enhancement methods have been proposed to improve video quality from different aspects such as color, brightness, contrast, and stability. Therefore, how to evaluate the quality of the enhanced video in a way consistent with human visual perception is an important research topic. However, most video quality assessment methods mainly calculate video quality by estimating the distortion degrees of videos from an overall perspective. Few researchers have specifically proposed a video quality assessment method for video enhancement, and there is also no comprehensive video quality assessment dataset available in public. Therefore, we construct a Video quality assessment dataset for Perceptual Video Enhancement (VDPVE) in this paper. The VDPVE has 1211 videos with different enhancements, which can be divided into three sub-datasets: the first sub-dataset has 600 videos with color, brightness, and contrast enhancements; the second sub-dataset has 310 videos with deblurring; and the third sub-dataset has 301 deshaked videos. We invited 21 subjects (20 valid subjects) to rate all enhanced videos in the VDPVE. After normalizing and averaging the subjective opinion scores, the mean opinion score of each video can be obtained. Furthermore, we split the VDPVE into a training set, a validation set, and a test set, and verify the performance of several state-of-the-art video quality assessment methods on the test set of the VDPVE. △ Less

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2302.12428 [pdf]

A holistically 3D-printed flexible millimeter-wave Doppler radar: Towards fully printed high-frequency multilayer flexible hybrid electronics systems

Authors: Hong Tang, Yingjie Zhang, Bowen Zheng, Sensong An, Mohammad Haerinia, Yunxi Dong, Yi Huang, Wei Guo, Hualiang Zhang

Abstract: Flexible hybrid electronics (FHE) is an emerging technology enabled through the integration of advanced semiconductor devices and 3D printing technology. It unlocks tremendous market potential by realizing low-cost flexible circuits and systems that can be conformally integrated into various applications. However, the operating frequencies of most reported FHE systems are relatively low. It is als… ▽ More Flexible hybrid electronics (FHE) is an emerging technology enabled through the integration of advanced semiconductor devices and 3D printing technology. It unlocks tremendous market potential by realizing low-cost flexible circuits and systems that can be conformally integrated into various applications. However, the operating frequencies of most reported FHE systems are relatively low. It is also worth to note that reported FHE systems have been limited to relatively simple design concept (since complex systems will impose challenges in aspects such as multilayer interconnections, printing materials, and bonding layers). Here, we report a fully 3D-printed flexible four-layer millimeter-wave Doppler radar (i.e., a millimeter-wave FHE system). The sensing performance and flexibility of the 3D-printed radar are characterized and validated by general field tests and bending tests, respectively. Our results demonstrate the feasibility of develo** fully 3D-printed high-frequency multilayer FHE, which can be conformally integrated into irregular surfaces (e.g., vehicle bumpers) for applications such as vehicle radars and wearable electronics. △ Less

Submitted 23 February, 2023; originally announced February 2023.

MSC Class: 78-05

arXiv:2302.04961 [pdf]

Design of the Reverse Logistics System for Medical Waste Recycling Part I: System Architecture and Disposal Site Selection Algorithm

Authors: Chaozhong Xue, Yongqi Dong, Jiaqi Liu, Yijun Liao, Lingbo Li

Abstract: With social progress and the development of modern medical technology, the amount of medical waste generated is increasing dramatically. The problem of medical waste recycling and treatment has gradually drawn concerns from the whole society. The sudden outbreak of the COVID-19 epidemic further brought new challenges. To tackle the challenges, this study proposes a reverse logistics system archite… ▽ More With social progress and the development of modern medical technology, the amount of medical waste generated is increasing dramatically. The problem of medical waste recycling and treatment has gradually drawn concerns from the whole society. The sudden outbreak of the COVID-19 epidemic further brought new challenges. To tackle the challenges, this study proposes a reverse logistics system architecture with three modules, i.e., medical waste classification & monitoring module, temporary storage & disposal site (disposal site for short) selection module, as well as route optimization module. This overall solution design won the Grand Prize of the "YUNFENG CUP" China National Contest on Green Supply and Reverse Logistics Design ranking 1st. This paper focuses on the description of architectural design and the module on site selection. Specifically, regarding system architecture, a framework diagram is provided, together with brief descriptions of the three proposed modules and a case study under the COVID-19 epidemic with the customized model. Regarding the disposal site selection module, a multi-objective optimization model is developed, and considering different types of waste collection sites (i.e., prioritized large collection sites and common collection sites), a hierarchical solution method is developed employing linear programming and K-means clustering algorithms sequentially. The proposed site selection method is verified with a case study using real-world data, and compared with the baseline, it can immensely reduce the daily operational costs and working time. Limited by length, detailed descriptions of the whole system as well as the remaining medical waste classification & monitoring module and route optimization module can be found at https://shorturl.at/cdY59. △ Less

Submitted 27 May, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: 6 pages, 6 figures, submitted to and under review by the IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

arXiv:2302.01728 [pdf, other]

Decentralised and Cooperative Control of Multi-Robot Systems through Distributed Optimisation

Authors: Yi Dong, Zhongguo Li, Xingyu Zhao, Zhengtao Ding, Xiaowei Huang

Abstract: Multi-robot cooperative control has gained extensive research interest due to its wide applications in civil, security, and military domains. This paper proposes a cooperative control algorithm for multi-robot systems with general linear dynamics. The algorithm is based on distributed cooperative optimisation and output regulation, and it achieves global optimum by utilising only information share… ▽ More Multi-robot cooperative control has gained extensive research interest due to its wide applications in civil, security, and military domains. This paper proposes a cooperative control algorithm for multi-robot systems with general linear dynamics. The algorithm is based on distributed cooperative optimisation and output regulation, and it achieves global optimum by utilising only information shared among neighbouring robots. Technically, a high-level distributed optimisation algorithm for multi-robot systems is presented, which will serve as an optimal reference generator for each individual agent. Then, based on the distributed optimisation algorithm, an output regulation method is utilised to solve the optimal coordination problem for general linear dynamic systems. The convergence of the proposed algorithm is theoretically proved. Both numerical simulations and real-time physical robot experiments are conducted to validate the effectiveness of the proposed cooperative control algorithms. △ Less

Submitted 3 February, 2023; originally announced February 2023.

Comments: Accepted by AAMAS'23

arXiv:2302.00810 [pdf, other]

Beyond KNN: Deep Neighborhood Learning for WiFi-based Indoor Positioning Systems

Authors: Yinhuan Dong, Francisco Zampella, Firas Alsehly

Abstract: K-Neares Neighbors (KNN) and its variant weighted KNN (WKNN) have been explored for years in both academy and industry to provide stable and reliable performance in WiFi-based indoor positioning systems. Such algorithms estimate the location of a given point based on the locality information from the selected nearest WiFi neighbors according to some distance metrics calculated from the combination… ▽ More K-Neares Neighbors (KNN) and its variant weighted KNN (WKNN) have been explored for years in both academy and industry to provide stable and reliable performance in WiFi-based indoor positioning systems. Such algorithms estimate the location of a given point based on the locality information from the selected nearest WiFi neighbors according to some distance metrics calculated from the combination of WiFi received signal strength (RSS). However, such a process does not consider the relational information among the given point, WiFi neighbors, and the WiFi access points (WAPs). Therefore, this study proposes a novel Deep Neighborhood Learning (DNL). The proposed DNL approach converts the WiFi neighborhood to heterogeneous graphs, and utilizes deep graph learning to extract better representation of the WiFi neighborhood to improve the positioning accuracy. Experiments on 3 real industrial datasets collected from 3 mega shop** malls on 26 floors have shown that the proposed approach can reduce the mean absolute positioning error by 10% to 50% in most of the cases. Specially, the proposed approach sharply reduces the root mean squared positioning error and 95\% percentile positioning error, being more robust to the outliers than conventional KNN and WKNN. △ Less

Submitted 1 February, 2023; originally announced February 2023.

arXiv:2212.03378 [pdf, other]

doi 10.1145/3560905.3568416

PigV$^2$: Monitoring Pig Vital Signs through Ground Vibrations Induced by Heartbeat and Respiration

Authors: Yiwen Dong, Jesse R Codling, Gary Rohrer, Jeremy Miles, Sudhendu Sharma, Tami Brown-Brandl, Pei Zhang, Hae Young Noh

Abstract: Pig vital sign monitoring (e.g., estimating the heart rate (HR) and respiratory rate (RR)) is essential to understand the stress level of the sow and detect the onset of parturition. It helps to maximize peri-natal survival and improve animal well-being in swine production. The existing approach mainly relies on manual measurement, which is labor-intensive and only provides a few points of informa… ▽ More Pig vital sign monitoring (e.g., estimating the heart rate (HR) and respiratory rate (RR)) is essential to understand the stress level of the sow and detect the onset of parturition. It helps to maximize peri-natal survival and improve animal well-being in swine production. The existing approach mainly relies on manual measurement, which is labor-intensive and only provides a few points of information. Other sensing modalities such as wearables and cameras are developed to enable more continuous measurement, but are still limited due to animal discomfort, data transfer, and storage challenges. In this paper, we introduce PigV$^2$, the first system to monitor pig heart rate and respiratory rate through ground vibrations. Our approach leverages the insight that both heartbeat and respiration generate ground vibrations when the sow is lying on the floor. We infer vital information by sensing and analyzing these vibrations. The main challenge in develo** PigV$^2$ is the overlap of vital- and non-vital-related information in the vibration signals, including pig movements, pig postures, pig-to-sensor distances, and so on. To address this issue, we first characterize their effects, extract their current status, and then reduce their impact by adaptively interpolating vital rates over multiple sensors. PigV$^2$ is evaluated through a real-world deployment with 30 pigs. It has 3.4% and 8.3% average errors in monitoring the HR and RR of the sows, respectively. △ Less

Submitted 6 December, 2022; originally announced December 2022.

Comments: 7 pages, 9 figures

arXiv:2212.03377 [pdf, other]

doi 10.1145/3560905.3568435

GaitVibe+: Enhancing Structural Vibration-based Footstep Localization Using Temporary Cameras for In-home Gait Analysis

Authors: Yiwen Dong, **gxiao Liu, Hae Young Noh

Abstract: In-home gait analysis is important for providing early diagnosis and adaptive treatments for individuals with gait disorders. Existing systems include wearables and pressure mats, but they have limited scalability. Recent studies have developed vision-based systems to enable scalable, accurate in-home gait analysis, but it faces privacy concerns due to the exposure of people's appearances. Our pri… ▽ More In-home gait analysis is important for providing early diagnosis and adaptive treatments for individuals with gait disorders. Existing systems include wearables and pressure mats, but they have limited scalability. Recent studies have developed vision-based systems to enable scalable, accurate in-home gait analysis, but it faces privacy concerns due to the exposure of people's appearances. Our prior work developed footstep-induced structural vibration sensing for gait monitoring, which is device-free, wide-ranged, and perceived as more privacy-friendly. Although it has succeeded in temporal gait event extraction, it shows limited performance for spatial gait parameter estimation due to imprecise footstep localization. In particular, the localization error mainly comes from the estimation error of the wave arrival time at the vibration sensors and its error propagation to wave velocity estimations. Therefore, we present GaitVibe+, a vibration-based footstep localization method fused with temporarily installed cameras for in-home gait analysis. Our method has two stages: fusion and operating. In the fusion stage, both cameras and vibration sensors are installed to record only a few trials of the subject's footstep data, through which we characterize the uncertainty in wave arrival time and model the wave velocity profiles for the given structure. In the operating stage, we remove the camera to preserve privacy at home. The footstep localization is conducted by estimating the time difference of arrival (TDoA) over multiple vibration sensors, whose accuracy is improved through the reduced uncertainty and velocity modeling during the fusion stage. We evaluate GaitVibe+ through a real-world experiment with 50 walking trials. With only 3 trials of multi-modal fusion, our approach has an average localization error of 0.22 meters, which reduces the spatial gait parameter error from 111% to 27%. △ Less

Submitted 6 December, 2022; originally announced December 2022.

Comments: 7 pages, 7 figures

ACM Class: J.3

arXiv:2211.06891 [pdf, other]

Residual Degradation Learning Unfolding Framework with Mixing Priors across Spectral and Spatial for Compressive Spectral Imaging

Authors: Yubo Dong, Dahua Gao, Tian Qiu, Yuyan Li, Minxi Yang, Guangming Shi

Abstract: To acquire a snapshot spectral image, coded aperture snapshot spectral imaging (CASSI) is proposed. A core problem of the CASSI system is to recover the reliable and fine underlying 3D spectral cube from the 2D measurement. By alternately solving a data subproblem and a prior subproblem, deep unfolding methods achieve good performance. However, in the data subproblem, the used sensing matrix is il… ▽ More To acquire a snapshot spectral image, coded aperture snapshot spectral imaging (CASSI) is proposed. A core problem of the CASSI system is to recover the reliable and fine underlying 3D spectral cube from the 2D measurement. By alternately solving a data subproblem and a prior subproblem, deep unfolding methods achieve good performance. However, in the data subproblem, the used sensing matrix is ill-suited for the real degradation process due to the device errors caused by phase aberration, distortion; in the prior subproblem, it is important to design a suitable model to jointly exploit both spatial and spectral priors. In this paper, we propose a Residual Degradation Learning Unfolding Framework (RDLUF), which bridges the gap between the sensing matrix and the degradation process. Moreover, a Mix$S^2$ Transformer is designed via mixing priors across spectral and spatial to strengthen the spectral-spatial representation capability. Finally, plugging the Mix$S^2$ Transformer into the RDLUF leads to an end-to-end trainable neural network RDLUF-Mix$S^2$. Experimental results establish the superior performance of the proposed method over existing ones. △ Less

Submitted 15 November, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: CVPR 2023

arXiv:2211.05535 [pdf, ps, other]

doi 10.1109/LCOMM.2023.3274295

Joint Receiver Design for Integrated Sensing and Communications

Authors: Yuxiang Dong, Fan Liu, Yifeng Xiong

Abstract: In this letter, we investigate the joint receiver design for integrated sensing and communication (ISAC) systems, where the communication signal and the target echo signal are simultaneously received and processed to achieve a balanced performance between both functionalities. In particular, we proposed two design schemes to solve the joint sensing and communication problem of receive signal proce… ▽ More In this letter, we investigate the joint receiver design for integrated sensing and communication (ISAC) systems, where the communication signal and the target echo signal are simultaneously received and processed to achieve a balanced performance between both functionalities. In particular, we proposed two design schemes to solve the joint sensing and communication problem of receive signal processing. {\color{black} The first is based on maximum likelihood (ML) detection, minimum mean squared error (MMSE) estimation and interference cancellation (IC), and the other formulates a tailored MMSE estimator for target estimation that is independent to the detector, which form a Non-IC design. We show that with the structural information of the communication signal taken into account, the Non-IC approach outperforms the IC method and achieves optimal performance.} Numerical results are provided to validate the effectiveness of the proposed optimal designs. △ Less

Submitted 5 May, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Showing 1–50 of 99 results for author: Dong, Y