Search | arXiv e-print repository

BrainFounder: Towards Brain Foundation Models for Neuroimage Analysis

Authors: Joseph Cox, Peng Liu, Skylar E. Stolte, Yunchao Yang, Kang Liu, Kyle B. See, Huiwen Ju, Ruogu Fang

Abstract: The burgeoning field of brain health research increasingly leverages artificial intelligence (AI) to interpret and analyze neurological data. This study introduces a novel approach towards the creation of medical foundation models by integrating a large-scale multi-modal magnetic resonance imaging (MRI) dataset derived from 41,400 participants in its own. Our method involves a novel two-stage pret… ▽ More The burgeoning field of brain health research increasingly leverages artificial intelligence (AI) to interpret and analyze neurological data. This study introduces a novel approach towards the creation of medical foundation models by integrating a large-scale multi-modal magnetic resonance imaging (MRI) dataset derived from 41,400 participants in its own. Our method involves a novel two-stage pretraining approach using vision transformers. The first stage is dedicated to encoding anatomical structures in generally healthy brains, identifying key features such as shapes and sizes of different brain regions. The second stage concentrates on spatial information, encompassing aspects like location and the relative positioning of brain structures. We rigorously evaluate our model, BrainFounder, using the Brain Tumor Segmentation (BraTS) challenge and Anatomical Tracings of Lesions After Stroke v2.0 (ATLAS v2.0) datasets. BrainFounder demonstrates a significant performance gain, surpassing the achievements of the previous winning solutions using fully supervised learning. Our findings underscore the impact of scaling up both the complexity of the model and the volume of unlabeled training data derived from generally healthy brains, which enhances the accuracy and predictive capabilities of the model in complex neuroimaging tasks with MRI. The implications of this research provide transformative insights and practical applications in healthcare and make substantial steps towards the creation of foundation models for Medical AI. Our pretrained models and training code can be found at https://github.com/lab-smile/GatorBrain. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 17 pages, 5 figures, to be published in Medical Image Analysis

arXiv:2406.09950 [pdf, other]

An efficient text augmentation approach for contextualized Mandarin speech recognition

Authors: Naijun Zheng, Xucheng Wan, Kai Liu, Ziqing Du, Zhou Huan

Abstract: Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherent limitations of speech-text data availability. To address this challenge, our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models using a straightforward text-augmentation (TA)… ▽ More Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherent limitations of speech-text data availability. To address this challenge, our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models using a straightforward text-augmentation (TA) technique, all while kee** computational costs minimal. In particular, to contextualize a pre-trained CIF-based ASR, we construct a codebook using limited speech-text data. By utilizing a simple codebook lookup process, we convert available text-only data into latent text embeddings. These embeddings then enhance the inputs for the contextualized ASR. Our experiments on diverse Mandarin test sets demonstrate that our TA approach significantly boosts recognition performance. The top-performing system shows relative CER improvements of up to 30% on rare words and 15% across all words in general. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: accepted to interspeech2024

arXiv:2406.06649 [pdf, other]

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

Authors: Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

Abstract: Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrades the accuracy of SR models compared to their ful… ▽ More Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrades the accuracy of SR models compared to their full-precision (FP) counterparts. Despite several efforts to alleviate the degradation, the transformer-based SR model still suffers severe degradation due to its distinctive activation distribution. In this work, we present a dual-stage low-bit post-training quantization (PTQ) method for image super-resolution, namely 2DQuant, which achieves efficient and accurate SR under low-bit quantization. The proposed method first investigates the weight and activation and finds that the distribution is characterized by coexisting symmetry and asymmetry, long tails. Specifically, we propose Distribution-Oriented Bound Initialization (DOBI), using different searching strategies to search a coarse bound for quantizers. To obtain refined quantizer parameters, we further propose Distillation Quantization Calibration (DQC), which employs a distillation approach to make the quantized model learn from its FP counterpart. Through extensive experiments on different bits and scaling factors, the performance of DOBI can reach the state-of-the-art (SOTA) while after stage two, our method surpasses existing PTQ in both metrics and visual effects. 2DQuant gains an increase in PSNR as high as 4.52dB on Set5 (x2) compared with SOTA when quantized to 2-bit and enjoys a 3.60x compression ratio and 5.08x speedup ratio. The code and models will be available at https://github.com/Kai-Liu001/2DQuant. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 9 pages, 6 figures. The code and models will be available at https://github.com/Kai-Liu001/2DQuant

arXiv:2406.06337 [pdf, other]

System- and Sample-agnostic Isotropic 3D Microscopy by Weakly Physics-informed, Domain-shift-resistant Axial Deblurring

Authors: Jiashu Han, Kunzan Liu, Keith B. Isaacson, Kristina Monakhova, Linda G. Griffith, Sixian You

Abstract: Three-dimensional (3D) subcellular imaging is essential for biomedical research, but the diffraction limit of optical microscopy compromises axial resolution, hindering accurate 3D structural analysis. This challenge is particularly pronounced in label-free imaging of thick, heterogeneous tissues, where assumptions about data distribution (e.g. sparsity, label-specific distribution, and lateral-ax… ▽ More Three-dimensional (3D) subcellular imaging is essential for biomedical research, but the diffraction limit of optical microscopy compromises axial resolution, hindering accurate 3D structural analysis. This challenge is particularly pronounced in label-free imaging of thick, heterogeneous tissues, where assumptions about data distribution (e.g. sparsity, label-specific distribution, and lateral-axial similarity) and system priors (e.g. independent and identically distributed (i.i.d.) noise and linear shift-invariant (LSI) point-spread functions (PSFs)) are often invalid. Here, we introduce SSAI-3D, a weakly physics-informed, domain-shift-resistant framework for robust isotropic 3D imaging. SSAI-3D enables robust axial deblurring by generating a PSF-flexible, noise-resilient, sample-informed training dataset and sparsely fine-tuning a large pre-trained blind deblurring network. SSAI-3D was applied to label-free nonlinear imaging of living organoids, freshly excised human endometrium tissue, and mouse whisker pads, and further validated in publicly available ground-truth-paired experimental datasets of 3D heterogeneous biological tissues with unknown blurring and noise across different microscopy systems. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 27 pages, 6 figures

arXiv:2405.16905 [pdf, other]

Privacy and Security Trade-off in Interconnected Systems with Known or Unknown Privacy Noise Covariance

Authors: Haojun Wang, Kun Liu, Baojia Li, Emilia Fridman, Yuanqing Xia

Abstract: This paper is concerned with the security problem for interconnected systems, where each subsystem is required to detect local attacks using locally available information and the information received from its neighboring subsystems. Moreover, we consider that there exists an additional eavesdropper being able to infer the private information by eavesdrop** transmitted data between subsystems. Th… ▽ More This paper is concerned with the security problem for interconnected systems, where each subsystem is required to detect local attacks using locally available information and the information received from its neighboring subsystems. Moreover, we consider that there exists an additional eavesdropper being able to infer the private information by eavesdrop** transmitted data between subsystems. Then, a privacy-preserving method is employed by adding privacy noise to transmitted data, and the privacy level is measured by mutual information. Nevertheless, adding privacy noise to transmitted data may affect the detection performance metrics such as detection probability and false alarm probability. Thus, we theoretically analyze the trade-off between the privacy and the detection performance. An optimization problem with maximizing both the degree of privacy preservation and the detection probability is established to obtain the covariance of the privacy noise. In addition, the attack detector of each subsystem may not obtain all information about the privacy noise. We further theoretically analyze the trade-off between the privacy and the false alarm probability when the attack detector has no knowledge of the privacy noise covariance. An optimization problem with maximizing the degree of privacy preservation with guaranteeing a bound of false alarm distortion level is established to obtain {\color{black}{the covariance of the privacy noise}}. Moreover, to analyze the effect of the privacy noise on the detection probability, we consider that each subsystem can estimate the unknown privacy noise covariance by the secondary data. Based on the estimated covariance, we construct another attack detector and analyze how the privacy noise affects its detection performance. Finally, a numerical example is provided to verify the effectiveness of theoretical results. △ Less

Submitted 1 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14905 [pdf, other]

Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation

Authors: Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, Qiguang Miao

Abstract: The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable reports generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, \textbf{S}tructural \textbf{E}ntities extraction a… ▽ More The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable reports generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, \textbf{S}tructural \textbf{E}ntities extraction and patient indications \textbf{I}ncorporation (SEI) for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces the noise in the following cross-modal alignment module by aligning X-ray images with factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients. This, in turn, assists in triggering the text decoder to produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: The code is available at https://github.com/mk-runner/SEI-Temp or https://github.com/mk-runner/SEI

arXiv:2405.11211 [pdf]

Excess Delay from GDP: Measurement and Causal Analysis

Authors: Ke Liu, Mark Hansen

Abstract: Ground Delay Programs (GDPs) have been widely used to resolve excessive demand-capacity imbalances at arrival airports by shifting foreseen airborne delay to pre-departure ground delay. While offering clear safety and efficiency benefits, GDPs may also create additional delay because of imperfect execution and uncertainty in predicting arrival airport capacity. This paper presents a methodology fo… ▽ More Ground Delay Programs (GDPs) have been widely used to resolve excessive demand-capacity imbalances at arrival airports by shifting foreseen airborne delay to pre-departure ground delay. While offering clear safety and efficiency benefits, GDPs may also create additional delay because of imperfect execution and uncertainty in predicting arrival airport capacity. This paper presents a methodology for measuring excess delay resulting from individual GDPs and investigates factors that influence excess delay using regularized regression models. We measured excess delay for 1210 GDPs from 33 U.S. airports in 2019. On a per-restricted flight basis, the mean excess delay is 35.4 min with std of 20.6 min. In our regression analysis of the variation in excess delay, ridge regression is found to perform best. The factors affecting excess delay include time variations during gate out and taxi out for flights subject to the GDP, program rate setting and revisions, and GDP time duration. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: International Conference on Research in Air Transportation (ICRAT 2022) link: https://www.icrat.org/previous-conferences/10th-international-conference/papers/

Journal ref: International Conference on Research in Air Transportation (ICRAT 2022)

arXiv:2405.09586 [pdf, other]

Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation

Authors: Kang Liu, Zhuoqi Ma, Mengmeng Liu, Zhicheng Jiao, Xiaolu Kang, Qiguang Miao, Kun Xie

Abstract: The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal a… ▽ More The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal alignment. Additionally, existing methods for similar historical cases retrieval face suboptimal performance owing to the modal gap issue. In response, this paper introduces a novel method, named Factual Serialization Enhancement (FSE), for chest X-ray report generation. FSE begins with the structural entities approach to eliminate presentation-style vocabulary in reports, providing specific input for our model. Then, uni-modal features are learned through cross-modal alignment between images and factual serialization in reports. Subsequently, we present a novel approach to retrieve similar historical cases from the training set, leveraging aligned image features. These features implicitly preserve semantic similarity with their corresponding reference reports, enabling us to calculate similarity solely among aligned features. This effectively eliminates the modal gap issue for knowledge retrieval without the requirement for disease labels. Finally, the cross-modal fusion network is employed to query valuable information from these cases, enriching image features and aiding the text decoder in generating high-quality reports. Experiments on MIMIC-CXR and IU X-ray datasets from both specific and general scenarios demonstrate the superiority of FSE over state-of-the-art approaches in both natural language generation and clinical efficacy metrics. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.09207 [pdf, other]

An Exact Theory of Causal Emergence for Linear Stochastic Iteration Systems

Authors: Kaiwei Liu, Bing Yuan, Jiang Zhang

Abstract: After coarse-graining a complex system, the dynamics of its macro-state may exhibit more pronounced causal effects than those of its micro-state. This phenomenon, known as causal emergence, is quantified by the indicator of effective information. However, two challenges confront this theory: the absence of well-developed frameworks in continuous stochastic dynamical systems and the reliance on coa… ▽ More After coarse-graining a complex system, the dynamics of its macro-state may exhibit more pronounced causal effects than those of its micro-state. This phenomenon, known as causal emergence, is quantified by the indicator of effective information. However, two challenges confront this theory: the absence of well-developed frameworks in continuous stochastic dynamical systems and the reliance on coarse-graining methodologies. In this study, we introduce an exact theoretic framework for causal emergence within linear stochastic iteration systems featuring continuous state spaces and Gaussian noise. Building upon this foundation, we derive an analytical expression for effective information across general dynamics and identify optimal linear coarse-graining strategies that maximize the degree of causal emergence when the dimension averaged uncertainty eliminated by coarse-graining has an upper bound. Our investigation reveals that the maximal causal emergence and the optimal coarse-graining methods are primarily determined by the principal eigenvalues and eigenvectors of the dynamic system's parameter matrix, with the latter not being unique. To validate our propositions, we apply our analytical models to three simplified physical systems, comparing the outcomes with numerical simulations, and consistently achieve congruent results. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.08342 [pdf]

doi 10.1109/EMBC40787.2023.10341036

Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer

Authors: Whenty Ariyanti, Kai-Chun Liu, Kuan-Yu Chen, Yu Tsao

Abstract: Respiratory disease, the third leading cause of deaths globally, is considered a high-priority ailment requiring significant research on identification and treatment. Stethoscope-recorded lung sounds and artificial intelligence-powered devices have been used to identify lung disorders and aid specialists in making accurate diagnoses. In this study, audio-spectrogram vision transformer (AS-ViT), a… ▽ More Respiratory disease, the third leading cause of deaths globally, is considered a high-priority ailment requiring significant research on identification and treatment. Stethoscope-recorded lung sounds and artificial intelligence-powered devices have been used to identify lung disorders and aid specialists in making accurate diagnoses. In this study, audio-spectrogram vision transformer (AS-ViT), a new approach for identifying abnormal respiration sounds, was developed. The sounds of the lungs are converted into visual representations called spectrograms using a technique called short-time Fourier transform (STFT). These images are then analyzed using a model called vision transformer to identify different types of respiratory sounds. The classification was carried out using the ICBHI 2017 database, which includes various types of lung sounds with different frequencies, noise levels, and backgrounds. The proposed AS-ViT method was evaluated using three metrics and achieved 79.1% and 59.8% for 60:40 split ratio and 86.4% and 69.3% for 80:20 split ratio in terms of unweighted average recall and overall scores respectively for respiratory sound detection, surpassing previous state-of-the-art results. △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: Published in 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

Journal ref: 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (2023) 1-4

arXiv:2405.08306 [pdf, other]

Flight Path Optimization with Optimal Control Method

Authors: Gaofeng Su, Xi Cheng, Siyuan Feng, Ke Liu, Jilin Song, Jianan Chen, Chen Zhu, Hui Lin

Abstract: This paper is based on a crucial issue in the aviation world: how to optimize the trajectory and controls given to the aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response, using Constrained Finite Time Optimal Control(CFTOC). The first step is to d… ▽ More This paper is based on a crucial issue in the aviation world: how to optimize the trajectory and controls given to the aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response, using Constrained Finite Time Optimal Control(CFTOC). The first step is to define the dynamic model of the aircraft in accordance with the controllable inputs and wind disturbances. Then we will identify a precise objective in terms of optimization and implement an optimization program to solve it under the circumstances of simulated real flight situation. Finally, the optimization result is validated and discussed by different scenarios. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.01200 [pdf, other]

Learning-to-solve unit commitment based on few-shot physics-guided spatial-temporal graph convolution network

Authors: Mei Yang, Gao Qiu andJunyong Liu, Kai Liu

Abstract: This letter proposes a few-shot physics-guided spatial temporal graph convolutional network (FPG-STGCN) to fast solve unit commitment (UC). Firstly, STGCN is tailored to parameterize UC. Then, few-shot physics-guided learning scheme is proposed. It exploits few typical UC solutions yielded via commercial optimizer to escape from local minimum, and leverages the augmented Lagrangian method for cons… ▽ More This letter proposes a few-shot physics-guided spatial temporal graph convolutional network (FPG-STGCN) to fast solve unit commitment (UC). Firstly, STGCN is tailored to parameterize UC. Then, few-shot physics-guided learning scheme is proposed. It exploits few typical UC solutions yielded via commercial optimizer to escape from local minimum, and leverages the augmented Lagrangian method for constraint satisfaction. To further enable both feasibility and continuous relaxation for integers in learning process, straight-through estimator for Tanh-Sign composition is proposed to fully differentiate the mixed integer solution space. Case study on the IEEE benchmark justifies that, our method bests mainstream learning ways on UC feasibility, and surpasses traditional solver on efficiency. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.19182 [pdf, other]

Robust Proximity Detection using On-Device Gait Monitoring

Authors: Yuqian Hu, Guozhen Zhu, Beibei Wang, K. J. Ray Liu

Abstract: Proximity detection in indoor environments based on WiFi signals has gained significant attention in recent years. Existing works rely on the dynamic signal reflections and their extracted features are dependent on motion strength. To address this issue, we design a robust WiFi-based proximity detector by considering gait monitoring. Specifically, we propose a gait score that accurately evaluates… ▽ More Proximity detection in indoor environments based on WiFi signals has gained significant attention in recent years. Existing works rely on the dynamic signal reflections and their extracted features are dependent on motion strength. To address this issue, we design a robust WiFi-based proximity detector by considering gait monitoring. Specifically, we propose a gait score that accurately evaluates gait presence by leveraging the speed estimated from the autocorrelation function (ACF) of channel state information (CSI). By combining this gait score with a proximity feature, our approach effectively distinguishes different transition patterns, enabling more reliable proximity detection. In addition, to enhance the stability of the detection process, we employ a state machine and extract temporal information, ensuring continuous proximity detection even during subtle movements. Extensive experiments conducted in different environments demonstrate an overall detection rate of 92.5% and a low false alarm rate of 1.12% with a delay of 0.825s. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: This work has been accepted in IEEE 9th World Forum on Internet of Things (WFIoT)

arXiv:2404.15290 [pdf]

A point cloud processing method of mmWave radar over automotive scenario

Authors: Qingmian Wan, Hongli Peng, Xing Liao, Kuayue Liu

Abstract: This paper introduces in detail the effective method of comprehensive target judgment by using radar RA map and point cloud map. Different output of radar can effectively judge the road boundary of target and the relative coordinates of target, avoid the error of output caused by excessive processing information, and greatly improve the processing efficiency of DBSCAN of the measured target. This paper introduces in detail the effective method of comprehensive target judgment by using radar RA map and point cloud map. Different output of radar can effectively judge the road boundary of target and the relative coordinates of target, avoid the error of output caused by excessive processing information, and greatly improve the processing efficiency of DBSCAN of the measured target. △ Less

Submitted 23 March, 2024; originally announced April 2024.

arXiv:2404.13550 [pdf, other]

Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Authors: Kang You, Kai Liu, Li Yu, Pan Gao, Dandan Ding

Abstract: Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world application. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high-performance an… ▽ More Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world application. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high-performance and extremely low-decoding-latency simultaneously. Inspired by conventional Trisoup codec, a point model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that reconstructs point coordinates with fairly fast speed. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90$\sim$160$\times$ faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., one RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9MB), which is attractive for industrial practitioners. △ Less

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.06693 [pdf, other]

Binomial Self-compensation for Motion Error in Dynamic 3D Scanning

Authors: Geyou Zhang, Ce Zhu, Kai Liu

Abstract: Phase shifting profilometry (PSP) is favored in high-precision 3D scanning due to its high accuracy, robustness, and pixel-wise property. However, a fundamental assumption of PSP that the object should remain static is violated in dynamic measurement, making PSP susceptible to object moving, resulting in ripple-like errors in the point clouds. We propose a pixel-wise and frame-wise loopable binomi… ▽ More Phase shifting profilometry (PSP) is favored in high-precision 3D scanning due to its high accuracy, robustness, and pixel-wise property. However, a fundamental assumption of PSP that the object should remain static is violated in dynamic measurement, making PSP susceptible to object moving, resulting in ripple-like errors in the point clouds. We propose a pixel-wise and frame-wise loopable binomial self-compensation (BSC) algorithm to effectively and flexibly eliminate motion error in the four-step PSP. Our mathematical model demonstrates that by summing successive motion-affected phase frames weighted by binomial coefficients, motion error exponentially diminishes as the binomial order increases, accomplishing automatic error compensation through the motion-affected phase sequence, without the assistance of any intermediate variable. Extensive experiments show that our BSC outperforms the existing methods in reducing motion error, while achieving a depth map frame rate equal to the camera's acquisition rate (90 fps), enabling high-accuracy 3D reconstruction with a quasi-single-shot frame rate. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2402.05482 [pdf, other]

A Non-Intrusive Neural Quality Assessment Model for Surface Electromyography Signals

Authors: Cho-Yuan Lee, Kuan-Chen Wang, Kai-Chun Liu, Yu-Te Wang, Xugang Lu, **-Cheng Yeh, Yu Tsao

Abstract: In practical scenarios involving the measurement of surface electromyography (sEMG) in muscles, particularly those areas near the heart, one of the primary sources of contamination is the presence of electrocardiogram (ECG) signals. To assess the quality of real-world sEMG data more effectively, this study proposes QASE-net, a new non-intrusive model that predicts the SNR of sEMG signals. QASE-net… ▽ More In practical scenarios involving the measurement of surface electromyography (sEMG) in muscles, particularly those areas near the heart, one of the primary sources of contamination is the presence of electrocardiogram (ECG) signals. To assess the quality of real-world sEMG data more effectively, this study proposes QASE-net, a new non-intrusive model that predicts the SNR of sEMG signals. QASE-net combines CNN-BLSTM with attention mechanisms and follows an end-to-end training strategy. Our experimental framework utilizes real-world sEMG and ECG data from two open-access databases, the Non-Invasive Adaptive Prosthetics Database and the MIT-BIH Normal Sinus Rhythm Database, respectively. The experimental results demonstrate the superiority of QASE-net over the previous assessment model, exhibiting significantly reduced prediction errors and notably higher linear correlations with the ground truth. These findings show the potential of QASE-net to substantially enhance the reliability and precision of sEMG quality assessment in practical applications. △ Less

Submitted 13 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 figures

arXiv:2402.03808 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446154

SDEMG: Score-based Diffusion Model for Surface Electromyographic Signal Denoising

Authors: Yu-Tung Liu, Kuan-Chen Wang, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao

Abstract: Surface electromyography (sEMG) recordings can be influenced by electrocardiogram (ECG) signals when the muscle being monitored is close to the heart. Several existing methods use signal-processing-based approaches, such as high-pass filter and template subtraction, while some derive map** functions to restore clean sEMG signals from noisy sEMG (sEMG with ECG interference). Recently, the score-b… ▽ More Surface electromyography (sEMG) recordings can be influenced by electrocardiogram (ECG) signals when the muscle being monitored is close to the heart. Several existing methods use signal-processing-based approaches, such as high-pass filter and template subtraction, while some derive map** functions to restore clean sEMG signals from noisy sEMG (sEMG with ECG interference). Recently, the score-based diffusion model, a renowned generative model, has been introduced to generate high-quality and accurate samples with noisy input data. In this study, we proposed a novel approach, termed SDEMG, as a score-based diffusion model for sEMG signal denoising. To evaluate the proposed SDEMG approach, we conduct experiments to reduce noise in sEMG signals, employing data from an openly accessible source, the Non-Invasive Adaptive Prosthetics database, along with ECG signals from the MIT-BIH Normal Sinus Rhythm Database. The experiment result indicates that SDEMG outperformed comparative methods and produced high-quality sEMG samples. The source code of SDEMG the framework is available at: https://github.com/tonyliu0910/SDEMG △ Less

Submitted 23 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: This paper is accepted by ICASSP 2024

arXiv:2402.00996 [pdf, other]

mmID: High-Resolution mmWave Imaging for Human Identification

Authors: Sakila S. Jayaweera, Sai Deepika Regani, Yuqian Hu, Beibei Wang, K. J. Ray Liu

Abstract: Achieving accurate human identification through RF imaging has been a persistent challenge, primarily attributed to the limited aperture size and its consequent impact on imaging resolution. The existing imaging solution enables tasks such as pose estimation, activity recognition, and human tracking based on deep neural networks by estimating skeleton joints. In contrast to estimating joints, this… ▽ More Achieving accurate human identification through RF imaging has been a persistent challenge, primarily attributed to the limited aperture size and its consequent impact on imaging resolution. The existing imaging solution enables tasks such as pose estimation, activity recognition, and human tracking based on deep neural networks by estimating skeleton joints. In contrast to estimating joints, this paper proposes to improve imaging resolution by estimating the human figure as a whole using conditional generative adversarial networks (cGAN). In order to reduce training complexity, we use an estimated spatial spectrum using the MUltiple SIgnal Classification (MUSIC) algorithm as input to the cGAN. Our system generates environmentally independent, high-resolution images that can extract unique physical features useful for human identification. We use a simple convolution layers-based classification network to obtain the final identification result. From the experimental results, we show that resolution of the image produced by our trained generator is high enough to enable human identification. Our finding indicates high-resolution accuracy with 5% mean silhouette difference to the Kinect device. Extensive experiments in different environments on multiple testers demonstrate that our system can achieve 93% overall test accuracy in unseen environments for static human target identification. △ Less

Submitted 1 February, 2024; originally announced February 2024.

Comments: This paper was published in the IEEE 9th World Forum on Internet of Things

arXiv:2401.16714 [pdf]

A Point Cloud Enhancement Method for 4D mmWave Radar Imagery

Authors: Qingmian Wan, Hongli Peng, Xing Liao, Kuayue Liu, Junfa Mao

Abstract: A point cloud enhancement method for 4D mmWave radar imagery is proposed in this paper. Based on the patch antenna and MIMO array theories, the MIMO array with small redundancy and high SNR is designed to provide the probability of high angular resolution and detection rate. The antenna array is deployed using a ladder shape in vertical direction to decrease the redundancy and improve the resoluti… ▽ More A point cloud enhancement method for 4D mmWave radar imagery is proposed in this paper. Based on the patch antenna and MIMO array theories, the MIMO array with small redundancy and high SNR is designed to provide the probability of high angular resolution and detection rate. The antenna array is deployed using a ladder shape in vertical direction to decrease the redundancy and improve the resolution in horizontal direction with the constrains of physical factors. Considering the complicated environment of the real world with non-uniform distributed clutters, the dynamic detection method is used to solve the weak target sensing problem. The window size of CFAR detector is assumed variant to be determined using optimization method, making it adaptive to different environments especially when weak targets exist. The angular resolution increase using FT-based DOA method and the designed antenna array is described, which provides the basis of accurate detection and dense point cloud. To verify the performance of the proposed method, experiments of simulations and practical measurements are carried out, whose results show that the accuracy and the point cloud density are improved with comparison of the original manufacturer mmWave radar of TI AWR2243. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.12653 [pdf, other]

Diagnosis Of Takotsubo Syndrome By Robust Feature Selection From The Complex Latent Space Of DL-based Segmentation Network

Authors: Fahim Ahmed Zaman, Wahidul Alam, Tarun Kanti Roy, Amanda Chang, Kan Liu, Xiaodong Wu

Abstract: Researchers have shown significant correlations among segmented objects in various medical imaging modalities and disease related pathologies. Several studies showed that using hand crafted features for disease prediction neglects the immense possibility to use latent features from deep learning (DL) models which may reduce the overall accuracy of differential diagnosis. However, directly using cl… ▽ More Researchers have shown significant correlations among segmented objects in various medical imaging modalities and disease related pathologies. Several studies showed that using hand crafted features for disease prediction neglects the immense possibility to use latent features from deep learning (DL) models which may reduce the overall accuracy of differential diagnosis. However, directly using classification or segmentation models on medical to learn latent features opt out robust feature selection and may lead to overfitting. To fill this gap, we propose a novel feature selection technique using the latent space of a segmentation model that can aid diagnosis. We evaluated our method in differentiating a rare cardiac disease: Takotsubo Syndrome (TTS) from the ST elevation myocardial infarction (STEMI) using echocardiogram videos (echo). TTS can mimic clinical features of STEMI in echo and extremely hard to distinguish. Our approach shows promising results in differential diagnosis of TTS with 82% diagnosis accuracy beating the previous state-of-the-art (SOTA) approach. Moreover, the robust feature selection technique using LASSO algorithm shows great potential in reducing the redundant features and creates a robust pipeline for short- and long-term disease prognoses in the downstream analysis. △ Less

Submitted 18 January, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, conference

arXiv:2312.12649 [pdf, other]

Surf-CDM: Score-Based Surface Cold-Diffusion Model For Medical Image Segmentation

Authors: Fahim Ahmed Zaman, Mathews Jacob, Amanda Chang, Kan Liu, Milan Sonka, Xiaodong Wu

Abstract: Diffusion models have shown impressive performance for image generation, often times outperforming other generative models. Since their introduction, researchers have extended the powerful noise-to-image denoising pipeline to discriminative tasks, including image segmentation. In this work we propose a conditional score-based generative modeling framework for medical image segmentation which relie… ▽ More Diffusion models have shown impressive performance for image generation, often times outperforming other generative models. Since their introduction, researchers have extended the powerful noise-to-image denoising pipeline to discriminative tasks, including image segmentation. In this work we propose a conditional score-based generative modeling framework for medical image segmentation which relies on a parametric surface representation for the segmentation masks. The surface re-parameterization allows the direct application of standard diffusion theory, as opposed to when the mask is represented as a binary mask. Moreover, we adapted an extended variant of the diffusion technique known as the "cold-diffusion" where the diffusion model can be constructed with deterministic perturbations instead of Gaussian noise, which facilitates significantly faster convergence in the reverse diffusion. We evaluated our method on the segmentation of the left ventricle from 65 transthoracic echocardiogram videos (2230 echo image frames) and compared its performance to the most popular and widely used image segmentation models. Our proposed model not only outperformed the compared methods in terms of segmentation accuracy, but also showed potential in estimating segmentation uncertainties for further downstream analyses due to its inherent generative nature. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 5 pages, 5 figures, conference

arXiv:2312.08267 [pdf, other]

TABSurfer: a Hybrid Deep Learning Architecture for Subcortical Segmentation

Authors: Aaron Cao, Vishwanatha M. Rao, Kejia Liu, Xinru Liu, Andrew F. Laine, Jia Guo

Abstract: Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propo… ▽ More Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propose TABSurfer, a novel 3D patch-based CNN-Transformer hybrid deep learning model designed for superior subcortical segmentation compared to existing state-of-the-art tools. To evaluate, we first demonstrate TABSurfer's consistent performance across various T1w MRI datasets with significantly shorter processing times compared to FreeSurfer. Then, we validate against manual segmentations, where TABSurfer outperforms FreeSurfer based on the manual ground truth. In each test, we also establish TABSurfer's advantage over a leading deep learning benchmark, FastSurferVINN. Together, these studies highlight TABSurfer's utility as a powerful tool for fully automated subcortical segmentation with high fidelity. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, 2 tables

arXiv:2311.15221 [pdf, other]

The Local Landscape of Phase Retrieval Under Limited Samples

Authors: Kaizhao Liu, Zihao Wang, Lei Wu

Abstract: In this paper, we provide a fine-grained analysis of the local landscape of phase retrieval under the regime with limited samples. Our aim is to ascertain the minimal sample size necessary to guarantee a benign local landscape surrounding global minima in high dimensions. Let $n$ and $d$ denote the sample size and input dimension, respectively. We first explore the local convexity and establish th… ▽ More In this paper, we provide a fine-grained analysis of the local landscape of phase retrieval under the regime with limited samples. Our aim is to ascertain the minimal sample size necessary to guarantee a benign local landscape surrounding global minima in high dimensions. Let $n$ and $d$ denote the sample size and input dimension, respectively. We first explore the local convexity and establish that when $n=o(d\log d)$, for almost every fixed point in the local ball, the Hessian matrix must have negative eigenvalues as long as $d$ is sufficiently large. Consequently, the local landscape is highly non-convex. We next consider the one-point strong convexity and show that as long as $n=ω(d)$, with high probability, the landscape is one-point strongly convex in the local annulus: $\{w\in\mathbb{R}^d: o_d(1)\leqslant \|w-w^*\|\leqslant c\}$, where $w^*$ is the ground truth and $c$ is an absolute constant. This implies that gradient descent initialized from any point in this domain can converge to an $o_d(1)$-loss solution exponentially fast. Furthermore, we show that when $n=o(d\log d)$, there is a radius of $\widetildeΘ\left(\sqrt{1/d}\right)$ such that one-point convexity breaks in the corresponding smaller local ball. This indicates an impossibility to establish a convergence to exact $w^*$ for gradient descent under limited samples by relying solely on one-point convexity. △ Less

Submitted 26 November, 2023; originally announced November 2023.

Comments: 41 pages

arXiv:2311.10568 [pdf, other]

Phase Guided Light Field for Spatial-Depth High Resolution 3D Imaging

Authors: Geyou Zhang, Ce Zhu, Kai Liu, Yipeng Liu

Abstract: On 3D imaging, light field cameras typically are of single shot, and however, they heavily suffer from low spatial resolution and depth accuracy. In this paper, by employing an optical projector to project a group of single high-frequency phase-shifted sinusoid patterns, we propose a phase guided light field algorithm to significantly improve both the spatial and depth resolutions for off-the-shel… ▽ More On 3D imaging, light field cameras typically are of single shot, and however, they heavily suffer from low spatial resolution and depth accuracy. In this paper, by employing an optical projector to project a group of single high-frequency phase-shifted sinusoid patterns, we propose a phase guided light field algorithm to significantly improve both the spatial and depth resolutions for off-the-shelf light field cameras. First, for correcting the axial aberrations caused by the main lens of our light field camera, we propose a deformed cone model to calibrate our structured light field system. Second, over wrapped phases computed from patterned images, we propose a stereo matching algorithm, i.e. phase guided sum of absolute difference, to robustly obtain the correspondence for each pair of neighbored two lenslets. Finally, by introducing a virtual camera according to the basic geometrical optics of light field imaging, we propose a reorganization strategy to reconstruct 3D point clouds with spatial-depth high resolution. Experimental results show that, compared with the state-of-the-art active light field methods, the proposed reconstructs 3D point clouds with a spatial resolution of 1280$\times$720 with factors 10$\times$ increased, while maintaining the same high depth resolution and needing merely a single group of high-frequency patterns. △ Less

Submitted 9 April, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

arXiv:2310.20148 [pdf, other]

Decision-Making for Autonomous Vehicles with Interaction-Aware Behavioral Prediction and Social-Attention Neural Network

Authors: Xiao Li, Kaiwen Liu, H. Eric Tseng, Anouck Girard, Ilya Kolmanovsky

Abstract: Autonomous vehicles need to accomplish their tasks while interacting with human drivers in traffic. It is thus crucial to equip autonomous vehicles with artificial reasoning to better comprehend the intentions of the surrounding traffic, thereby facilitating the accomplishments of the tasks. In this work, we propose a behavioral model that encodes drivers' interacting intentions into latent social… ▽ More Autonomous vehicles need to accomplish their tasks while interacting with human drivers in traffic. It is thus crucial to equip autonomous vehicles with artificial reasoning to better comprehend the intentions of the surrounding traffic, thereby facilitating the accomplishments of the tasks. In this work, we propose a behavioral model that encodes drivers' interacting intentions into latent social-psychological parameters. Leveraging a Bayesian filter, we develop a receding-horizon optimization-based controller for autonomous vehicle decision-making which accounts for the uncertainties in the interacting drivers' intentions. For online deployment, we design a neural network architecture based on the attention mechanism which imitates the behavioral model with online estimated parameter priors. We also propose a decision tree search algorithm to solve the decision-making problem online. The proposed behavioral model is then evaluated in terms of its capabilities for real-world trajectory prediction. We further conduct extensive evaluations of the proposed decision-making module, in forced highway merging scenarios, using both simulated environments and real-world traffic datasets. The results demonstrate that our algorithms can complete the forced merging tasks in various traffic conditions while ensuring driving safety. △ Less

Submitted 31 October, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.16102 [pdf, other]

Learned, Uncertainty-driven Adaptive Acquisition for Photon-Efficient Multiphoton Microscopy

Authors: Cassandra Tong Ye, Jiashu Han, Kunzan Liu, Anastasios Angelopoulos, Linda Griffith, Kristina Monakhova, Sixian You

Abstract: Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep… ▽ More Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep learning could be used to denoise multiphoton microscopy measurements, but these algorithms can be prone to hallucination, which can be disastrous for medical and scientific applications. We propose a method to simultaneously denoise and predict pixel-wise uncertainty for multiphoton imaging measurements, improving algorithm trustworthiness and providing statistical guarantees for the deep learning predictions. Furthermore, we propose to leverage this learned, pixel-wise uncertainty to drive an adaptive acquisition technique that rescans only the most uncertain regions of a sample. We demonstrate our method on experimental noisy MPM measurements of human endometrium tissues, showing that we can maintain fine features and outperform other denoising methods while predicting uncertainty at each pixel. Finally, with our adaptive acquisition technique, we demonstrate a 120X reduction in acquisition time and total light dose while successfully recovering fine features in the sample. We are the first to demonstrate distribution-free uncertainty quantification for a denoising task with real experimental data and the first to propose adaptive acquisition based on reconstruction uncertainty △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2309.14497 [pdf, other]

Interaction-Aware Decision-Making for Autonomous Vehicles in Forced Merging Scenario Leveraging Social Psychology Factors

Authors: Xiao Li, Kaiwen Liu, H. Eric Tseng, Anouck Girard, Ilya Kolmanovsky

Abstract: Understanding the intention of vehicles in the surrounding traffic is crucial for an autonomous vehicle to successfully accomplish its driving tasks in complex traffic scenarios such as highway forced merging. In this paper, we consider a behavioral model that incorporates both social behaviors and personal objectives of the interacting drivers. Leveraging this model, we develop a receding-horizon… ▽ More Understanding the intention of vehicles in the surrounding traffic is crucial for an autonomous vehicle to successfully accomplish its driving tasks in complex traffic scenarios such as highway forced merging. In this paper, we consider a behavioral model that incorporates both social behaviors and personal objectives of the interacting drivers. Leveraging this model, we develop a receding-horizon control-based decision-making strategy, that estimates online the other drivers' intentions using Bayesian filtering and incorporates predictions of nearby vehicles' behaviors under uncertain intentions. The effectiveness of the proposed decision-making strategy is demonstrated and evaluated based on simulation studies in comparison with a game theoretic controller and a real-world traffic dataset. △ Less

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2308.15943 [pdf]

Improved Ultrasound Attenuation Coefficient Estimation Using Spectral Normalization on Local Interference-Free Single-Scatterer Power Spectrum

Authors: Kun-Lin Liu, Yu-Heng Chen, Chiao-Yin Wang, Po-Hsiang Tsui, Meng-Lin Li

Abstract: Ultrasound attenuation coefficient estimation (ACE) can be utilized to quantify liver fat content, offering significant diagnostic potential in addressing the growing global public health issue of non-alcoholic fatty liver and other chronic liver diseases. Among ACE methods, the reference frequency method (RFM) proposed recently possesses the advantages of being system-independent and not requirin… ▽ More Ultrasound attenuation coefficient estimation (ACE) can be utilized to quantify liver fat content, offering significant diagnostic potential in addressing the growing global public health issue of non-alcoholic fatty liver and other chronic liver diseases. Among ACE methods, the reference frequency method (RFM) proposed recently possesses the advantages of being system-independent and not requiring reference phantom. However, the presence of large oscillations in frequency power ratio decay curves leads to unstable ACE results with RFM, originating from noise as well as constructive and destructive interference in the backscattered signal's power spectrum. To address this issue, we propose an improved RFM version where a single-scatterer power spectrum estimator is incorporated to restore interference free single-scatterer power spectrum, thereby reducing oscillations in the frequency power ratio decay curves and greatly improving the accuracy of ACE. △ Less

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.02190 [pdf, other]

doi 10.1145/3581783.3611704

Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition

Authors: Jiaxin Ye, Yujie Wei, Xin-Cheng Wen, Chenglong Ma, Zhizhong Huang, Kunhong Liu, Hongming Shan

Abstract: Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring speech emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, bu… ▽ More Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring speech emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, but unfortunately, the resulting features are mixed with corpus-specific features or not class-discriminative. To tackle these challenges, we propose a novel Emotion Decoupling aNd Alignment learning framework (EMO-DNA) for cross-corpus SER, a novel UDA method to learn emotion-relevant corpus-invariant features. The novelties of EMO-DNA are two-fold: contrastive emotion decoupling and dual-level emotion alignment. On one hand, our contrastive emotion decoupling achieves decoupling learning via a contrastive decoupling loss to strengthen the separability of emotion-relevant features from corpus-specific ones. On the other hand, our dual-level emotion alignment introduces an adaptive threshold pseudo-labeling to select confident target samples for class-level alignment, and performs corpus-level alignment to jointly guide model for learning class-discriminative corpus-invariant features across corpora. Extensive experimental results demonstrate the superior performance of EMO-DNA over the state-of-the-art methods in several cross-corpus scenarios. Source code is available at https://github.com/Jiaxin-Ye/Emo-DNA. △ Less

Submitted 4 August, 2023; originally announced August 2023.

Comments: Accepted by ACM MM 2023

arXiv:2307.15510 [pdf, other]

Formation Control for Moving Target Enclosing via Relative Localization

Authors: Xueming Liu, Kunda Liu, Tianjiang Hu, Qingrui Zhang

Abstract: In this paper, we investigate the problem of controlling multiple unmanned aerial vehicles (UAVs) to enclose a moving target in a distributed fashion based on a relative distance and self-displacement measurements. A relative localization technique is developed based on the recursive least square estimation (RLSE) technique with a forgetting factor to estimates both the ``UAV-UAV'' and ``UAV-targe… ▽ More In this paper, we investigate the problem of controlling multiple unmanned aerial vehicles (UAVs) to enclose a moving target in a distributed fashion based on a relative distance and self-displacement measurements. A relative localization technique is developed based on the recursive least square estimation (RLSE) technique with a forgetting factor to estimates both the ``UAV-UAV'' and ``UAV-target'' relative positions. The formation enclosing motion is planned using a coupled oscillator model, which generates desired motion for UAVs to distribute evenly on a circle. The coupled-oscillator-based motion can also facilitate the exponential convergence of relative localization due to its persistent excitation nature. Based on the generation strategy of desired formation pattern and relative localization estimates, a cooperative formation tracking control scheme is proposed, which enables the formation geometric center to asymptotically converge to the moving target. The asymptotic convergence performance is analyzed theoretically for both the relative localization technique and the formation control algorithm. Numerical simulations are provided to show the efficiency of the proposed algorithm. Experiments with three quadrotors tracking one target are conducted to evaluate the proposed target enclosing method in real platforms. △ Less

Submitted 28 July, 2023; originally announced July 2023.

Comments: 8 Pages, accepted by IEEE CDC 2023

arXiv:2307.09714 [pdf]

Flexible single multimode fiber imaging using white LED

Authors: Minyu Fan, Kun Liu, Jie Zhu, Yu Cao, Sha Wang

Abstract: Multimode fiber (MMF) has been proven to have good potential in imaging and optical communication because of its advantages of small diameter and large mode numbers. However, due to the mode coupling and modal dispersion, it is very sensitive to environmental changes. Minor changes in the fiber shape can lead to difficulties in information reconstruction. Here, white LED and cascaded Unet are used… ▽ More Multimode fiber (MMF) has been proven to have good potential in imaging and optical communication because of its advantages of small diameter and large mode numbers. However, due to the mode coupling and modal dispersion, it is very sensitive to environmental changes. Minor changes in the fiber shape can lead to difficulties in information reconstruction. Here, white LED and cascaded Unet are used to achieve MMF imaging to eliminate the effect of fiber perturbations. The output speckle patterns in three different color channels of the CCD camera produced by transferring images through the MMF are concatenated and inputted into the cascaded Unet using channel stitching technology to improve the reconstruction effects. The average Pearson correlation coefficient (PCC) of the reconstructed images from the Fashion-MINIST dataset is 0.83. In order to check the flexibility of such a system, perturbation tests on the image reconstruction capability by changing the fiber shapes are conducted. The experimental results show that the MMF imaging system has good robustness properties, i. e. the average PCC remains 0.83 even after completely changing the shape of the MMF. This research potentially provides a flexible approach for the practical application of MMF imaging. △ Less

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2305.00216 [pdf, other]

Physics-Guided Graph Neural Networks for Real-time AC/DC Power Flow Analysis

Authors: Mei Yang, Gao Qiu, Yong Wu, Junyong Liu, Nina Dai, Yue Shui, Kai Liu, Lijie Ding

Abstract: The increasing scale of alternating current and direct current (AC/DC) hybrid systems necessitates a faster power flow analysis tool than ever. This letter thus proposes a specific physics-guided graph neural network (PG-GNN). The tailored graph modelling of AC and DC grids is firstly advanced to enhance the topology adaptability of the PG-GNN. To eschew unreliable experience emulation from data,… ▽ More The increasing scale of alternating current and direct current (AC/DC) hybrid systems necessitates a faster power flow analysis tool than ever. This letter thus proposes a specific physics-guided graph neural network (PG-GNN). The tailored graph modelling of AC and DC grids is firstly advanced to enhance the topology adaptability of the PG-GNN. To eschew unreliable experience emulation from data, AC/DC physics are embedded in the PG-GNN using duality. Augmented Lagrangian method-based learning scheme is then presented to help the PG-GNN better learn nonconvex patterns in an unsupervised label-free manner. Multi-PG-GNN is finally conducted to master varied DC control modes. Case study shows that, relative to the other 7 data-driven rivals, only the proposed method matches the performance of the model-based benchmark, also beats it in computational efficiency beyond 10 times. △ Less

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2304.07813 [pdf, other]

Deep Reinforcement Learning-Assisted Age-optimal Transmission Policy for HARQ-aided NOMA Networks

Authors: Kunpeng Liu, Aimin Li, Shaohua Wu

Abstract: The recent interweaving of AI-6G technologies has sparked extensive research interest in further enhancing reliable and timely communications. \emph{Age of Information} (AoI), as a novel and integrated metric implying the intricate trade-offs among reliability, latency, and update frequency, has been well-researched since its conception. This paper contributes new results in this area by employing… ▽ More The recent interweaving of AI-6G technologies has sparked extensive research interest in further enhancing reliable and timely communications. \emph{Age of Information} (AoI), as a novel and integrated metric implying the intricate trade-offs among reliability, latency, and update frequency, has been well-researched since its conception. This paper contributes new results in this area by employing a Deep Reinforcement Learning (DRL) approach to intelligently decide how to allocate power resources and when to retransmit in a \emph{freshness-sensitive} downlink multi-user Hybrid Automatic Repeat reQuest with Chase Combining (HARQ-CC) aided Non-Orthogonal Multiple Access (NOMA) network. Specifically, an AoI minimization problem is formulated as a Markov Decision Process (MDP) problem. Then, to achieve deterministic, age-optimal, and intelligent power allocations and retransmission decisions, the Double-Dueling-Deep Q Network (DQN) is adopted. Furthermore, a more flexible retransmission scheme, referred to as Retransmit-At-Will scheme, is proposed to further facilitate the timeliness of the HARQ-aided NOMA network. Simulation results verify the superiority of the proposed intelligent scheme and demonstrate the threshold structure of the retransmission policy. Also, answers to whether user pairing is necessary are discussed by extensive simulation results. △ Less

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2304.06335 [pdf]

Deep Learning-based Fall Detection Algorithm Using Ensemble Model of Coarse-fine CNN and GRU Networks

Authors: Chien-Pin Liu, Ju-Hsuan Li, En-** Chu, Chia-Yeh Hsieh, Kai-Chun Liu, Chia-Tai Chan, Yu Tsao

Abstract: Falls are the public health issue for the elderly all over the world since the fall-induced injuries are associated with a large amount of healthcare cost. Falls can cause serious injuries, even leading to death if the elderly suffers a "long-lie". Hence, a reliable fall detection (FD) system is required to provide an emergency alarm for first aid. Due to the advances in wearable device technology… ▽ More Falls are the public health issue for the elderly all over the world since the fall-induced injuries are associated with a large amount of healthcare cost. Falls can cause serious injuries, even leading to death if the elderly suffers a "long-lie". Hence, a reliable fall detection (FD) system is required to provide an emergency alarm for first aid. Due to the advances in wearable device technology and artificial intelligence, some fall detection systems have been developed using machine learning and deep learning methods to analyze the signal collected from accelerometer and gyroscopes. In order to achieve better fall detection performance, an ensemble model that combines a coarse-fine convolutional neural network and gated recurrent unit is proposed in this study. The parallel structure design used in this model restores the different grains of spatial characteristics and capture temporal dependencies for feature representation. This study applies the FallAllD public dataset to validate the reliability of the proposed model, which achieves a recall, precision, and F-score of 92.54%, 96.13%, and 94.26%, respectively. The results demonstrate the reliability of the proposed ensemble model in discriminating falls from daily living activities and its superior performance compared to the state-of-the-art convolutional neural network long short-term memory (CNN-LSTM) for FD. △ Less

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2304.05685 [pdf]

Multisensor fusion-based digital twin in additive manufacturing for in-situ quality monitoring and defect correction

Authors: Lequn Chen, Xiling Yao, Kui Liu, Chaolin Tan, Seung Ki Moon

Abstract: Early detection and correction of defects are critical in additive manufacturing (AM) to avoid build failures. In this paper, we present a multisensor fusion-based digital twin for in-situ quality monitoring and defect correction in a robotic laser direct energy deposition process. Multisensor fusion sources consist of an acoustic sensor, an infrared thermal camera, a coaxial vision camera, and a… ▽ More Early detection and correction of defects are critical in additive manufacturing (AM) to avoid build failures. In this paper, we present a multisensor fusion-based digital twin for in-situ quality monitoring and defect correction in a robotic laser direct energy deposition process. Multisensor fusion sources consist of an acoustic sensor, an infrared thermal camera, a coaxial vision camera, and a laser line scanner. The key novelty and contribution of this work are to develop a spatiotemporal data fusion method that synchronizes and registers the multisensor features within the part's 3D volume. The fused dataset can be used to predict location-specific quality using machine learning. On-the-fly identification of regions requiring material addition or removal is feasible. Robot toolpath and auto-tuned process parameters are generated for defecting correction. In contrast to traditional single-sensor-based monitoring, multisensor fusion allows for a more in-depth understanding of underlying process physics, such as pore formation and laser-material interactions. The proposed methods pave the way for self-adaptation AM with higher efficiency, less waste, and cleaner production. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: 11 pages, 9 figures. Accepted at 24th International Conference on Engineering Design (ICED23)

arXiv:2303.13243 [pdf, other]

Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition

Authors: Kai Liu, Hailiang Xiong, Gangqiang Yang, Zhengfeng Du, Yewen Cao, Danyal Shah

Abstract: As one of the major branches of automatic speech recognition, attention-based models greatly improves the feature representation ability of the model. In particular, the multi-head mechanism is employed in the attention, ho** to learn speech features of more aspects in different attention subspaces. For speech recognition of complex languages, on the one hand, a small head size will lead to an o… ▽ More As one of the major branches of automatic speech recognition, attention-based models greatly improves the feature representation ability of the model. In particular, the multi-head mechanism is employed in the attention, ho** to learn speech features of more aspects in different attention subspaces. For speech recognition of complex languages, on the one hand, a small head size will lead to an obvious shortage of learnable aspects. On the other hand, we need to reduce the dimension of each subspace to keep the size of the overall feature space unchanged when we increase the number of heads, which will significantly weaken the ability to represent the feature of each subspace. Therefore, this paper explores how to use a small attention subspace to represent complete speech features while ensuring many heads. In this work we propose a novel neural network architecture, namely, pyramid multi-branch fusion DCNN with multi-head self-attention. The proposed architecture is inspired by Dilated Convolution Neural Networks (DCNN), it uses multiple branches with DCNN to extract the feature of the input speech under different receptive fields. To reduce the number of parameters, every two branches are merged until all the branches are merged into one. Thus, its shape is like a pyramid rotated 90 degrees. We demonstrate that on Aishell-1, a widely used Mandarin speech dataset, our model achieves a character error rate (CER) of 6.45% on the test sets. △ Less

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.06597 [pdf, ps, other]

doi 10.1109/TCCN.2023.3306852

Non-Orthogonal Multiple Access Enhanced Multi-User Semantic Communication

Authors: Weizhi Li, Haotai Liang, Chen Dong, Xiaodong Xu, ** Zhang, Kaijun Liu

Abstract: Semantic communication serves as a novel paradigm and attracts the broad interest of researchers. One critical aspect of it is the multi-user semantic communication theory, which can further promote its application to the practical network environment. While most existing works focused on the design of end-to-end single-user semantic transmission, a novel non-orthogonal multiple access (NOMA)-base… ▽ More Semantic communication serves as a novel paradigm and attracts the broad interest of researchers. One critical aspect of it is the multi-user semantic communication theory, which can further promote its application to the practical network environment. While most existing works focused on the design of end-to-end single-user semantic transmission, a novel non-orthogonal multiple access (NOMA)-based multi-user semantic communication system named NOMASC is proposed in this paper. The proposed system can support semantic tranmission of multiple users with diverse modalities of source information. To avoid high demand for hardware, an asymmetric quantizer is employed at the end of the semantic encoder for discretizing the continuous full-resolution semantic feature. In addition, a neural network model is proposed for map** the discrete feature into self-learned symbols and accomplishing intelligent multi-user detection (MUD) at the receiver. Simulation results demonstrate that the proposed system holds good performance in non-orthogonal transmission of multiple user signals and outperforms the other methods, especially at low-to-medium SNRs. Moreover, it has high robustness under various simulation settings and mismatched test scenarios. △ Less

Submitted 20 November, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: accepted by IEEE Transactions on Cognitive Communications and Networking

arXiv:2303.05023 [pdf, other]

X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion

Authors: Kai Liu, Ziqing Du, Xucheng Wan, Huan Zhou

Abstract: Target speech extraction (TSE) systems are designed to extract target speech from a multi-talker mixture. The popular training objective for most prior TSE networks is to enhance reconstruction performance of extracted speech waveform. However, it has been reported that a TSE system delivers high reconstruction performance may still suffer low-quality experience problems in practice. One such expe… ▽ More Target speech extraction (TSE) systems are designed to extract target speech from a multi-talker mixture. The popular training objective for most prior TSE networks is to enhance reconstruction performance of extracted speech waveform. However, it has been reported that a TSE system delivers high reconstruction performance may still suffer low-quality experience problems in practice. One such experience problem is wrong speaker extraction (called speaker confusion, SC), which leads to strong negative experience and hampers effective conversations. To mitigate the imperative SC issue, we reformulate the training objective and propose two novel loss schemes that explore the metric of reconstruction improvement performance defined at small chunk-level and leverage the metric associated distribution information. Both loss schemes aim to encourage a TSE network to pay attention to those SC chunks based on the said distribution information. On this basis, we present X-SepFormer, an end-to-end TSE model with proposed loss schemes and a backbone of SepFormer. Experimental results on the benchmark WSJ0-2mix dataset validate the effectiveness of our proposals, showing consistent improvements on SC errors (by 14.8% relative). Moreover, with SI-SDRi of 19.4 dB and PESQ of 3.81, our best system significantly outperforms the current SOTA systems and offers the top TSE results reported till date on the WSJ0-2mix. △ Less

Submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2303.03634 [pdf]

PreFallKD: Pre-Impact Fall Detection via CNN-ViT Knowledge Distillation

Authors: Tin-Han Chi, Kai-Chun Liu, Chia-Yeh Hsieh, Yu Tsao, Chia-Tai Chan

Abstract: Fall accidents are critical issues in an aging and aged society. Recently, many researchers developed pre-impact fall detection systems using deep learning to support wearable-based fall protection systems for preventing severe injuries. However, most works only employed simple neural network models instead of complex models considering the usability in resource-constrained mobile devices and stri… ▽ More Fall accidents are critical issues in an aging and aged society. Recently, many researchers developed pre-impact fall detection systems using deep learning to support wearable-based fall protection systems for preventing severe injuries. However, most works only employed simple neural network models instead of complex models considering the usability in resource-constrained mobile devices and strict latency requirements. In this work, we propose a novel pre-impact fall detection via CNN-ViT knowledge distillation, namely PreFallKD, to strike a balance between detection performance and computational complexity. The proposed PreFallKD transfers the detection knowledge from the pre-trained teacher model (vision transformer) to the student model (lightweight convolutional neural networks). Additionally, we apply data augmentation techniques to tackle issues of data imbalance. We conduct the experiment on the KFall public dataset and compare PreFallKD with other state-of-the-art models. The experiment results show that PreFallKD could boost the student model during the testing phase and achieves reliable F1-score (92.66%) and lead time (551.3 ms). △ Less

Submitted 28 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.06727 [pdf, other]

doi 10.1038/s41598-024-54251-1

Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging

Authors: Charlie Tran, Kai Shen, Kang Liu, Akshay Ashok, Adolfo Ramirez-Zamora, **ghua Chen, Yulin Li, Ruogu Fang

Abstract: Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable scr… ▽ More Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable screening should be diagnostically accurate even before the onset of symptoms to allow medical interventions. We highlight retinal fundus imaging, often termed a window to the brain, as a diagnostic screening modality for Parkinson's disease. We conducted a systematic evaluation of conventional machine learning and deep learning techniques to classify Parkinson's disease from UK Biobank fundus imaging. Our results show that Parkinson's disease individuals can be differentiated from age and gender-matched healthy subjects with an Area Under the Curve (AUC) of 0.77. This accuracy is maintained when predicting either prevalent or incident Parkinson's disease. Explainability and trustworthiness are enhanced by visual attribution maps of localized biomarkers and quantified metrics of model robustness to data perturbations. △ Less

Submitted 18 February, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 17 pages, 4 figures, 2 tables, 4 supplementary tables

arXiv:2301.06277 [pdf, ps, other]

Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

Authors: Kai Liu, Xucheng Wan, Ziqing Du, Huan Zhou

Abstract: As a practical alternative of speech separation, target speaker extraction (TSE) aims to extract the speech from the desired speaker using additional speaker cue extracted from the speaker. Its main challenge lies in how to properly extract and leverage the speaker cue to benefit the extracted speech quality. The cue extraction method adopted in majority existing TSE studies is to directly utilize… ▽ More As a practical alternative of speech separation, target speaker extraction (TSE) aims to extract the speech from the desired speaker using additional speaker cue extracted from the speaker. Its main challenge lies in how to properly extract and leverage the speaker cue to benefit the extracted speech quality. The cue extraction method adopted in majority existing TSE studies is to directly utilize discriminative speaker embedding, which is extracted from the pre-trained models for speaker verification. Although the high speaker discriminability is a most desirable property for speaker verification task, we argue that it may be too sophisticated for TSE. In this study, we propose that a simplified speaker cue with clear class separability might be preferred for TSE. To verify our proposal, we introduce several forms of speaker cues, including naive speaker embedding (such as, x-vector and xi-vector) and new speaker embeddings produced from sparse LDA-transform. Corresponding TSE models are built by integrating these speaker cues with SepFormer (one SOTA speech separation model). Performances of these TSE models are examined on the benchmark WSJ0-2mix dataset. Experimental results validate the effectiveness and generalizability of our proposal, showing up to 9.9% relative improvement in SI-SDRi. Moreover, with SI-SDRi of 19.4 dB and PESQ of 3.78, our best TSE system significantly outperforms the current SOTA systems and offers the top TSE results reported till date on the WSJ0-2mix. △ Less

Submitted 16 January, 2023; originally announced January 2023.

Comments: ACCEPTED by NCMMSC 2022

arXiv:2301.02383 [pdf, other]

Deep Biological Pathway Informed Pathology-Genomic Multimodal Survival Prediction

Authors: Lin Qiu, Aminollah Khormali, Kai Liu

Abstract: The integration of multi-modal data, such as pathological images and genomic data, is essential for understanding cancer heterogeneity and complexity for personalized treatments, as well as for enhancing survival predictions. Despite the progress made in integrating pathology and genomic data, most existing methods cannot mine the complex inter-modality relations thoroughly. Additionally, identify… ▽ More The integration of multi-modal data, such as pathological images and genomic data, is essential for understanding cancer heterogeneity and complexity for personalized treatments, as well as for enhancing survival predictions. Despite the progress made in integrating pathology and genomic data, most existing methods cannot mine the complex inter-modality relations thoroughly. Additionally, identifying explainable features from these models that govern preclinical discovery and clinical prediction is crucial for cancer diagnosis, prognosis, and therapeutic response studies. We propose PONET- a novel biological pathway-informed pathology-genomic deep model that integrates pathological images and genomic data not only to improve survival prediction but also to identify genes and pathways that cause different survival rates in patients. Empirical results on six of The Cancer Genome Atlas (TCGA) datasets show that our proposed method achieves superior predictive performance and reveals meaningful biological interpretations. The proposed method establishes insight into how to train biologically informed deep networks on multimodal biomedical data which will have general applicability for understanding diseases and predicting response and resistance to treatment. △ Less

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2301.00553 [pdf, other]

Lightweight Image Inpainting by Stripe Window Transformer with Joint Attention to CNN

Authors: Tsung-Jung Liu, Bo-Wei Chen, Kuan-Hsien Liu

Abstract: Image inpainting is an important task in computer vision. As admirable methods are presented, the inpainted image is getting closer to reality. However, the result is still not good enough in the reconstructed texture and structure based on human vision. Although recent advances in computer hardware have enabled the development of larger and more complex models, there is still a need for lightweig… ▽ More Image inpainting is an important task in computer vision. As admirable methods are presented, the inpainted image is getting closer to reality. However, the result is still not good enough in the reconstructed texture and structure based on human vision. Although recent advances in computer hardware have enabled the development of larger and more complex models, there is still a need for lightweight models that can be used by individuals and small-sized institutions. Therefore, we propose a lightweight model that combines a specialized transformer with a traditional convolutional neural network (CNN). Furthermore, we have noticed most researchers only consider three primary colors (RGB) in inpainted images, but we think this is not enough. So we propose a new loss function to intensify color details. Extensive experiments on commonly seen datasets (Places2 and CelebA) validate the efficacy of our proposed model compared with other state-of-the-art methods. Index Terms: HSV color space, image inpainting, joint attention, stripe window, transformer △ Less

Submitted 23 August, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: 6 pages and 5 images, contributions to MLSP 2023

arXiv:2212.03515 [pdf, other]

FPGA Implementation of Multi-Layer Machine Learning Equalizer with On-Chip Training

Authors: Keren Liu, Erik Börjeson, Christian Häger, Per Larsson-Edefors

Abstract: We design and implement an adaptive machine learning equalizer that alternates multiple linear and nonlinear computational layers on an FPGA. On-chip training via gradient backpropagation is shown to allow for real-time adaptation to time-varying channel impairments. We design and implement an adaptive machine learning equalizer that alternates multiple linear and nonlinear computational layers on an FPGA. On-chip training via gradient backpropagation is shown to allow for real-time adaptation to time-varying channel impairments. △ Less

Submitted 7 December, 2022; originally announced December 2022.

Comments: To be presented at the 2023 Optical Fiber Communication Conference (OFC)

arXiv:2211.10658 [pdf, other]

EDGE: Editable Dance Generation From Music

Authors: Jonathan Tseng, Rodrigo Castellon, C. Karen Liu

Abstract: Dance is an important human art form, but creating new dances can be difficult and time-consuming. In this work, we introduce Editable Dance GEneration (EDGE), a state-of-the-art method for editable dance generation that is capable of creating realistic, physically-plausible dances while remaining faithful to the input music. EDGE uses a transformer-based diffusion model paired with Jukebox, a str… ▽ More Dance is an important human art form, but creating new dances can be difficult and time-consuming. In this work, we introduce Editable Dance GEneration (EDGE), a state-of-the-art method for editable dance generation that is capable of creating realistic, physically-plausible dances while remaining faithful to the input music. EDGE uses a transformer-based diffusion model paired with Jukebox, a strong music feature extractor, and confers powerful editing capabilities well-suited to dance, including joint-wise conditioning, and in-betweening. We introduce a new metric for physical plausibility, and evaluate dance quality generated by our method extensively through (1) multiple quantitative metrics on physical plausibility, beat alignment, and diversity benchmarks, and more importantly, (2) a large-scale user study, demonstrating a significant improvement over previous state-of-the-art methods. Qualitative samples from our model can be found at our website. △ Less

Submitted 27 November, 2022; v1 submitted 19 November, 2022; originally announced November 2022.

Comments: Project website: https://edge-dance.github.io

arXiv:2211.08233 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096370

Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition

Authors: Jiaxin Ye, Xin-cheng Wen, Yujie Wei, Yong Xu, Kunhong Liu, Hongming Shan

Abstract: Speech emotion recognition (SER) plays a vital role in improving the interactions between humans and machines by inferring human emotion and affective states from speech signals. Whereas recent works primarily focus on mining spatiotemporal information from hand-crafted features, we explore how to model the temporal patterns of speech emotions from dynamic temporal scales. Towards that goal, we in… ▽ More Speech emotion recognition (SER) plays a vital role in improving the interactions between humans and machines by inferring human emotion and affective states from speech signals. Whereas recent works primarily focus on mining spatiotemporal information from hand-crafted features, we explore how to model the temporal patterns of speech emotions from dynamic temporal scales. Towards that goal, we introduce a novel temporal emotional modeling approach for SER, termed Temporal-aware bI-direction Multi-scale Network (TIM-Net), which learns multi-scale contextual affective representations from various time scales. Specifically, TIM-Net first employs temporal-aware blocks to learn temporal affective representation, then integrates complementary information from the past and the future to enrich contextual representations, and finally, fuses multiple time scale features for better adaptation to the emotional variation. Extensive experimental results on six benchmark SER datasets demonstrate the superior performance of TIM-Net, gaining 2.34% and 2.61% improvements of the average UAR and WAR over the second-best on each corpus. The source code is available at https://github.com/Jiaxin-Ye/TIM-Net_SER. △ Less

Submitted 14 August, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: ICASSP 2023

Journal ref: IEEE ICASSP 2023

arXiv:2210.17386 [pdf, other]

Cooperative Sensing and Uploading for Quality-Cost Tradeoff of Digital Twins in VEC

Authors: Kai Liu, Xincao Xu, Penglin Dai, Biwen Chen

Abstract: Recent advances in sensing technologies, wireless communications, and computing paradigms drive the evolution of vehicles in becoming an intelligent and electronic consumer products. This paper investigates enabling digital twins in vehicular edge computing (DT-VEC) via cooperative sensing and uploading, and makes the first attempt to achieve the quality-cost tradeoff in DT-VEC. First, a DT-VEC ar… ▽ More Recent advances in sensing technologies, wireless communications, and computing paradigms drive the evolution of vehicles in becoming an intelligent and electronic consumer products. This paper investigates enabling digital twins in vehicular edge computing (DT-VEC) via cooperative sensing and uploading, and makes the first attempt to achieve the quality-cost tradeoff in DT-VEC. First, a DT-VEC architecture is presented, where the heterogeneous information can be sensed by vehicles and uploaded to the edge node via vehicle-to-infrastructure (V2I) communications. The digital twins are modeled based on the sensed information, which are utilized to from the logical view to reflect the real-time status of the physical vehicular environment. Second, we derive the cooperative sensing model and the V2I uploading model by considering the timeliness and consistency of digital twins, and the redundancy, sensing cost, and transmission cost. On this basis, a bi-objective problem is formulated to maximize the system quality and minimize the system cost. Third, we propose a solution based on multi-agent multi-objective (MAMO) deep reinforcement learning, where a dueling critic network is proposed to evaluate the agent action based on the value of state and the advantage of action. Finally, we give a comprehensive performance evaluation, demonstrating the superiority of MAMO. △ Less

Submitted 27 January, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: arXiv admin note: text overlap with arXiv:2209.12265

arXiv:2210.15834 [pdf, other]

doi 10.1016/j.specom.2022.07.005

GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition

Authors: Jia-Xin Ye, Xin-Cheng Wen, Xuan-Ze Wang, Yong Xu, Yan Luo, Chang-Li Wu, Li-Yan Chen, Kun-Hong Liu

Abstract: In human-computer interaction, Speech Emotion Recognition (SER) plays an essential role in understanding the user's intent and improving the interactive experience. While similar sentimental speeches own diverse speaker characteristics but share common antecedents and consequences, an essential challenge for SER is how to produce robust and discriminative representations through causality between… ▽ More In human-computer interaction, Speech Emotion Recognition (SER) plays an essential role in understanding the user's intent and improving the interactive experience. While similar sentimental speeches own diverse speaker characteristics but share common antecedents and consequences, an essential challenge for SER is how to produce robust and discriminative representations through causality between speech emotions. In this paper, we propose a Gated Multi-scale Temporal Convolutional Network (GM-TCNet) to construct a novel emotional causality representation learning component with a multi-scale receptive field. GM-TCNet deploys a novel emotional causality representation learning component to capture the dynamics of emotion across the time domain, constructed with dilated causal convolution layer and gating mechanism. Besides, it utilizes skip connection fusing high-level features from different gated convolution blocks to capture abundant and subtle emotion changes in human speech. GM-TCNet first uses a single type of feature, mel-frequency cepstral coefficients, as inputs and then passes them through the gated temporal convolutional module to generate the high-level features. Finally, the features are fed to the emotion classifier to accomplish the SER task. The experimental results show that our model maintains the highest performance in most cases compared to state-of-the-art techniques. △ Less

Submitted 27 October, 2022; originally announced October 2022.

Comments: The source code is available at: https://github.com/Jiaxin-Ye/GM-TCNet

Journal ref: speech communication, 145, November 2022, 21-35

arXiv:2210.13271 [pdf, other]

ECG Artifact Removal from Single-Channel Surface EMG Using Fully Convolutional Networks

Authors: Kuan-Chen Wang, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao

Abstract: Electrocardiogram (ECG) artifact contamination often occurs in surface electromyography (sEMG) applications when the measured muscles are in proximity to the heart. Previous studies have developed and proposed various methods, such as high-pass filtering, template subtraction and so forth. However, these methods remain limited by the requirement of reference signals and distortion of original sEMG… ▽ More Electrocardiogram (ECG) artifact contamination often occurs in surface electromyography (sEMG) applications when the measured muscles are in proximity to the heart. Previous studies have developed and proposed various methods, such as high-pass filtering, template subtraction and so forth. However, these methods remain limited by the requirement of reference signals and distortion of original sEMG. This study proposed a novel denoising method to eliminate ECG artifacts from the single-channel sEMG signals using fully convolutional networks (FCN). The proposed method adopts a denoise autoencoder structure and powerful nonlinear map** capability of neural networks for sEMG denoising. We compared the proposed approach with conventional approaches, including high-pass filters and template subtraction, on open datasets called the Non-Invasive Adaptive Prosthetics database and MIT-BIH normal sinus rhythm database. The experimental results demonstrate that the FCN outperforms conventional methods in sEMG reconstruction quality under a wide range of signal-to-noise ratio inputs. △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: 5 pages, 5 figures

Showing 1–50 of 131 results for author: Liu, K