Search | arXiv e-print repository

arXiv:2406.15222 [pdf]

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests.

Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: under peer review

arXiv:2405.09298 [pdf]

Deep Blur Multi-Model (DeepBlurMM) -- a strategy to mitigate the impact of image blur on deep learning model performance in histopathology image analysis

Authors: Yujie Xiang, Bojing Liu, Mattias Rantalainen

Abstract: AI-based analysis of histopathology whole slide images (WSIs) is central in computational pathology. However, image quality, including unsharp areas of WSIs, impacts model performance. We investigate the impact of blur and propose a multi-model approach to mitigate negative impact of unsharp image areas. In this study, we use a simulation approach, evaluating model performance under varying levels of added Gaussian blur to image tiles from >900 H&E-stained breast cancer WSIs. To reduce impact of blur, we propose a novel multi-model approach (DeepBlurMM) where multiple models trained on data with variable amounts of Gaussian blur are used to predict tiles based on their blur levels. Using histological grade as a principal example, we found that models trained with mildly blurred tiles improved performance over the base model when moderate-high blur was present. DeepBlurMM outperformed the base model in presence of moderate blur across all tiles (AUC: 0.764 vs. 0.710), and in presence of a mix of low, moderate, and high blur across tiles (AUC: 0.821 vs. 0.789). Unsharp image tiles in WSIs impact prediction performance. DeepBlurMM improved prediction performance under some conditions and has the potential to increase quality in both research and clinical applications.

Submitted 23 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

ACM Class: I.4; J.3

arXiv:2405.05498 [pdf, other]

The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88% on the track 2 evaluation set.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.09307 [pdf, other]

doi 10.1109/TSMC.2024.3379408

Cost-effective company response policy for product co-creation in company-sponsored online community

Authors: Jiamin Hu, Lu-Xing Yang, Xiaofan Yang, Kaifan Huang, Gang Li, Yong Xiang

Abstract: Product co-creation based on company-sponsored online community has come to be a paradigm of developing new products collaboratively with customers. In such a product co-creation campaign, the sponsoring company needs to interact intensively with active community members about the design scheme of the product. We call the collection of the rates of the company's response to active community members at all times in the co-creation campaign a company response policy (CRP). This paper addresses the problem of finding a cost-effective CRP (the CRP problem). First, we introduce a novel community state evolutionary model and, thereby, establish an optimal control model for the CRP problem (the CRP model). Second, based on the optimality system for the CRP model, we present an iterative algorithm for solving the CRP model (the CRP algorithm). Third, through extensive numerical experiments, we conclude that the CRP algorithm converges and the resulting CRP exhibits excellent cost benefit. Consequently, we recommend the resulting CRP to companies that embrace product co-creation. Next, we discuss how to implement the resulting CRP. Finally, we investigate the effect of some factors on the cost benefit of the resulting CRP. To our knowledge, this work is the first attempt to study value co-creation through an optimal control theoretic approach.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.06452 [pdf, other]

PAAM: A Framework for Coordinated and Priority-Driven Accelerator Management in ROS 2

Authors: Daniel Enright, Yecheng Xiang, Hyunjong Choi, Hyoseung Kim

Abstract: This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without making changes to underlying accelerator device drivers. The paper shows that PAAM also offers a theoretical analysis that can upper bound the worst-case response time of safety-critical callback chains that necessitate accelerator access. This paper also demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91% reduction in end-to-end response time of their critical callback chains.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 14 Pages, 14 Figures

arXiv:2401.05437 [pdf, other]

Representation Learning for Wearable-Based Applications in the Case of Missing Data

Authors: Janosch Jungo, Yutong Xiang, Shkurta Gashi, Christian Holz

Abstract: Wearable devices continuously collect sensor data and use it to infer an individual's behavior, such as sleep, physical activity, and emotions. Despite the significant interest and advancements in this field, modeling multimodal sensor data in real-world environments is still challenging due to low data quality and limited data annotations. In this work, we investigate representation learning for imputing missing wearable data and compare it with state-of-the-art statistical approaches. We investigate the performance of the transformer model on 10 physiological and behavioral signals with different masking ratios. Our results show that transformers outperform baselines for missing data imputation of signals that change more frequently, but not for monotonic signals. We further investigate the impact of imputation strategies and masking ratios on downstream classification tasks. Our study provides insights for the design and development of masking-based self-supervised learning tasks and advocates the adoption of hybrid-based imputation strategies to address the challenge of missing data in wearable devices.

Submitted 12 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024 (https://hcrl-workshop.github.io/2024/)

arXiv:2312.09620 [pdf, other]

A Deep Representation Learning-based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder

Authors: Yang Xiang, Jingguang Tian, Xinhui Hu, Xinkang Xu, ZhaoHui Yin

Abstract: Generally, the performance of deep neural networks (DNNs) heavily depends on the quality of data representation learning. Our preliminary work has emphasized the significance of deep representation learning (DRL) in the context of speech enhancement (SE) applications. Specifically, our initial SE algorithm employed a gated recurrent unit variational autoencoder (VAE) with a Gaussian distribution to enhance the performance of certain existing SE systems. Building upon our preliminary framework, this paper introduces a novel approach for SE using deep complex convolutional recurrent networks with a VAE (DCCRN-VAE). DCCRN-VAE assumes that the latent variables of signals follow complex Gaussian distributions that are modeled by DCCRN, as these distributions can better capture the behaviors of complex signals. Additionally, we propose the application of a residual loss in DCCRN-VAE to further improve the quality of the enhanced speech. Compared to our preliminary work, DCCRN-VAE introduces a more sophisticated DCCRN structure and probability distribution for DRL. Furthermore, in comparison to DCCRN, DCCRN-VAE employs a more advanced DRL strategy. The experimental results demonstrate that the proposed SE algorithm outperforms both our preliminary SE framework and the state-of-the-art DCCRN SE method in terms of scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2308.11654 [pdf, other]

Large Transformers are Better EEG Learners

Authors: Bingxin Wang, Xiaowen Fu, Yuan Lan, Luchan Zhang, Wei Zheng, Yang Xiang

Abstract: Pre-trained large transformer models have achieved remarkable performance in the fields of natural language processing and computer vision. However, the limited availability of public electroencephalogram (EEG) data presents a unique challenge for extending the success of these models to EEG-based tasks. To address this gap, we propose AdaCT, plug-and-play Adapters designed for Converting Time series data into spatio-temporal 2D pseudo-images or text forms. Essentially, AdaCT-I transforms multi-channel or lengthy single-channel time series data into spatio-temporal 2D pseudo-images for fine-tuning pre-trained vision transformers, while AdaCT-T converts short single-channel data into text for fine-tuning pre-trained language transformers. The proposed approach allows for seamless integration of pre-trained vision models and language models in time series decoding tasks, particularly in EEG data analysis. Experimental results on diverse benchmark datasets, including Epileptic Seizure Recognition, Sleep-EDF, and UCI HAR, demonstrate the superiority of AdaCT over baseline methods. Overall, we provide a promising transfer learning framework for leveraging the capabilities of pre-trained vision and language models in EEG-based tasks, thereby advancing the field of time series decoding and enhancing interpretability in EEG data analysis. Our code will be available at https://github.com/wangbxj1234/AdaCE.

Submitted 13 April, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

arXiv:2308.10119 [pdf, other]

Error Probability Bounds for Invariant Causal Prediction via Multiple Access Channels

Authors: Austin Goddard, Yu Xiang, Ilya Soloveychik

Abstract: We consider the problem of lower bounding the error probability under the invariant causal prediction (ICP) framework. To this end, we examine and draw connections between ICP and the zero-rate Gaussian multiple access channel by first proposing a variant of the original invariant prediction assumption, and then considering a special case of the Gaussian multiple access channel where a codebook is shared between an unknown number of senders. This connection allows us to develop three types of lower bounds on the error probability, each with different assumptions and constraints, leveraging techniques for multiple access channels. The proposed bounds are evaluated with respect to existing causal discovery methods as well as a proposed heuristic method based on minimum distance decoding.

Submitted 19 August, 2023; originally announced August 2023.

Comments: Accepted to the 2023 Asilomar Conference on Signals, Systems, and Computers

arXiv:2308.05987 [pdf, other]

Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System

Authors: Zhaohui Yin, Jingguang Tian, Xinhui Hu, Xinkang Xu, Yang Xiang

Abstract: Overlapped Speech Detection (OSD) is an important part of speech applications involving analysis of multi-party conversations. However, most existing OSD systems are trained and evaluated on small datasets with limited application domains. As a result, their robustness lacks a benchmark for evaluation and their accuracy remains inadequate in realistic acoustic environments. To solve these problems, we conduct a study of large-scale learning (LSL) in OSD tasks and propose a new general OSD system named CF-OSD with LSL, based on the Conformer network and LSL. In our study, a large-scale test set consisting of 151h of labeled speech of different styles, languages and sound-source distances is produced and used as a new benchmark for evaluating the generality of OSD systems. Rigorous comparative experiments are designed and used to evaluate the effectiveness of LSL in OSD tasks and define the OSD model of our general OSD system. The experiment results show that LSL can significantly improve the accuracy and robustness of OSD systems, and the CF-OSD with LSL system significantly outperforms other OSD systems on our proposed benchmark. Moreover, our system has also achieved state-of-the-art performance on existing small dataset benchmarks, reaching 81.6% and 53.8% in the Alimeeting testset and DIHARD II evaluation set, respectively.

Submitted 7 September, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2308.04805 [pdf, other]

doi 10.1145/3581783.3613750

DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music

Authors: Hongru Liang, Jingyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei

Abstract: Towards sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to resolve it as they cannot produce diverse enough mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be presented in user comments, we propose to study the automated music labeling in an essential but under-explored setting, where the model is required to harvest more diverse and valid labels from the users' comments given limited gold labels. To this end, we design an iterative framework (DiVa) to harvest more $\underline{\text{Di}}$verse and $\underline{\text{Va}}$lid labels from user comments for music. The framework makes a classifier able to form complete sets of labels for songs via pseudo-labels inferred from pre-trained classifiers and a novel joint score function. The experiment on a densely annotated testing set reveals the superiority of DiVa over state-of-the-art solutions in producing more diverse labels missed by the gold labels. We hope our work can inspire future research on automated music labeling.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 11 pages, 5 figures, published to ACM MM 2023

arXiv:2307.09850 [pdf, ps, other]

Communication-Efficient Distribution-Free Inference Over Networks

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: Consider a star network where each local node possesses a set of test statistics that exhibit a symmetric distribution around zero when their corresponding null hypothesis is true. This paper investigates statistical inference problems in networks concerning the aggregation of this general type of statistics and global error rate control under communication constraints in various scenarios. The study proposes communication-efficient algorithms that are built on established non-parametric methods, such as the Wilcoxon and sign tests, as well as modern inference methods such as the Benjamini-Hochberg (BH) and Barber-Candes (BC) procedures, coupled with sampling and quantization operations. The proposed methods are evaluated through extensive simulation studies.

Submitted 28 November, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

Comments: Presented in the Asilomar Conference on Signals, Systems, and Computers (2023)

arXiv:2306.08303 [pdf, other]

Pedestrian Recognition with Radar Data-Enhanced Deep Learning Approach Based on Micro-Doppler Signatures

Authors: Haoming Li, Yu Xiang, Haodong Xu, Wenyong Wang

Abstract: Pedestrian identification based on radar micro-Doppler signatures, a hot topic in recent years, is limited by the lack of adequate training data. In this paper, we propose a data-enhanced multi-characteristic learning (DEMCL) model with a data enhancement (DE) module and a multi-characteristic learning (MCL) module to learn more complementary pedestrian micro-Doppler (m-D) signatures. In the DE module, a range-Doppler generative adversarial network (RDGAN) is proposed to enhance free walking datasets, and the MCL module with a multi-scale convolution neural network (MCNN) and a radial basis function neural network (RBFNN) is trained to learn m-D signatures extracted from enhanced datasets. Experimental results show that our model is 3.33% to 10.24% more accurate than other studies and has a short run time of 0.9324 seconds on a 25-minute walking dataset.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 6 pages, 17 figures

arXiv:2305.11202 [pdf]

LLM-based Frameworks for Power Engineering from Routine to Novel Tasks

Authors: Ran Li, Chuanqing Pu, Junyi Tao, Canbing Li, Feilong Fan, Yue Xiang, Sijie Chen

Abstract: The digitalization of energy sectors has expanded the coding responsibilities for power engineers and researchers. This research article explores the potential of leveraging Large Language Models (LLMs) to alleviate this burden. Here, we propose LLM-based frameworks for different programming tasks in power systems. For well-defined and routine tasks like the classic unit commitment (UC) problem, we deploy an end-to-end framework to systematically assess four leading LLMs (ChatGPT 3.5, ChatGPT 4.0, Claude, and Google Bard) in terms of success rate, consistency, and robustness. For complex tasks with limited prior knowledge, we propose a human-in-the-loop framework to enable engineers and LLMs to collaboratively solve the problem through interactive learning of method recommendation, problem decomposition, subtask programming and synthesis. Through a comparative study between the two frameworks, we find that human-in-the-loop features like web access, problem decomposition with field knowledge and human-assisted code synthesis are essential, as LLMs currently still fall short in acquiring cutting-edge and domain-specific knowledge to complete a holistic problem-solving project.

Submitted 19 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.04269 [pdf, other]

Dual Residual Attention Network for Image Denoising

Authors: Wencong Wu, Shijie Liu, Yi Zhou, Yungang Zhang, Yu Xiang

Abstract: In image denoising, deep convolutional neural networks (CNNs) can obtain favorable performance on removing spatially invariant noise. However, many of these networks cannot perform well on removing the real noise (i.e. spatially variant noise) generated during image acquisition or transmission, which severely sets back their application in practical image denoising tasks. Instead of continuously increasing the network depth, many researchers have revealed that expanding the width of networks can also be a useful way to improve model performance. It also has been verified that feature filtering can promote the learning ability of the models. Therefore, in this paper, we propose a novel Dual-branch Residual Attention Network (DRANet) for image denoising, which has both the merits of a wide model architecture and attention-guided feature learning. The proposed DRANet includes two different parallel branches, which can capture complementary features to enhance the learning ability of the model. We designed a new residual attention block (RAB) and a novel hybrid dilated residual attention block (HDRAB) for the upper and the lower branches, respectively. The RAB and HDRAB can capture rich local features through multiple skip connections between different convolutional layers, and the unimportant features are dropped by the residual attention modules. Meanwhile, the long skip connections in each branch, and the global feature fusion between the two parallel branches can capture the global features as well. Moreover, the proposed DRANet uses downsampling operations and dilated convolutions to increase the size of the receptive field, which can enable DRANet to capture more image context information. Extensive experiments demonstrate that compared with other state-of-the-art denoising methods, our DRANet can produce competitive denoising performance both on synthetic and real-world noise removal.

Submitted 7 May, 2023; originally announced May 2023.

arXiv:2302.08271 [pdf, ps, other]

LiQuiD-MIMO Radar: Distributed MIMO Radar with Low-Bit Quantization

Authors: Yikun Xiang, Feng Xi, Shengyao Chen

Abstract: Distributed MIMO radar is known to achieve superior sensing performance by employing widely separated antennas. However, it is challenging to implement a low-complexity distributed MIMO radar due to the complex operations at both the receivers and the fusion center. This work proposed a low-bit quantized distributed MIMO (LiQuiD-MIMO) radar to significantly reduce the burden of signal acquisition and data transmission. In the LiQuiD-MIMO radar, the widely-separated receivers are restricted to operating with low-resolution ADCs and deliver the low-bit quantized data to the fusion center. At the fusion center, the induced quantization distortion is explicitly compensated via digital processing. By exploiting the inherent structure of our problem, a quantized version of the robust principal component analysis (RPCA) problem is formulated to simultaneously recover the low-rank target information matrices as well as the sparse data transmission errors. The least squares-based method is then employed to estimate the targets' positions and velocities from the recovered target information matrices. Numerical experiments demonstrate that the proposed LiQuiD-MIMO radar, configured with the developed algorithm, can achieve accurate target parameter estimation.

Submitted 16 February, 2023; originally announced February 2023.

Comments: 5 pages, 4 figures

arXiv:2301.07409 [pdf, other]

doi 10.1109/TPAMI.2024.3386985

Representing Noisy Image Without Denoising

Authors: Shuren Qi, Yushu Zhang, Chao Wang, Tao Xiang, Xiaochun Cao, Yong Xiang

Abstract: A long-standing topic in artificial intelligence is the effective recognition of patterns from noisy images. In this regard, the recent data-driven paradigm considers 1) improving the representation robustness by adding noisy samples in training phase (i.e., data augmentation) or 2) pre-processing the noisy image by learning to solve the inverse problem (i.e., image denoising). However, such methods generally exhibit inefficient process and unstable result, limiting their practical applications. In this paper, we explore a non-learning paradigm that aims to derive robust representation directly from noisy images, without the denoising as pre-processing. Here, the noise-robust representation is designed as Fractional-order Moments in Radon space (FMR), with also beneficial properties of orthogonality and rotation invariance. Unlike earlier integer-order methods, our work is a more generic design taking such classical methods as special cases, and the introduced fractional-order parameter offers time-frequency analysis capability that is not available in classical methods. Formally, both implicit and explicit paths for constructing the FMR are discussed in detail. Extensive simulation experiments and an image security application are provided to demonstrate the uniqueness and usefulness of our FMR, especially for noise robustness, rotation invariance, and time-frequency discriminability.

Submitted 19 June, 2024; v1 submitted 18 January, 2023; originally announced January 2023.

Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

arXiv:2301.00308 [pdf, other]

High-Accuracy Absolute-Position-Aided Code Phase Tracking Based on RTK/INS Deep Integration in Challenging Static Scenarios

Authors: Yiran Luo, Li-Ta Hsu, Yang Jiang, Baoyu Liu, Zhetao Zhang, Yan Xiang, Naser El-Sheimy

Abstract: Many multi-sensor navigation systems urgently demand accurate positioning initialization from global navigation satellite systems (GNSSs) in challenging static scenarios. However, ground blockages against line-of-sight (LOS) signal reception make it difficult for GNSS users. Steering local codes in GNSS basebands is a desiring way to correct instantaneous signal phase misalignment, efficiently gathering useful signal power and increasing positioning accuracy. Besides, inertial navigation systems (INSs) have been used as a well-complementary dead reckoning (DR) sensor for GNSS receivers in kinematic scenarios resisting various interferences since early. But little work focuses on the case of whether the INS can improve GNSS receivers in static scenarios. Thus, this paper proposes an enhanced navigation system deeply integrated with low-cost INS solutions and GNSS high-accuracy carrier-based positioning. First, an absolute code phase is predicted from base station information, and integrated solution of the INS DR and real-time kinematic (RTK) results through an extended Kalman filter (EKF). Then, a numerically controlled oscillator (NCO) leverages the predicted code phase to improve the alignment between instantaneous local code phases and received ones. The proposed algorithm is realized in a vector-tracking GNSS software-defined radio (SDR). Real-world experiments demonstrate the proposed SDR regarding estimating time-of-arrival (TOA) and positioning accuracy.

Submitted 31 December, 2022; originally announced January 2023.

Comments: 27 pages, 18 figures

arXiv:2211.16059 [pdf, ps, other]

On Large-Scale Multiple Testing Over Networks: An Asymptotic Approach

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: This work concerns developing communication- and computation-efficient methods for large-scale multiple testing over networks, which is of interest to many practical applications. We take an asymptotic approach and propose two methods, proportion-matching and greedy aggregation, tailored to distributed settings. The proportion-matching method achieves the global BH performance yet only requires a one-shot communication of the (estimated) proportion of true null hypotheses as well as the number of p-values at each node. By focusing on the asymptotic optimal power, we go beyond the BH procedure by providing an explicit characterization of the asymptotic optimal solution. This leads to the greedy aggregation method that effectively approximates the optimal rejection regions at each node, while computation efficiency comes from the greedy-type approach naturally. Moreover, for both methods, we provide the rate of convergence for both the FDR and power. Extensive numerical results over a variety of challenging settings are provided to support our theoretical findings.

Submitted 16 March, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Published in the IEEE Transactions on Signal and Information Processing over Networks

arXiv:2211.09166 [pdf, other]

A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

Authors: Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: This paper focuses on leveraging deep representation learning (DRL) for speech enhancement (SE). In general, the performance of the deep neural network (DNN) is heavily dependent on the learning of data representation. However, the DRL's importance is often ignored in many DNN-based SE algorithms. To obtain a higher quality enhanced speech, we propose a two-stage DRL-based SE method through adversarial training. In the first stage, we disentangle different latent variables because disentangled representations can help DNN generate a better enhanced speech. Specifically, we use the $β$-variational autoencoder (VAE) algorithm to obtain the speech and noise posterior estimations and related representations from the observed signal. However, since the posteriors and representations are intractable and we can only apply a conditional assumption to estimate them, it is difficult to ensure that these estimations are always pretty accurate, which may potentially degrade the final accuracy of the signal estimation. To further improve the quality of enhanced speech, in the second stage, we introduce adversarial training to reduce the effect of the inaccurate posterior towards signal reconstruction and improve the signal estimation accuracy, making our algorithm more robust for the potentially inaccurate posterior estimations. As a result, better SE performance can be achieved.
The experimental results indicate that the proposed strategy can help similar DNN-based SE algorithms achieve higher short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and scale-invariant signal-to-distortion ratio (SI-SDR) scores. Moreover, the proposed algorithm can also outperform recent competitive SE algorithms.

Submitted 27 September, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

arXiv:2211.03885 [pdf, other]

Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Ziyao Yi, Yan Xiang, Zibin Liu, Shaoqing Li, Keming Shi, Dehui Kong, Ke Xu, Minsu Kwon, Yaqi Wu, Jiesi Zheng, Zhihao Fan, Xun Wu, Feng Zhang, Albert No, Minhyeok Cho, Zewen Chen, Xiaze Zhang, Ran Li , et al. (13 additional authors not shown)

Abstract: The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon's 8 Gen 1 GPU that provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs, being able to process Full HD photos in less than 20-50 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2210.17408 [pdf, ps, other]

Accelerating Diffusion Models via Pre-segmentation Diffusion Sampling for Medical Image Segmentation

Authors: Xutao Guo, Yanwu Yang, Chenfei Ye, Shang Lu, Yang Xiang, Ting Ma

Abstract: Based on the Denoising Diffusion Probabilistic Model (DDPM), medical image segmentation can be described as a conditional image generation task, which allows to compute pixel-wise uncertainty maps of the segmentation and allows an implicit ensemble of segmentations to boost the segmentation performance. However, DDPM requires many iterative denoising steps to generate segmentations from Gaussian noise, resulting in extremely inefficient inference. To mitigate the issue, we propose a principled acceleration strategy, called pre-segmentation diffusion sampling DDPM (PD-DDPM), which is specially used for medical image segmentation. The key idea is to obtain pre-segmentation results based on a separately trained segmentation network, and construct noise predictions (non-Gaussian distribution) according to the forward diffusion rule. We can then start with noisy predictions and use fewer reverse steps to generate segmentation results. Experiments show that PD-DDPM yields better segmentation results over representative baseline methods even if the number of reverse steps is significantly reduced. Moreover, PD-DDPM is orthogonal to existing advanced segmentation models, which can be combined to further improve the segmentation performance.

Submitted 26 October, 2022; originally announced October 2022.
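
Note on the entry above: the "forward diffusion rule" it mentions can be illustrated with the standard DDPM closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε drawn from a standard normal. The sketch below is only that generic rule applied to a flattened mask; the function name `forward_diffuse`, the toy mask, and the choice of ᾱ_t are assumptions for illustration and are not taken from the PD-DDPM paper.

```python
import math
import random


def forward_diffuse(x0, alpha_bar_t, rng=random):
    """Noise a flattened image x0 to step t using the DDPM closed form.

    alpha_bar_t is the cumulative product of (1 - beta_s) up to step t
    and must lie in [0, 1]. Hypothetical helper, not the paper's code.
    """
    if not 0.0 <= alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must be in [0, 1]")
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


# Toy "pre-segmentation" mask (assumed values), noised to an intermediate step.
pre_segmentation = [0.0, 1.0, 1.0, 0.0]
noisy_start = forward_diffuse(pre_segmentation, alpha_bar_t=0.5)
```

Starting the reverse process from such a partially noised estimate, rather than from pure Gaussian noise, is what would allow fewer reverse steps; how the paper builds its non-Gaussian noise predictions is not specified in the abstract.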

arXiv:2210.13721 [pdf, other]

Multi-modal Dynamic Graph Network: Coupling Structural and Functional Connectome for Disease Diagnosis and Classification

Authors: Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Ting Ma

Abstract: Multi-modal neuroimaging technology has greatly facilitated the efficiency and diagnosis accuracy, which provides complementary information in discovering objective disease biomarkers. Conventional deep learning methods, e.g. convolutional neural networks, overlook relationships between nodes and fail to capture topological properties in graphs. Graph neural networks have been proven to be of great importance in modeling brain connectome networks and relating disease-specific patterns. However, most existing graph methods explicitly require known graph structures, which are not available in the sophisticated brain system. Especially in heterogeneous multi-modal brain networks, there exists a great challenge to model interactions among brain regions in consideration of inter-modal dependencies. In this study, we propose a Multi-modal Dynamic Graph Convolution Network (MDGCN) for structural and functional brain network learning. Our method benefits from modeling inter-modal representations and relating attentive multi-model associations into dynamic graphs with a compositional correspondence matrix. Moreover, a bilateral graph convolution layer is proposed to aggregate multi-modal representations in terms of multi-modal associations. Extensive experiments on three datasets demonstrate the superiority of our proposed method in terms of disease classification, with the accuracy of 90.4%, 85.9% and 98.3% in predicting Mild Cognitive Impairment (MCI), Parkinson's disease (PD), and schizophrenia (SCHZ) respectively.
Furthermore, our statistical evaluations on the correspondence matrix exhibit a high correspondence with previous evidence of biomarkers.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.04435 [pdf, other]

Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning

Authors: Xiaoyu Huang, Zhongyu Li, Yanzhen Xiang, Yiming Ni, Yufeng Chi, Yunhao Li, Lizhi Yang, Xue Bin Peng, Koushil Sreenath

Abstract: We present a reinforcement learning (RL) framework that enables quadrupedal robots to perform soccer goalkeeping tasks in the real world. Soccer goalkeeping using quadrupeds is a challenging problem, that combines highly dynamic locomotion with precise and fast non-prehensile object (ball) manipulation. The robot needs to react to and intercept a potentially flying ball using dynamic locomotion maneuvers in a very short amount of time, usually less than one second. In this paper, we propose to address this problem using a hierarchical model-free RL framework. The first component of the framework contains multiple control policies for distinct locomotion skills, which can be used to cover different regions of the goal. Each control policy enables the robot to track random parametric end-effector trajectories while performing one specific locomotion skill, such as jump, dive, and sidestep. These skills are then utilized by the second part of the framework which is a high-level planner to determine a desired skill and end-effector trajectory in order to intercept a ball flying to different regions of the goal. We deploy the proposed framework on a Mini Cheetah quadrupedal robot and demonstrate the effectiveness of our framework for various agile interceptions of a fast-moving ball in the real world.

Submitted 10 October, 2022; originally announced October 2022.

Comments: First two authors contributed equally. Accompanying video is at https://youtu.be/iX6OgG67-ZQ

arXiv:2210.03301 [pdf, other]

GOLLIC: Learning Global Context beyond Patches for Lossless High-Resolution Image Compression

Authors: Yuan Lan, Liang Qin, Zhaoyi Sun, Yang Xiang, Jie Sun

Abstract: Neural-network-based approaches recently emerged in the field of data compression and have already led to significant progress in image compression, especially in achieving a higher compression ratio. In the lossless image compression scenario, however, existing methods often struggle to learn a probability model of full-size high-resolution images due to the limitation of the computation source. The current strategy is to crop high-resolution images into multiple non-overlapping patches and process them independently. This strategy ignores long-term dependencies beyond patches, thus limiting modeling performance. To address this problem, we propose a hierarchical latent variable model with a global context to capture the long-term dependencies of high-resolution images. Besides the latent variable unique to each patch, we introduce shared latent variables between patches to construct the global context. The shared latent variables are extracted by a self-supervised clustering module inside the model's encoder. This clustering module assigns each patch the confidence that it belongs to any cluster. Later, shared latent variables are learned according to latent variables of patches and their confidence, which reflects the similarity of patches in the same cluster and benefits the global context modeling. Experimental results show that our global context model improves compression ratio compared to the engineered codecs and deep learning models on three benchmark high-resolution image datasets, DIV2K, CLIC.pro, and CLIC.mobile.

Submitted 6 October, 2022; originally announced October 2022.

arXiv:2210.02555 [pdf, ps, other]

Sample-and-Forward: Communication-Efficient Control of the False Discovery Rate in Networks

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: This work concerns controlling the false discovery rate (FDR) in networks under communication constraints. We present sample-and-forward, a flexible and communication-efficient version of the Benjamini-Hochberg (BH) procedure for multihop networks with general topologies. Our method evidences that the nodes in a network do not need to communicate p-values to each other to achieve a decent statistical power under the global FDR control constraint. Consider a network with a total of $m$ p-values, our method consists of first sampling the (empirical) CDF of the p-values at each node and then forwarding $\mathcal{O}(\log m)$ bits to its neighbors. Under the same assumptions as for the original BH procedure, our method has both the provable finite-sample FDR control as well as competitive empirical detection power, even with a few samples at each node. We provide an asymptotic analysis of power under a mixture model assumption on the p-values.

Submitted 15 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: Accepted to the 2023 IEEE International Symposium on Information Theory (ISIT)
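
Note on the entry above: for readers unfamiliar with the baseline it builds on, the centralized Benjamini-Hochberg (BH) step-up procedure can be sketched in a few lines. This is the classical pooled-p-value procedure only, not the sample-and-forward method; the function name `benjamini_hochberg` and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    With m p-values sorted in increasing order, find the largest rank k such
    that p_(k) <= alpha * k / m and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


# Illustrative p-values (made up): only the two smallest pass their thresholds.
rejected = benjamini_hochberg(
    [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
)
```

The centralized rule needs all m p-values in one place, which is the communication cost that the entry above aims to avoid.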

arXiv:2209.12642 [pdf]

Design of Automatic Driving Safety Level and Positioning Accuracy

Authors: Tiantian Tang, Hao Xu, Chengcheng Wu, Sijie Lye, Yan Xiang

Abstract: Autonomous driving is a hot research topic in the frontier of science and technology. Technology companies and traditional car companies are developing and designing autonomous driving technology from two different directions. Based on the automatic driving classification standard and ISO safety level, combined with the number of traffic accidents and death data in China, and referring to the risk allocation method of the automated driving virtual drive system in the United States, the risk allocation of China's virtual drive system will be carried out. In addition, combined with the vehicle "positioning box" model, the theoretical calculation of the alarm limit of positioning accuracy in China will be carried out and the positioning accuracy requirements of related vehicles will be designed.

Submitted 26 September, 2022; originally announced September 2022.

Comments: in Chinese language

arXiv:2209.08933 [pdf, ps, other]

Estimating Brain Age with Global and Local Dependencies

Authors: Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Haiyan Lv, Ting Ma

Abstract: The brain age has been proven to be a phenotype of relevance to cognitive performance and brain disease. Achieving accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, the brain age is hard to be exploited accurately with models using feature engineering and local processing such as local convolution and recurrent operations that process one local neighborhood at a time. Instead, Vision Transformers learn global attentive interaction of patch tokens, introducing less inductive bias and modeling long-range dependencies. In terms of this, we proposed a novel network for learning brain age interpreting with global and local dependencies, where the corresponding representations are captured by Successive Permuted Transformer (SPT) and convolution blocks. The SPT brings computation efficiency and locates the 3D spatial information indirectly via continuously encoding 2D slices from different views. Finally, we collect a large cohort of 22645 subjects with ages ranging from 14 to 97 and our network performed the best among a series of deep learning methods, yielding a mean absolute error (MAE) of 2.855 in validation set, and 2.911 in an independent test set.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2207.00268 [pdf, ps, other]

doi 10.1088/1674-4527/ac7cba

High-resolution Solar Image Reconstruction Based on Non-rigid Alignment

Authors: Hui Liu, Zhenyu Jin, Yongyuan Xiang, Kaifan Ji

Abstract: Suppressing the interference of atmospheric turbulence and obtaining observation data with a high spatial resolution is an issue to be solved urgently for ground observations. One way to solve this problem is to perform a statistical reconstruction of short-exposure speckle images. Combining the rapidity of Shift-Add and the accuracy of speckle masking, this paper proposes a novel reconstruction algorithm-NASIR (Non-rigid Alignment based Solar Image Reconstruction). NASIR reconstructs the phase of the object image at each frequency by building a computational model between geometric distortion and intensity distribution and reconstructs the modulus of the object image on the aligned speckle images by speckle interferometry. We analyzed the performance of NASIR by using the correlation coefficient, power spectrum, and coefficient of variation of intensity profile (CVoIP) in processing data obtained by the NVST (1m New Vacuum Solar Telescope). The reconstruction experiments and analysis results show that the quality of images reconstructed by NASIR is close to speckle masking when the seeing is good, while NASIR has excellent robustness when the seeing condition becomes worse. Furthermore, NASIR reconstructs the entire field of view in parallel in one go, without phase recursion and block-by-block reconstruction, so its computation time is less than half that of speckle masking. Therefore, we consider NASIR is a robust and high-quality fast reconstruction method that can serve as an effective tool for data filtering and quick look.

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2206.14362 [pdf, other]

Lower Bounds on the Error Probability for Invariant Causal Prediction

Authors: Austin Goddard, Yu Xiang, Ilya Soloveychik

Abstract: It is common practice to collect observations of feature and response pairs from different environments. A natural question is how to identify features that have consistent prediction power across environments. The invariant causal prediction framework proposes to approach this problem through invariance, assuming a linear model that is invariant under different environments. In this work, we make an attempt to shed light on this framework by connecting it to the Gaussian multiple access channel problem. Specifically, we incorporate optimal code constructions and decoding methods to provide lower bounds on the error probability. We illustrate our findings by various simulation settings.

Submitted 29 June, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted to the 2022 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

arXiv:2205.05581 [pdf, other]

A deep representation learning speech enhancement method using $β$-VAE

Authors: Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: In previous work, we proposed a variational autoencoder-based (VAE) Bayesian permutation training speech enhancement (SE) method (PVAE) which indicated that the SE performance of the traditional deep neural network-based (DNN) method could be improved by deep representation learning (DRL). Based on our previous work, we in this paper propose to use $β$-VAE to further improve PVAE's ability of representation learning. More specifically, our $β$-VAE can improve PVAE's capacity of disentangling different latent variables from the observed signal without the trade-off problem between disentanglement and signal reconstruction. This trade-off problem widely exists in previous $β$-VAE algorithms. Unlike the previous $β$-VAE algorithms, the proposed $β$-VAE strategy can also be used to optimize the DNN's structure. This means that the proposed method can not only improve PVAE's SE performance but also reduce the number of PVAE training parameters. The experimental results show that the proposed method can acquire better speech and noise latent representation than PVAE. Meanwhile, it also obtains a higher scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Submitted to Eurosipco

arXiv:2203.12236 [pdf, other]

A Multi-Characteristic Learning Method with Micro-Doppler Signatures for Pedestrian Identification

Authors: Yu Xiang, Yu Huang, Haodong Xu, Guangbo Zhang, Wenyong Wang

Abstract: The identification of pedestrians using radar micro-Doppler signatures has become a hot topic in recent years. In this paper, we propose a multi-characteristic learning (MCL) model with clusters to jointly learn discrepant pedestrian micro-Doppler signatures and fuse the knowledge learned from each cluster into final decisions. Time-Doppler spectrogram (TDS) and signal statistical features extracted from FMCW radar, as two categories of micro-Doppler signatures, are used in MCL to learn the micro-motion information inside pedestrians' free walking patterns. The experimental results show that our model achieves a higher accuracy rate and is more stable for pedestrian identification than other studies, which make our model more practical.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2203.02849 [pdf, ps, other]

doi 10.1016/j.jspi.2023.106119

Variable Selection with the Knockoffs: Composite Null Hypotheses

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: The fixed-X knockoff filter is a flexible framework for variable selection with false discovery rate (FDR) control in linear models with arbitrary design matrices (of full column rank), and it allows for finite-sample selective inference via the Lasso estimates. In this paper, we extend the theory of the knockoff procedure to tests with composite null hypotheses, which are usually more relevant to real-world problems. The main technical challenge lies in handling composite nulls in tandem with dependent features from arbitrary designs. We develop two methods for composite inference with the knockoffs, namely, shifted ordinary least-squares (S-OLS) and feature-response product perturbation (FRPP), building on new structural properties of test statistics under composite nulls. We also propose two heuristic variants of the S-OLS method that outperform the celebrated Benjamini-Hochberg (BH) procedure for composite nulls, which serves as a heuristic baseline under dependent test statistics. Finally, we analyze the loss in FDR when the original knockoff procedure is naively applied on composite tests.

Submitted 27 November, 2023; v1 submitted 5 March, 2022; originally announced March 2022.

Journal ref: Journal of Statistical Planning and Inference, Volume 231, 2024, 106119, ISSN 0378-3758
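
For readers unfamiliar with the knockoff filter mentioned in this entry, the selection step of the standard (point-null) knockoff+ procedure can be sketched as follows. This is not the S-OLS or FRPP method of the paper; it only illustrates the usual data-dependent threshold applied to feature statistics W_j, which are assumed to have been computed already. The function names and the `offset` parameter are hypothetical choices for this sketch.

```python
import math


def knockoff_threshold(w, q, offset=1):
    """Smallest t among the nonzero |W_j| with
    (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.

    offset=1 corresponds to the knockoff+ variant. Returns math.inf if no
    candidate threshold satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (offset + negatives) / max(1, positives) <= q:
            return t
    return math.inf


def knockoff_select(w, q, offset=1):
    """Indices of features whose statistic is at or above the threshold."""
    t = knockoff_threshold(w, q, offset)
    return [j for j, x in enumerate(w) if x >= t]
```

With the illustrative statistics `w = [5.0, 4.0, 3.5, 3.0, -1.0, 2.0, 0.5, -0.2, 6.0, 2.5]` and target level `q = 0.2`, the threshold is 2.0 and the selected indices are `[0, 1, 2, 3, 5, 8, 9]`; at `q = 0.1` no threshold qualifies and nothing is selected.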

arXiv:2202.05416 [pdf, other]

FAAG: Fast Adversarial Audio Generation through Interactive Attack Optimisation

Authors: Yuantian Miao, Chao Chen, Lei Pan, Jun Zhang, Yang Xiang

Abstract: Automatic Speech Recognition services (ASRs) inherit deep neural networks' vulnerabilities like crafted adversarial examples. Existing methods often suffer from low efficiency because the target phases are added to the entire audio sample, resulting in high demand for computational resources. This paper proposes a novel scheme named FAAG as an iterative optimization-based method to generate targeted adversarial examples quickly. By injecting the noise over the beginning part of the audio, FAAG generates high-quality adversarial audio in a timely manner with a high success rate. Specifically, we use the audio's logits output to map each character in the transcription to an approximate position of the audio's frame. Thus, an adversarial example can be generated by FAAG in approximately two minutes using CPUs only and around ten seconds with one GPU while maintaining an average success rate over 85%. In particular, the FAAG method achieves a speed-up of around 60% compared with the baseline method during the adversarial example generation process. Furthermore, we found that appending benign audio to any suspicious examples can effectively defend against the targeted adversarial attack. We hope that this work paves the way for inventing new adversarial attacks against speech recognition with computational constraints.

Submitted 10 February, 2022; originally announced February 2022.

arXiv:2201.13008 [pdf, ps, other]

Communication-Efficient Distributed Multiple Testing for Large-Scale Inference

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: The Benjamini-Hochberg (BH) procedure is a celebrated method for multiple testing with false discovery rate (FDR) control. In this paper, we consider large-scale distributed networks where each node possesses a large number of p-values and the goal is to achieve the global BH performance in a communication-efficient manner. We propose that every node performs a local test with an adjusted test size according to the (estimated) global proportion of true null hypotheses. With suitable assumptions, our method is asymptotically equivalent to the global BH procedure. Motivated by this, we develop an algorithm for star networks where each node only needs to transmit an estimate of the (local) proportion of nulls and the (local) number of p-values to the center node; the center node then broadcasts a parameter (computed based on the global estimate and test size) to the local nodes. In the experiment section, we utilize existing estimators of the proportion of true nulls and consider various settings to evaluate the performance and robustness of our method.

Submitted 17 December, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: Accepted to the 2022 IEEE International Symposium on Information Theory (ISIT)
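
The entry above takes the global Benjamini-Hochberg (BH) procedure as its reference point. As a reminder of what that baseline does, here is a minimal sketch of the classical centralized BH step-up rule; it does not implement the distributed algorithm of the paper. The function name `benjamini_hochberg` is a hypothetical choice, and the p-values are assumed to be supplied as a plain list.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Finds the largest rank k (1-based, in ascending p-value order) such that
    p_(k) <= alpha * k / m, then rejects the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes (step-up behaviour)
    return sorted(order[:k])
```

For instance, `benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216], 0.05)` rejects only the first two hypotheses. The step-up behaviour shows in `benjamini_hochberg([0.04, 0.02, 0.035, 0.03], 0.05)`, where the two smallest p-values miss their own thresholds but all four hypotheses are still rejected because the larger ranks pass.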

arXiv:2201.09875 [pdf, other]

A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder

Authors: Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods only apply VAE to model the speech signal, while noise is modeled using the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for VAE, which ensures that VAE can be trained in a supervised manner and can disentangle speech and noise latent variables from the observed signal. This means that the proposed method can apply the VAE to model both speech and noise signals, which is totally different from the previous VAE-based SE works. More specifically, the proposed DRL method can learn to impose speech and noise signal priors on different sets of latent variables for SE. The experimental results show that the proposed method can not only disentangle speech and noise latent variables from the observed signal but also obtain a higher scale-invariant signal-to-distortion ratio and speech quality score than the similar deep neural network-based (DNN) SE method.

Submitted 24 January, 2022; originally announced January 2022.

Comments: Accepted by ICASSP 2022

arXiv:2105.10892 [pdf]

Fast Crack Detection Using Convolutional Neural Network

Authors: Jiesheng Yang, Fangzheng Lin, Yusheng Xiang, Peter Katranuschkov, Raimar J. Scherer

Abstract: To improve the efficiency and reduce the labour cost of the renovation process, this study presents a lightweight Convolutional Neural Network (CNN)-based architecture to extract crack-like features, such as cracks and joints. Moreover, a Transfer Learning (TF) method was used to save training time while offering comparable prediction results. The work targets three objectives: 1) detection of concrete cracks; 2) detection of natural stone cracks; 3) differentiation between joints and cracks in natural stone. We built a natural stone dataset with joint and crack information as a complement to the concrete benchmark dataset. As the results show, our model is demonstrated to be an effective tool for industry use.

Submitted 23 May, 2021; originally announced May 2021.

Comments: 10 pages, 11 figures

arXiv:2105.08876 [pdf, other]

doi 10.1016/j.engappai.2023.107180

A Lightweight Privacy-Preserving Scheme Using Label-based Pixel Block Mixing for Image Classification in Deep Learning

Authors: Yuexin Xiang, Tiantian Li, Wei Ren, Tianqing Zhu, Kim-Kwang Raymond Choo

Abstract: To ensure the privacy of sensitive data used in the training of deep learning models, a number of privacy-preserving methods have been designed by the research community. However, existing schemes are generally designed to work with textual data, or are not efficient when a large number of images is used for training. Hence, in this paper we propose a lightweight and efficient approach to preserve image privacy while maintaining the availability of the training set. Specifically, we design the pixel block mixing algorithm for image classification privacy preservation in deep learning. To evaluate its utility, we use the mixed training set to train the ResNet50, VGG16, InceptionV3 and DenseNet121 models on the WIKI dataset and the CNBC face dataset. Experimental findings on the testing set show that our scheme preserves image privacy while maintaining the availability of the training set in the deep learning models. Additionally, the experimental results demonstrate that we achieve good performance for the VGG16 model on the WIKI dataset and both ResNet50 and DenseNet121 on the CNBC dataset. The pixel block algorithm achieves fairly high efficiency in the mixing of the images, and it is computationally challenging for the attackers to restore the mixed training set to the original training set. Moreover, data augmentation can be applied to the mixed training set to improve the training's effectiveness.

Submitted 18 May, 2021; originally announced May 2021.

Comments: 11 pages, 16 figures

MSC Class: 68T07 ACM Class: I.2.6; I.2.9

Journal ref: Engineering Applications of Artificial Intelligence 126 (2023): 107180

arXiv:2101.05998 [pdf, other]

doi 10.1109/TVT.2021.3109800

A Vehicles Control Model to Alleviate Traffic Instability

Authors: Jiancheng Fang, Yu Xiang, Yu Huang, Yilong Cui, Wenyong Wang

Abstract: While bringing convenience to people, the growing number of vehicles on the road already causes inevitable traffic congestion. Some traffic congestion happens for observable reasons, but other congestion occurs without apparent reasons or bottlenecks; such congestion, referred to as phantom jams, is caused by the traditional vehicle-following model. In order to alleviate the traffic instability caused by phantom jams, several models have been proposed with the development of intelligent transportation systems (ITS). These have been proved able to suppress traffic instability in the ideal situation. But in road scenarios, uncertainties of vehicle state measurements and time delays caused by on-board sensors, inter-vehicle communications and the control systems of vehicles severely affect the performance of the existing models and cannot be ignored. In this paper, a novel predictable bilateral control model (PBCM), which consists of best estimation and state prediction, is proposed to determine accurate acceleration values of the host vehicle in traffic flow to alleviate traffic instability. Theoretical analysis and simulation results show that our model can effectively reduce the influence of the measurement errors and the delay caused by the communication and control systems and accurately control the state of the vehicles in traffic flow, thereby restraining the instability of traffic flow.

Submitted 15 January, 2021; originally announced January 2021.

Comments: 13 pages, 35 figures

Report number: 9863-9876

Journal ref: IEEE Transactions on Vehicular Technology ( Volume: 70, Issue: 10, Oct. 2021)

arXiv:2009.07220 [pdf, other]

doi 10.1002/jbio.202000508

Multivariate analysis of Brillouin imaging data by supervised and unsupervised learning

Authors: YuChen Xiang, Kai Ling C. Seow, Carl Paterson, Peter Török

Abstract: Brillouin imaging relies on the reliable extraction of subtle spectral information from hyperspectral datasets. To date, the mainstream practice has been using line fitting of spectral features to retrieve the average peak shift and linewidth parameters. Good results, however, depend heavily on sufficient SNR and may not be applicable in complex samples that consist of spectral mixtures. In this work, we thus propose the use of various multivariate algorithms that can be used to perform supervised or unsupervised analysis of the hyperspectral data, with which we explore advanced image analysis applications, namely unmixing, classification and segmentation in a phantom and live cells. The resulting images are shown to provide more contrast and detail, and are obtained on a timescale $10^2$ faster than fitting. The estimated spectral parameters are consistent with those calculated from pure fitting.

Submitted 15 September, 2020; originally announced September 2020.

arXiv:2009.02285 [pdf, other]

Flow Field Reconstructions with GANs based on Radial Basis Functions

Authors: Liwei Hu, Wenyong Wang, Yu Xiang, Jun Zhang

Abstract: Nonlinear sparse data regression and generation have been a long-term challenge, with flow field reconstruction as a typical example. The huge computational cost of computational fluid dynamics (CFD) makes large-scale CFD data production very expensive, which is why cheaper approaches are needed. Traditional reduced order models (ROMs) were promising, but they could not generate a large amount of full domain flow field data (FFD) to realize high-precision flow field reconstructions. Motivated by the problems of existing approaches and inspired by the success of generative adversarial networks (GANs) in the field of computer vision, we prove an optimal discriminator theorem stating that the optimal discriminator of a GAN is a radial basis function neural network (RBFNN) when dealing with nonlinear sparse FFD regression and generation. Based on this theorem, two radial basis function-based GANs (RBF-GAN and RBFC-GAN), for regression and generation purposes, are proposed. Three different datasets are applied to verify the feasibility of our models. The results show that the RBF-GAN and the RBFC-GAN perform better than GANs/cGANs in terms of both the mean square error (MSE) and the mean square percentage error (MSPE). Besides, compared with GANs/cGANs, the stability of the RBF-GAN and the RBFC-GAN improves by 34.62% and 72.31%, respectively. Consequently, our proposed models can be used to generate full domain FFD from limited and sparse datasets, to meet the requirement of high-precision flow field reconstructions.

Submitted 11 August, 2020; originally announced September 2020.

arXiv:2007.07321 [pdf, other]

Loss Minimization of Traction Systems in Battery Electric Vehicles Using Variable DC-link Voltage Technique -- Experimental Study

Authors: Libo Liu, Boyang Li, Gunther Götting, Yusheng Xiang, Qusay Salem, Muhammad Hamid, Jian Xie

Abstract: A novel variable dc-link voltage technique is proposed to reduce the traction losses for electrical drive applications. A 100-unit cascaded multilevel converter is developed to generate the variable dc-link voltage. Experimental measurement shows that the machine additional losses and IGBT-inverter losses are reduced substantially. The system efficiency enhancement is at least 2%.

Submitted 14 July, 2020; originally announced July 2020.

Comments: 9 pages, 7 figures

arXiv:2007.02663 [pdf, other]

doi 10.1007/978-3-030-59722-1_73

An Elastic Interaction-Based Loss Function for Medical Image Segmentation

Authors: Yuan Lan, Yang Xiang, Luchan Zhang

Abstract: Deep learning techniques have shown their success in medical image segmentation since they are easy to manipulate and robust to various types of datasets. The commonly used loss functions in the deep segmentation task are pixel-wise loss functions. This results in a bottleneck for these models to achieve high precision for complicated structures in biomedical images. For example, the predicted small blood vessels in retinal images are often disconnected or even missed under the supervision of the pixel-wise losses. This paper addresses this problem by introducing a long-range elastic interaction-based training strategy. In this strategy, a convolutional neural network (CNN) learns the target region under the guidance of the elastic interaction energy between the boundary of the predicted region and that of the actual object. Under the supervision of the proposed loss, the boundary of the predicted region is attracted strongly by the object boundary and tends to stay connected. Experimental results show that our method is able to achieve considerable improvements compared to commonly used pixel-wise loss functions (cross entropy and Dice loss) and other recent loss functions on three retinal vessel segmentation datasets, DRIVE, STARE and CHASEDB1.

Submitted 11 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

arXiv:2006.16689 [pdf, other]

A Speech Enhancement Algorithm based on Non-negative Hidden Markov Model and Kullback-Leibler Divergence

Authors: Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

Abstract: In this paper, we propose a novel supervised single-channel speech enhancement method combining the Kullback-Leibler divergence-based non-negative matrix factorization (NMF) and hidden Markov model (NMF-HMM). With the application of HMM, the temporal dynamics information of speech signals can be taken into account. In the training stage, the sum of Poisson, leading to the KL divergence measure, is used as the observation model for each state of HMM. This ensures that a computationally efficient multiplicative update can be used for the parameter update of the proposed model. In the online enhancement stage, we propose a novel minimum mean-square error (MMSE) estimator for the proposed NMF-HMM. This estimator can be implemented using parallel computing, reducing the computation time. The performance of the proposed algorithm is verified by objective measures. The experimental results show that the proposed strategy achieves better speech enhancement performance than state-of-the-art speech enhancement methods. More specifically, compared with the traditional NMF-based speech enhancement methods, our proposed algorithm achieves a 5% improvement for short-time objective intelligibility (STOI) and 0.18 improvement for perceptual evaluation of speech quality (PESQ).

Submitted 30 June, 2020; originally announced June 2020.

arXiv:2006.03169 [pdf, other]

Fast CRDNN: Towards on Site Training of Mobile Construction Machines

Authors: Yusheng Xiang, Tian Tang, Tianqing Su, Christine Brach, Libo Liu, Samuel Mao, Marcus Geimer

Abstract: The CRDNN is a combined neural network that can increase the holistic efficiency of torque-based mobile working machines by about 9% by accurately detecting the truck loading cycles. On the one hand, it is a robust but offline learning algorithm, so it is more accurate and much quicker than the previous methods. On the other hand, its accuracy cannot always be guaranteed because of the diversity of the mobile machines industry and the nature of the offline method. To address the problem, we utilize the transfer learning algorithm and Internet of Things (IoT) technology. Concretely, the CRDNN is first trained by computer and then saved in the on-board ECU. If the pre-trained CRDNN is not suitable for a new machine, the operator can label some new data using our App, which is connected to the on-board ECU of that machine through Bluetooth. With the newly labeled data, we can directly further train the pre-trained CRDNN on the ECU without overloading it, since transfer learning requires less computation effort than training the networks from scratch. In our paper, we prove this idea and show by field experiment that, with the help of transfer learning and IoT technology, the CRDNN is always competent, even when the new machine may have a different distribution. Also, we compared the performance of other SOTA multivariate time series algorithms on predicting the working state of the mobile machines, which indicates that the CRDNNs are still the most suitable solution. As a by-product, we build up a human-machine communication system to label the dataset, which can be operated by engineers without knowledge of Artificial Intelligence (AI).

Submitted 4 June, 2020; originally announced June 2020.

Comments: 15 pages, 18 figures

arXiv:2003.14172 [pdf, other]

A novel Algorithm for Hydrostatic-mechanical Mobile Machines with a Dual-Clutch Transmission

Authors: Yusheng Xiang, Ruoyu Li, Christine Brach, Xiaole Liu, Marcus Geimer

Abstract: Mobile machines using a hydrostatic transmission are highly efficient under lower working-speed conditions but less capable at higher transport velocities. To enhance overall efficiency, we have improved the powertrain design by combining a hydrostatic transmission with a dual-clutch transmission (DCT). Compared with other mechanical gearboxes, the DCT avoids the interruption of torque transmission in the process of shifting without sacrificing more transmission efficiency. However, there are some problems of unstable torque transmission during the shifting process, and an excessive torque drop occurs at the end of the gear shift, which results in poor drive comfort. To enhance the performance of the novel structural possibility of powertrain design, we designed a novel control strategy for the motor torque and the clutch torques during the shifting process. The controller's task is tracking the target drive torque and completing the gear shift as quickly as possible with acceptable smoothness requirements. In the process of the controller design, a specific control algorithm is first designed for a 10 ton wheel loader. The model-based simulation has validated the control effect. As a result, the control strategy employing clutch and motor torque control to achieve a smooth shifting process is feasible since drive torque is well tracked, and highly dynamical actuators are not required. Furthermore, only two calibration parameters are designed to adjust the control performance systematically. The novel control strategy is generalized for other mobile machines with different sizes.

Submitted 23 April, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

Comments: 8 pages, 10 figures

arXiv:2003.10011 [pdf, other]

Optimization of Operation Strategy for Primary Torque based hydrostatic Drivetrain using Artificial Intelligence

Authors: Yusheng Xiang, Marcus Geimer

Abstract: A new primary torque control concept for hydrostatic mobile machines was introduced in 2018. The mentioned concept controls the pressure in a closed circuit by changing the angle of the hydraulic pump to achieve the desired pressure based on a feedback system. Thanks to this concept, a series of advantages are expected. However, while working in a Y cycle, the primary torque-controlled wheel loader is less efficient than a secondary-controlled earthmover due to the lack of recuperation ability. Alternatively, we use deep learning algorithms to improve machines' regeneration performance. In this paper, we first make a potential analysis to show the benefit of utilizing the regeneration process, followed by proposing a series of CRDNNs, which combine CNN, RNN, and DNN, to precisely detect Y cycles. Compared to existing algorithms, the CRDNN with bi-directional LSTMs has the best accuracy, and the CRDNN with LSTMs has a comparable performance but much fewer training parameters. Based on our dataset including 119 truck loading cycles, our best neural network shows a 98.2% test accuracy. Therefore, even with a simple regeneration process, our algorithm can improve the holistic efficiency of mobile machines by up to 9% during Y cycle processes if the primary torque concept is used.

Submitted 31 March, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

Comments: 9 pages, 23 figures

arXiv:2002.10429 [pdf]

Distributed Frequency Emergency Control with Coordinated Edge Intelligence

Authors: Yingmeng Xiang, Zhehan Yi, Xiao Lu, Zhe Yu, Di Shi, Chunlei Xu, Xueming Li, Zhiwei Wang

Abstract: Developing effective strategies to rapidly support grid frequency while minimizing loss in case of severe contingencies is an important requirement in power systems. While distributed responsive load demands are commonly adopted for frequency regulation, it is difficult to achieve both rapid response and global accuracy in a practical and cost-effective manner. In this paper, the cyber-physical design of an Internet-of-Things (IoT) enabled system, called Grid Sense, is presented. Grid Sense utilizes a large number of distributed appliances for frequency emergency support. It features a local power loss $ΔP$ estimation approach for frequency emergency control based on coordinated edge intelligence. The specifically designed smart outlets of Grid Sense detect the frequency disturbance event locally using the parameters sent from the control center to estimate active power loss in the system and to make rapid and accurate switching decisions soon after a severe contingency. Based on a modified IEEE 24-bus system, numerical simulations and hardware experiments are conducted to demonstrate the frequency support performance of Grid Sense in the aspects of accuracy and speed. It is shown that Grid Sense equipped with its local $ΔP$-estimation frequency control approach can accurately and rapidly prevent the drop of frequency after a major power loss.

Submitted 24 February, 2020; originally announced February 2020.

arXiv:1909.08980 [pdf, other]

doi 10.1364/BOE.380798

SNR Enhancement in Brillouin Microspectroscopy using Spectrum Reconstruction

Authors: YuChen Xiang, Matthew R. Foreman, Peter Török

Abstract: Brillouin imaging suffers from intrinsically low signal-to-noise ratios (SNR). Such low SNRs can render common data analysis protocols unreliable, especially for SNRs below $\sim10$. In this work we exploit two denoising algorithms, namely maximum entropy reconstruction (MER) and wavelet analysis (WA), to improve the accuracy and precision in determination of Brillouin shifts and linewidth. Algorithm performance is quantified using Monte-Carlo simulations and benchmarked against the Cramér-Rao lower bound. Superior estimation results are demonstrated even at low SNRs ($\geq 1$). Denoising was furthermore applied to experimental Brillouin spectra of distilled water at room temperature, allowing the speed of sound in water to be extracted. Experimental and theoretical values were found to be consistent to within $\pm1\%$ at unity SNR.

Submitted 23 January, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

Journal ref: Biomedical Optics Express Vol. 11, Issue 2, pp. 1020-1031 (2020)

arXiv:1908.10487 [pdf, ps, other]

doi 10.1109/TSP.2020.2970343

Gridless Parameter Estimation for One-Bit MIMO Radar with Time-Varying Thresholds

Authors: Feng Xi, Yijian Xiang, Shengyao Chen, Arye Nehorai

Abstract: We investigate the one-bit MIMO (1b-MIMO) radar that performs one-bit sampling with a time-varying threshold in the temporal domain and employs compressive sensing in the spatial and Doppler domains. The goals are to significantly reduce the hardware cost, energy consumption, and amount of stored data. The joint angle and Doppler frequency estimations from noisy one-bit data are studied. By showing that the effect of noise on one-bit sampling is equivalent to that of sparse impulsive perturbations, we formulate the one-bit $\ell_1$-regularized atomic-norm minimization (1b-ANM-L1) problem to achieve gridless parameter estimation with high accuracy. We also develop an iterative method for solving the 1b-ANM-L1 problem via the alternating direction method of multipliers. The Cramér-Rao bound (CRB) of the 1b-MIMO radar is analyzed, and the analytical performance of one-bit sampling with two different threshold strategies is discussed. Numerical experiments are presented to show that the 1b-MIMO radar can achieve high-resolution parameter estimation with a largely reduced amount of data.

Submitted 7 February, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: 31 pages, 12 figures

Showing 1–50 of 52 results for author: Xiang, Y