Search | arXiv e-print repository

Degradation Estimation Recurrent Neural Network with Local and Non-Local Priors for Compressive Spectral Imaging

Authors: Yubo Dong, Dahua Gao, Yuyan Li, Guangming Shi, Danhua Liu

Abstract: In the Coded Aperture Snapshot Spectral Imaging (CASSI) system, deep unfolding networks (DUNs) have demonstrated excellent performance in recovering 3D hyperspectral images (HSIs) from 2D measurements. However, some noticeable gaps exist between the imaging model used in DUNs and the real CASSI imaging process, such as the sensing error as well as photon and dark current noise, compromising the ac… ▽ More In the Coded Aperture Snapshot Spectral Imaging (CASSI) system, deep unfolding networks (DUNs) have demonstrated excellent performance in recovering 3D hyperspectral images (HSIs) from 2D measurements. However, some noticeable gaps exist between the imaging model used in DUNs and the real CASSI imaging process, such as the sensing error as well as photon and dark current noise, compromising the accuracy of solving the data subproblem and the prior subproblem in DUNs. To address this issue, we propose a Degradation Estimation Network (DEN) to correct the imaging model used in DUNs by simultaneously estimating the sensing error and the noise level, thereby improving the performance of DUNs. Additionally, we propose an efficient Local and Non-local Transformer (LNLT) to solve the prior subproblem, which not only effectively models local and non-local similarities but also reduces the computational cost of the window-based global Multi-head Self-attention (MSA). Furthermore, we transform the DUN into a Recurrent Neural Network (RNN) by sharing parameters of DNNs across stages, which not only allows DNN to be trained more adequately but also significantly reduces the number of parameters. The proposed DERNN-LNLT achieves state-of-the-art (SOTA) performance with fewer parameters on both simulation and real datasets. △ Less

Submitted 14 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2309.15796 [pdf, other]

Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

Authors: Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur

Abstract: Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we propose Omni-temporal Classification (OTC), a novel training criterion that explicitly incorporates label uncertainties originating from such weak supervision. Thi… ▽ More Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we propose Omni-temporal Classification (OTC), a novel training criterion that explicitly incorporates label uncertainties originating from such weak supervision. This allows the model to effectively learn speech-text alignments while accommodating errors present in the training transcripts. OTC extends the conventional CTC objective for imperfect transcripts by leveraging weighted finite state transducers. Through experiments conducted on the LibriSpeech and LibriVox datasets, we demonstrate that training ASR models with OTC avoids performance degradation even with transcripts containing up to 70% errors, a scenario where CTC models fail completely. Our implementation is available at https://github.com/k2-fsa/icefall. △ Less

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.13486 [pdf, other]

Gaining Insights into Denoising by Inpainting

Authors: Daniel Gaa, Vassillen Chizhov, Pascal Peter, Joachim Weickert, Robin Dirk Adam

Abstract: The filling-in effect of diffusion processes is a powerful tool for various image analysis tasks such as inpainting-based compression and dense optic flow computation. For noisy data, an interesting side effect occurs: The interpolated data have higher confidence, since they average information from many noisy sources. This observation forms the basis of our denoising by inpainting (DbI) framework… ▽ More The filling-in effect of diffusion processes is a powerful tool for various image analysis tasks such as inpainting-based compression and dense optic flow computation. For noisy data, an interesting side effect occurs: The interpolated data have higher confidence, since they average information from many noisy sources. This observation forms the basis of our denoising by inpainting (DbI) framework. It averages multiple inpainting results from different noisy subsets. Our goal is to obtain fundamental insights into key properties of DbI and its connections to existing methods. Like in inpainting-based image compression, we choose homogeneous diffusion as a very simple inpainting operator that performs well for highly optimized data. We propose several strategies to choose the location of the selected pixels. Moreover, to improve the global approximation quality further, we also allow to change the function values of the noisy pixels. In contrast to traditional denoising methods that adapt the operator to the data, our approach adapts the data to the operator. Experimentally we show that replacing homogeneous diffusion inpainting by biharmonic inpainting does not improve the reconstruction quality. This again emphasizes the importance of data adaptivity over operator adaptivity. On the foundational side, we establish deterministic and probabilistic theories with convergence estimates. In the non-adaptive 1-D case, we derive equivalence results between DbI on shifted regular grids and classical homogeneous diffusion filtering via an explicit relation between the density and the diffusion time. △ Less

Submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.00313 [pdf, other]

Message Passing Based Block Sparse Signal Recovery for DOA Estimation Using Large Arrays

Authors: Yiwen Mao, Dawei Gao, Qinghua Guo, Ming **

Abstract: This work deals with directional of arrival (DOA) estimation with a large antenna array. We first develop a novel signal model with a sparse system transfer matrix using an inverse discrete Fourier transform (DFT) operation, which leads to the formulation of a structured block sparse signal recovery problem with a sparse sensing matrix. This enables the development of a low complexity message pass… ▽ More This work deals with directional of arrival (DOA) estimation with a large antenna array. We first develop a novel signal model with a sparse system transfer matrix using an inverse discrete Fourier transform (DFT) operation, which leads to the formulation of a structured block sparse signal recovery problem with a sparse sensing matrix. This enables the development of a low complexity message passing based Bayesian algorithm with a factor graph representation. Simulation results demonstrate the superior performance of the proposed method. △ Less

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.15990 [pdf, other]

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Authors: Aoqi Guo, Sichong Qian, Baoxiang Li, Dazhi Gao

Abstract: Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. Nevertheless, the performance of these beamformers is inherently limited by the predictive accuracy of the pre-separation module. In this paper, we introduce a neural beamformer supported by a dual-path transformer. Initially, we employ the cross-… ▽ More Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. Nevertheless, the performance of these beamformers is inherently limited by the predictive accuracy of the pre-separation module. In this paper, we introduce a neural beamformer supported by a dual-path transformer. Initially, we employ the cross-attention mechanism in the time domain to extract crucial spatial information related to beamforming from the noisy covariance matrix. Subsequently, in the frequency domain, the self-attention mechanism is employed to enhance the model's ability to process frequency-specific details. By design, our model circumvents the influence of pre-separation modules, delivering performance in a more comprehensive end-to-end manner. Experimental results reveal that our model not only outperforms contemporary leading neural beamforming algorithms in separation performance but also achieves this with a significant reduction in parameter count. △ Less

Submitted 7 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.06547 [pdf, other]

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

Authors: Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

Abstract: When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either fi… ▽ More When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the nosiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to entirely eliminate incorrect tokens in pseudo-labels. In this work, we propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting incorrect tokens in the predicted pseudo-labels. In this work, we adopt a confidence-based error detection method that identifies the incorrect tokens by comparing their confidence scores with a given threshold, thus necessitating the confidence score to be discriminative. Hence, the second proposed technique is the contrastive CTC loss function that widens the confidence gap between the correctly and incorrectly predicted tokens, thereby improving the error detection ability. Additionally, obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning. Instead, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, thus saving the pain of manual tuning. △ Less

Submitted 12 August, 2023; originally announced August 2023.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

arXiv:2308.00152 [pdf, other]

doi 10.1109/ISGT51731.2023.10066345

A Hybrid Optimization and Deep Learning Algorithm for Cyber-resilient DER Control

Authors: Mohammad Panahazari, Matthew Koscak, Jianhua Zhang, Daqing Hou, **g Wang, David Wenzhong Gao

Abstract: With the proliferation of distributed energy resources (DERs) in the distribution grid, it is a challenge to effectively control a large number of DERs resilient to the communication and security disruptions, as well as to provide the online grid services, such as voltage regulation and virtual power plant (VPP) dispatch. To this end, a hybrid feedback-based optimization algorithm along with deep… ▽ More With the proliferation of distributed energy resources (DERs) in the distribution grid, it is a challenge to effectively control a large number of DERs resilient to the communication and security disruptions, as well as to provide the online grid services, such as voltage regulation and virtual power plant (VPP) dispatch. To this end, a hybrid feedback-based optimization algorithm along with deep learning forecasting technique is proposed to specifically address the cyber-related issues. The online decentralized feedback-based DER optimization control requires timely, accurate voltage measurement from the grid. However, in practice such information may not be received by the control center or even be corrupted. Therefore, the long short-term memory (LSTM) deep learning algorithm is employed to forecast delayed/missed/attacked messages with high accuracy. The IEEE 37-node feeder with high penetration of PV systems is used to validate the efficiency of the proposed hybrid algorithm. The results show that 1) the LSTM-forecasted lost voltage can effectively improve the performance of the DER control algorithm in the practical cyber-physical architecture; and 2) the LSTM forecasting strategy outperforms other strategies of using previous message and skip** dual parameter update. △ Less

Submitted 31 July, 2023; originally announced August 2023.

Comments: 5 pages

Journal ref: 2023 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT)

arXiv:2306.15942 [pdf, other]

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Authors: Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

Abstract: Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input fea… ▽ More Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input features and improve the estimation accuracy of the speech pre-separation module by avoiding information loss caused by direct dimensionality reduction in other models. Furthermore, we introduce a multi-head cross-attention mechanism that enhances the neural beamformer's perception of spatial information by making full use of the spatial information received by the array. Experimental results demonstrate that our approach, which incorporates a more reasonable target mask estimation network and a spatial information-based cross-attention mechanism into the neural beamformer, effectively improves speech separation performance. △ Less

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.01031 [pdf, other]

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

Authors: Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) cr… ▽ More This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) criterion. BTC explicitly encodes the uncertainties associated with transcripts during training. This is accomplished by enhancing the flexibility of the training graph, which is implemented as a weighted finite-state transducer (WFST) composition. The proposed algorithm improves the robustness and accuracy of ASR systems, particularly when working with imprecisely transcribed speech corpora. Our implementation will be open-sourced. △ Less

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2303.14701 [pdf, ps, other]

Mathematical Characterization of Signal Semantics and Rethinking of the Mathematical Theory of Information

Authors: Guangming Shi, Dahua Gao, Shuai Ma, Minxi Yang, Yong Xiao, Xuemei Xie

Abstract: Shannon information theory is established based on probability and bits, and the communication technology based on this theory realizes the information age. The original goal of Shannon's information theory is to describe and transmit information content. However, due to information is related to cognition, and cognition is considered to be subjective, Shannon information theory is to describe and… ▽ More Shannon information theory is established based on probability and bits, and the communication technology based on this theory realizes the information age. The original goal of Shannon's information theory is to describe and transmit information content. However, due to information is related to cognition, and cognition is considered to be subjective, Shannon information theory is to describe and transmit information-bearing signals. With the development of the information age to the intelligent age, the traditional signal-oriented processing needs to be upgraded to content-oriented processing. For example, chat generative pre-trained transformer (ChatGPT) has initially realized the content processing capability based on massive data. For many years, researchers have been searching for the answer to what the information content in the signal is, because only when the information content is mathematically and accurately described can information-based machines be truly intelligent. This paper starts from rethinking the essence of the basic concepts of the information, such as semantics, meaning, information and knowledge, presents the mathematical characterization of the information content, investigate the relationship between them, studies the transformation from Shannon's signal information theory to semantic information theory, and therefore proposes a content-oriented semantic communication framework. Furthermore, we propose semantic decomposition and composition scheme to achieve conversion between complex and simple semantics. Finally, we verify the proposed characterization of information-related concepts by implementing evolvable knowledge-based semantic recognition. △ Less

Submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.08607 [pdf, other]

PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

Authors: Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin **

Abstract: Singing voice synthesis (SVS), as a specific task for generating the vocal singing voice from a music score, has drawn much attention in recent years. SVS faces the challenge that the singing has various pronunciation flexibility conditioned on the same music score. Most of the previous works of SVS can not well handle the misalignment between the music score and actual singing. In this paper, we… ▽ More Singing voice synthesis (SVS), as a specific task for generating the vocal singing voice from a music score, has drawn much attention in recent years. SVS faces the challenge that the singing has various pronunciation flexibility conditioned on the same music score. Most of the previous works of SVS can not well handle the misalignment between the music score and actual singing. In this paper, we propose an acoustic feature processing strategy, named PHONEix, with a phoneme distribution predictor, to alleviate the gap between the music score and the singing voice, which can be easily adopted in different SVS systems. Extensive experiments in various settings demonstrate the effectiveness of our PHONEix in both objective and subjective evaluations. △ Less

Submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2303.01892 [pdf, other]

Features Disentangled Semantic Broadcast Communication Networks

Authors: Shuai Ma, Weining Qiao, Youlong Wu, Hang Li, Guangming Shi, Dahua Gao, Yuanming Shi, Shiyin Li, Naofal Al-Dhahir

Abstract: Single-user semantic communications have attracted extensive research recently, but multi-user semantic broadcast communication (BC) is still in its infancy. In this paper, we propose a practical robust features-disentangled multi-user semantic BC framework, where the transmitter includes a feature selection module and each user has a feature completion module. Instead of broadcasting all extracte… ▽ More Single-user semantic communications have attracted extensive research recently, but multi-user semantic broadcast communication (BC) is still in its infancy. In this paper, we propose a practical robust features-disentangled multi-user semantic BC framework, where the transmitter includes a feature selection module and each user has a feature completion module. Instead of broadcasting all extracted features, the semantic encoder extracts the disentangled semantic features, and then only the users' intended semantic features are selected for broadcasting, which can further improve the transmission efficiency. Within this framework, we further investigate two information-theoretic metrics, including the ultimate compression rate under both the distortion and perception constraints, and the achievable rate region of the semantic BC. Furthermore, to realize the proposed semantic BC framework, we design a lightweight robust semantic BC network by exploiting a supervised autoencoder (AE), which can controllably disentangle sematic features. Moreover, we design the first hardware proof-of-concept prototype of the semantic BC network, where the proposed semantic BC network can be implemented in real time. Simulations and experiments demonstrate that the proposed robust semantic BC network can significantly improve transmission efficiency. △ Less

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2302.13560 [pdf, other]

Task-oriented Explainable Semantic Communications

Authors: Shuai Ma, Weining Qiao, Youlong Wu, Hang Li, Guangming Shi, Dahua Gao, Yuanming Shi, Shiyin Li, Naofal Al-Dhahir

Abstract: Semantic communications utilize the transceiver computing resources to alleviate scarce transmission resources, such as bandwidth and energy. Although the conventional deep learning (DL) based designs may achieve certain transmission efficiency, the uninterpretability issue of extracted features is the major challenge in the development of semantic communications. In this paper, we propose an expl… ▽ More Semantic communications utilize the transceiver computing resources to alleviate scarce transmission resources, such as bandwidth and energy. Although the conventional deep learning (DL) based designs may achieve certain transmission efficiency, the uninterpretability issue of extracted features is the major challenge in the development of semantic communications. In this paper, we propose an explainable and robust semantic communication framework by incorporating the well-established bit-level communication system, which not only extracts and disentangles features into independent and semantically interpretable features, but also only selects task-relevant features for transmission, instead of all extracted features. Based on this framework, we derive the optimal input for rate-distortion-perception theory, and derive both lower and upper bounds on the semantic channel capacity. Furthermore, based on the $β$-variational autoencoder ($β$-VAE), we propose a practical explainable semantic communication system design, which simultaneously achieves semantic features selection and is robust against semantic channel noise. We further design a real-time wireless mobile semantic communication proof-of-concept prototype. Our simulations and experiments demonstrate that our proposed explainable semantic communications system can significantly improve transmission efficiency, and also verify the effectiveness of our proposed robust semantic transmission scheme. △ Less

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2301.00833 [pdf, other]

Hyperuniform disordered parametric loudspeaker array

Authors: Kun Tang, Yuqi Wang, Shaobo Wang, Da Gao, Haojie Li, Xindong Liang, Patrick Sebbah, Yibin Li, ** Zhang, Junhui Shi

Abstract: A steerable parametric loudspeaker array is known for its directivity and narrow beam width. However, it often suffers from the grating lobes due to periodic array distributions. Here we propose the array configuration of hyperuniform disorder, which is short-range random while correlated at large scales, as a promising alternative distribution of acoustic antennas in phased arrays. Angle-resolved… ▽ More A steerable parametric loudspeaker array is known for its directivity and narrow beam width. However, it often suffers from the grating lobes due to periodic array distributions. Here we propose the array configuration of hyperuniform disorder, which is short-range random while correlated at large scales, as a promising alternative distribution of acoustic antennas in phased arrays. Angle-resolved measurements reveal that the proposed array suppresses grating lobes and maintains a minimal radiation region in the vicinity of the main lobe for the primary frequency waves. These distinctive emission features benefit the secondary frequency wave in canceling the grating lobes regardless of the frequencies of the primary waves. Besides that, the hyperuniform disordered array is duplicatable, which facilitates extra-large array design without any additional computational efforts. △ Less

Submitted 13 April, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

arXiv:2211.17196 [pdf, other]

EURO: ESPnet Unsupervised ASR Open-source Toolkit

Authors: Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

Abstract: This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extend… ▽ More This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity. △ Less

Submitted 20 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

arXiv:2211.10473 [pdf, other]

Dynamic Interactional And Cooperative Network For Shield Machine

Authors: Dazhi Gao, Rongyang Li, Hongbo Wang, Lingfeng Mao, Huansheng Ning

Abstract: The shield machine (SM) is a complex mechanical device used for tunneling. However, the monitoring and deciding were mainly done by artificial experience during traditional construction, which brought some limitations, such as hidden mechanical failures, human operator error, and sensor anomalies. To deal with these challenges, many scholars have studied SM intelligent methods. Most of these metho… ▽ More The shield machine (SM) is a complex mechanical device used for tunneling. However, the monitoring and deciding were mainly done by artificial experience during traditional construction, which brought some limitations, such as hidden mechanical failures, human operator error, and sensor anomalies. To deal with these challenges, many scholars have studied SM intelligent methods. Most of these methods only take SM into account but do not consider the SM operating environment. So, this paper discussed the relationship among SM, geological information, and control terminals. Then, according to the relationship, models were established for the control terminal, including SM rate prediction and SM anomaly detection. The experimental results show that compared with baseline models, the proposed models in this paper perform better. In the proposed model, the R2 and MSE of rate prediction can reach 92.2\%, and 0.0064 respectively. The abnormal detection rate of anomaly detection is up to 98.2\%. △ Less

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.06891 [pdf, other]

Residual Degradation Learning Unfolding Framework with Mixing Priors across Spectral and Spatial for Compressive Spectral Imaging

Authors: Yubo Dong, Dahua Gao, Tian Qiu, Yuyan Li, Minxi Yang, Guangming Shi

Abstract: To acquire a snapshot spectral image, coded aperture snapshot spectral imaging (CASSI) is proposed. A core problem of the CASSI system is to recover the reliable and fine underlying 3D spectral cube from the 2D measurement. By alternately solving a data subproblem and a prior subproblem, deep unfolding methods achieve good performance. However, in the data subproblem, the used sensing matrix is il… ▽ More To acquire a snapshot spectral image, coded aperture snapshot spectral imaging (CASSI) is proposed. A core problem of the CASSI system is to recover the reliable and fine underlying 3D spectral cube from the 2D measurement. By alternately solving a data subproblem and a prior subproblem, deep unfolding methods achieve good performance. However, in the data subproblem, the used sensing matrix is ill-suited for the real degradation process due to the device errors caused by phase aberration, distortion; in the prior subproblem, it is important to design a suitable model to jointly exploit both spatial and spectral priors. In this paper, we propose a Residual Degradation Learning Unfolding Framework (RDLUF), which bridges the gap between the sensing matrix and the degradation process. Moreover, a Mix$S^2$ Transformer is designed via mixing priors across spectral and spatial to strengthen the spectral-spatial representation capability. Finally, plugging the Mix$S^2$ Transformer into the RDLUF leads to an end-to-end trainable neural network RDLUF-Mix$S^2$. Experimental results establish the superior performance of the proposed method over existing ones. △ Less

Submitted 15 November, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: CVPR 2023

arXiv:2211.04847 [pdf, other]

Hyper-Parameter Auto-Tuning for Sparse Bayesian Learning

Authors: Dawei Gao, Qinghua Guo, Ming **, Guisheng Liao, Yonina C. Eldar

Abstract: Choosing the values of hyper-parameters in sparse Bayesian learning (SBL) can significantly impact performance. However, the hyper-parameters are normally tuned manually, which is often a difficult task. Most recently, effective automatic hyper-parameter tuning was achieved by using an empirical auto-tuner. In this work, we address the issue of hyper-parameter auto-tuning using neural network (NN)… ▽ More Choosing the values of hyper-parameters in sparse Bayesian learning (SBL) can significantly impact performance. However, the hyper-parameters are normally tuned manually, which is often a difficult task. Most recently, effective automatic hyper-parameter tuning was achieved by using an empirical auto-tuner. In this work, we address the issue of hyper-parameter auto-tuning using neural network (NN)-based learning. Inspired by the empirical auto-tuner, we design and learn a NN-based auto-tuner, and show that considerable improvement in convergence rate and recovery performance can be achieved. △ Less

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2211.03025 [pdf, other]

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

Authors: Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-yi Lee

Abstract: Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements to various SLU tasks. However, because of the mismatched modalities between speech signals and text tokens, previous methods usually need comple… ▽ More Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements to various SLU tasks. However, because of the mismatched modalities between speech signals and text tokens, previous methods usually need complex designs of the frameworks. This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models, resulting in an unsupervised speech-to-semantic pre-trained model for various tasks in SLU. To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. Our experiments show that unsupervised ASR itself can improve the representations from speech self-supervised models. More importantly, it is shown as an efficient connector between speech and textual pre-trained models, improving the performances of five different SLU tasks. Notably, on spoken question answering, we reach the state-of-the-art result over the challenging NMSQA benchmark. △ Less

Submitted 6 November, 2022; originally announced November 2022.

Comments: ICASSP2023 submission

arXiv:2210.03911 [pdf, other]

Signal Detection in MIMO Systems with Hardware Imperfections: Message Passing on Neural Networks

Authors: Dawei Gao, Qinghua Guo, Guisheng Liao, Yonina C. Eldar, Yonghui Li, Yanguang Yu, Branka Vucetic

Abstract: In this paper, we investigate signal detection in multiple-input-multiple-output (MIMO) communication systems with hardware impairments, such as power amplifier nonlinearity and in-phase/quadrature imbalance. To deal with the complex combined effects of hardware imperfections, neural network (NN) techniques, in particular deep neural networks (DNNs), have been studied to directly compensate for th… ▽ More In this paper, we investigate signal detection in multiple-input-multiple-output (MIMO) communication systems with hardware impairments, such as power amplifier nonlinearity and in-phase/quadrature imbalance. To deal with the complex combined effects of hardware imperfections, neural network (NN) techniques, in particular deep neural networks (DNNs), have been studied to directly compensate for the impact of hardware impairments. However, it is difficult to train a DNN with limited pilot signals, hindering its practical applications. In this work, we investigate how to achieve efficient Bayesian signal detection in MIMO systems with hardware imperfections. Characterizing combined hardware imperfections often leads to complicated signal models, making Bayesian signal detection challenging. To address this issue, we first train an NN to "model" the MIMO system with hardware imperfections and then perform Bayesian inference based on the trained NN. Modelling the MIMO system with NN enables the design of NN architectures based on the signal flow of the MIMO system, minimizing the number of NN layers and parameters, which is crucial to achieving efficient training with limited pilot signals. We then represent the trained NN with a factor graph, and design an efficient message passing based Bayesian signal detector, leveraging the unitary approximate message passing (UAMP) algorithm. The implementation of a turbo receiver with the proposed Bayesian detector is also investigated. Extensive simulation results demonstrate that the proposed technique delivers remarkably better performance than state-of-the-art methods. △ Less

Submitted 8 October, 2022; originally announced October 2022.

arXiv:2108.01129 [pdf, other]

Decoupling recognition and transcription in Mandarin ASR

Authors: Jiahong Yuan, Xingyu Cai, Dongji Gao, Renjie Zheng, Liang Huang, Kenneth Church

Abstract: Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end approach. Unlike English where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese.… ▽ More Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end approach. Unlike English where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. Factoring the audio -> Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the best result reported on this dataset so far. △ Less

Submitted 2 August, 2021; originally announced August 2021.

Comments: submitted to ASRU 2021

arXiv:2007.00221 [pdf, other]

Massive MIMO As an Extreme Learning Machine

Authors: Dawei Gao, Qinghua Guo, Yonina C. Eldar

Abstract: This work shows that a massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs) forms a natural extreme learning machine (ELM). The receive antennas at the base station serve as the hidden nodes of the ELM, and the low-resolution ADCs act as the ELM activation function. By adding random biases to the received signals and optimizing the ELM output… ▽ More This work shows that a massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs) forms a natural extreme learning machine (ELM). The receive antennas at the base station serve as the hidden nodes of the ELM, and the low-resolution ADCs act as the ELM activation function. By adding random biases to the received signals and optimizing the ELM output weights, the system can effectively tackle hardware impairments, such as the nonlinearity of power amplifiers and the low-resolution ADCs. Moreover, the fast adaptive capability of ELM allows the design of an adaptive receiver to address time-varying effects of MIMO channels. Simulations demonstrate the promising performance of the ELM-based receiver compared to conventional receivers in dealing with hardware impairments. △ Less

Submitted 28 December, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: 5 pages, 6 figures (including subfigures); significant changes were made; paper has been accepted by IEEE TVT

arXiv:2005.09801 [pdf]

FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval

Authors: Dehong Gao, Linbo **, Ben Chen, Minghui Qiu, Peng Li, Yi Wei, Yi Hu, Hao Wang

Abstract: In this paper, we address the text and image matching in cross-modal retrieval of the fashion industry. Different from the matching in the general domain, the fashion matching is required to pay much more attention to the fine-grained information in the fashion images and texts. Pioneer approaches detect the region of interests (i.e., RoIs) from images and use the RoI embeddings as image represent… ▽ More In this paper, we address the text and image matching in cross-modal retrieval of the fashion industry. Different from the matching in the general domain, the fashion matching is required to pay much more attention to the fine-grained information in the fashion images and texts. Pioneer approaches detect the region of interests (i.e., RoIs) from images and use the RoI embeddings as image representations. In general, RoIs tend to represent the "object-level" information in the fashion images, while fashion texts are prone to describe more detailed information, e.g. styles, attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high level representations of texts and images. Meanwhile, we propose an adaptive loss to trade off multitask learning in the FashionBERT modeling. Two tasks (i.e., text and image matching and cross-modal retrieval) are incorporated to evaluate FashionBERT. On the public dataset, experiments demonstrate FashionBERT achieves significant improvements in performances than the baseline and state-of-the-art approaches. In practice, FashionBERT is applied in a concrete cross-modal retrieval application. We provide the detailed matching performance and inference efficiency analysis. △ Less

Submitted 29 May, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: 10 pages, to be published in SIGIR20 Industry Track

arXiv:2005.05535 [pdf, other]

DeepFaceLab: Integrated, flexible and extensible face-swap** framework

Authors: Ivan Perov, Daiheng Gao, Nikolay Chervoniy, Kunlin Liu, Sugasa Marangonda, Chris Umé, Mr. Dpfks, Carl Shift Facenheim, Luis RP, Jian Jiang, Sheng Zhang, **yu Wu, Bo Zhou, Weiming Zhang

Abstract: Deepfake defense not only requires the research of detection but also requires the efforts of generation methods. However, current deepfake methods suffer the effects of obscure workflow and poor performance. To solve this problem, we present DeepFaceLab, the current dominant deepfake framework for face-swap**. It provides the necessary tools as well as an easy-to-use way to conduct high-quality… ▽ More Deepfake defense not only requires the research of detection but also requires the efforts of generation methods. However, current deepfake methods suffer the effects of obscure workflow and poor performance. To solve this problem, we present DeepFaceLab, the current dominant deepfake framework for face-swap**. It provides the necessary tools as well as an easy-to-use way to conduct high-quality face-swap**. It also offers a flexible and loose coupling structure for people who need to strengthen their pipeline with other features without writing complicated boilerplate code. We detail the principles that drive the implementation of DeepFaceLab and introduce its pipeline, through which every aspect of the pipeline can be modified painlessly by users to achieve their customization purpose. It is noteworthy that DeepFaceLab could achieve cinema-quality results with high fidelity. We demonstrate the advantage of our system by comparing our approach with other face-swap** methods.For more information, please visit:https://github.com/iperov/DeepFaceLab/. △ Less

Submitted 29 June, 2021; v1 submitted 11 May, 2020; originally announced May 2020.

arXiv:2004.12321 [pdf, other]

doi 10.1109/EMBC44109.2020.9175344

Federated Transfer Learning for EEG Signal Classification

Authors: Ce Ju, Dashan Gao, Ravikiran Mane, Ben Tan, Yang Liu, Cuntai Guan

Abstract: The success of deep learning (DL) methods in the Brain-Computer Interfaces (BCI) field for classification of electroencephalographic (EEG) recordings has been restricted by the lack of large datasets. Privacy concerns associated with EEG signals limit the possibility of constructing a large EEG-BCI dataset by the conglomeration of multiple small ones for jointly training machine learning models. H… ▽ More The success of deep learning (DL) methods in the Brain-Computer Interfaces (BCI) field for classification of electroencephalographic (EEG) recordings has been restricted by the lack of large datasets. Privacy concerns associated with EEG signals limit the possibility of constructing a large EEG-BCI dataset by the conglomeration of multiple small ones for jointly training machine learning models. Hence, in this paper, we propose a novel privacy-preserving DL architecture named federated transfer learning (FTL) for EEG classification that is based on the federated learning framework. Working with the single-trial covariance matrix, the proposed architecture extracts common discriminative information from multi-subject EEG data with the help of domain adaptation techniques. We evaluate the performance of the proposed architecture on the PhysioNet dataset for 2-class motor imagery classification. While avoiding the actual data sharing, our FTL approach achieves 2% higher classification accuracy in a subject-adaptive analysis. Also, in the absence of multi-subject data, our architecture provides 6% better accuracy compared to other state-of-the-art DL architectures. △ Less

Submitted 25 January, 2021; v1 submitted 26 April, 2020; originally announced April 2020.

Comments: 6 pages, 2 figures, Accepted for IEEE Engineering in Medicine and Biology Society (EMBC) 2020 GitHub: https://github.com/DashanGao/Federated-Transfer-Leraning-for-EEG

ACM Class: I.5.4

Journal ref: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 2020, pp. 3040-3045

arXiv:2004.01997 [pdf, other]

doi 10.1007/978-3-030-32226-7_20

Volumetric Attention for 3D Medical Image Segmentation and Detection

Authors: Xudong Wang, Shizhong Han, Yunqiang Chen, Dashan Gao, Nuno Vasconcelos

Abstract: A volumetric attention(VA) module for 3D medical image segmentation and detection is proposed. VA attention is inspired by recent advances in video processing, enables 2.5D networks to leverage context information along the z direction, and allows the use of pretrained 2D detection models when training data is limited, as is often the case for medical applications. Its integration in the Mask R-CN… ▽ More A volumetric attention(VA) module for 3D medical image segmentation and detection is proposed. VA attention is inspired by recent advances in video processing, enables 2.5D networks to leverage context information along the z direction, and allows the use of pretrained 2D detection models when training data is limited, as is often the case for medical applications. Its integration in the Mask R-CNN is shown to enable state-of-the-art performance on the Liver Tumor Segmentation (LiTS) Challenge, outperforming the previous challenge winner by 3.9 points and achieving top performance on the LiTS leader board at the time of paper submission. Detection experiments on the DeepLesion dataset also show that the addition of VA to existing object detectors enables a 69.1 sensitivity at 0.5 false positive per image, outperforming the best published results by 6.6 points. △ Less

Submitted 4 April, 2020; originally announced April 2020.

Comments: Accepted by MICCAI 2019

Journal ref: In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 175-184. Springer, Cham, 2019

arXiv:2003.10661 [pdf]

doi 10.1121/10.0001125

Training a U-Net based on a random mode-coupling matrix model to recover acoustic interference striations

Authors: Xiaolei Li, Wenhua Song, Dazhi Gao, Wei Gao, Haozhong Wan

Abstract: A U-Net is trained to recover acoustic interference striations (AISs) from distorted ones. A random mode-coupling matrix model is introduced to generate a large number of training data quickly, which are used to train the U-Net. The performance of AIS recovery of the U-Net is tested in range-dependent waveguides with nonlinear internal waves (NLIWs). Although the random mode-coupling matrix model… ▽ More A U-Net is trained to recover acoustic interference striations (AISs) from distorted ones. A random mode-coupling matrix model is introduced to generate a large number of training data quickly, which are used to train the U-Net. The performance of AIS recovery of the U-Net is tested in range-dependent waveguides with nonlinear internal waves (NLIWs). Although the random mode-coupling matrix model is not an accurate physical model, the test results show that the U-Net successfully recovers AISs under different signal-to-noise ratios (SNRs) and different amplitudes and widths of NLIWs for different shapes. △ Less

Submitted 24 March, 2020; originally announced March 2020.

arXiv:2002.04639 [pdf, other]

Validating uncertainty in medical image translation

Authors: Jacob C. Reinhold, Yufan He, Shizhong Han, Yunqiang Chen, Dashan Gao, Junghoon Lee, Jerry L. Prince, Aaron Carass

Abstract: Medical images are increasingly used as input to deep neural networks to produce quantitative values that aid researchers and clinicians. However, standard deep neural networks do not provide a reliable measure of uncertainty in those quantitative values. Recent work has shown that using dropout during training and testing can provide estimates of uncertainty. In this work, we investigate using dr… ▽ More Medical images are increasingly used as input to deep neural networks to produce quantitative values that aid researchers and clinicians. However, standard deep neural networks do not provide a reliable measure of uncertainty in those quantitative values. Recent work has shown that using dropout during training and testing can provide estimates of uncertainty. In this work, we investigate using dropout to estimate epistemic and aleatoric uncertainty in a CT-to-MR image translation task. We show that both types of uncertainty are captured, as defined, providing confidence in the output uncertainty estimates. △ Less

Submitted 11 February, 2020; originally announced February 2020.

Comments: IEEE ISBI 2020

arXiv:2002.04626 [pdf, other]

Finding novelty with uncertainty

Authors: Jacob C. Reinhold, Yufan He, Shizhong Han, Yunqiang Chen, Dashan Gao, Junghoon Lee, Jerry L. Prince, Aaron Carass

Abstract: Medical images are often used to detect and characterize pathology and disease; however, automatically identifying and segmenting pathology in medical images is challenging because the appearance of pathology across diseases varies widely. To address this challenge, we propose a Bayesian deep learning method that learns to translate healthy computed tomography images to magnetic resonance images a… ▽ More Medical images are often used to detect and characterize pathology and disease; however, automatically identifying and segmenting pathology in medical images is challenging because the appearance of pathology across diseases varies widely. To address this challenge, we propose a Bayesian deep learning method that learns to translate healthy computed tomography images to magnetic resonance images and simultaneously calculates voxel-wise uncertainty. Since high uncertainty occurs in pathological regions of the image, this uncertainty can be used for unsupervised anomaly segmentation. We show encouraging experimental results on an unsupervised anomaly segmentation task by combining two types of uncertainty into a novel quantity we call scibilic uncertainty. △ Less

Submitted 11 February, 2020; originally announced February 2020.

Comments: SPIE Medical Imaging 2020

arXiv:2002.04102 [pdf]

Validation and Optimization of Multi-Organ Segmentation on Clinical Imaging Archives

Authors: Yuchen Xu, Olivia Tang, Yucheng Tang, Ho Hin Lee, Yunqiang Chen, Dashan Gao, Shizhong Han, Riqiang Gao, Michael R. Savona, Richard G. Abramson, Yuankai Huo, Bennett A. Landman

Abstract: Segmentation of abdominal computed tomography(CT) provides spatial context, morphological properties, and a framework for tissue-specific radiomics to guide quantitative Radiological assessment. A 2015 MICCAI challenge spurred substantial innovation in multi-organ abdominal CT segmentation with both traditional and deep learning methods. Recent innovations in deep methods have driven performance t… ▽ More Segmentation of abdominal computed tomography(CT) provides spatial context, morphological properties, and a framework for tissue-specific radiomics to guide quantitative Radiological assessment. A 2015 MICCAI challenge spurred substantial innovation in multi-organ abdominal CT segmentation with both traditional and deep learning methods. Recent innovations in deep methods have driven performance toward levels for which clinical translation is appealing. However, continued cross-validation on open datasets presents the risk of indirect knowledge contamination and could result in circular reasoning. Moreover, 'real world' segmentations can be challenging due to the wide variability of abdomen physiology within patients. Herein, we perform two data retrievals to capture clinically acquired deidentified abdominal CT cohorts with respect to a recently published variation on 3D U-Net (baseline algorithm). First, we retrieved 2004 deidentified studies on 476 patients with diagnosis codes involving spleen abnormalities (cohort A). Second, we retrieved 4313 deidentified studies on 1754 patients without diagnosis codes involving spleen abnormalities (cohort B). We perform prospective evaluation of the existing algorithm on both cohorts, yielding 13% and 8% failure rate, respectively. Then, we identified 51 subjects in cohort A with segmentation failures and manually corrected the liver and gallbladder labels. We re-trained the model adding the manual labels, resulting in performance improvement of 9% and 6% failure rate for the A and B cohorts, respectively. In summary, the performance of the baseline on the prospective cohorts was similar to that on previously published datasets. Moreover, adding data from the first cohort substantively improved performance when evaluated on the second withheld validation cohort. △ Less

Submitted 10 February, 2020; originally announced February 2020.

Comments: SPIE2020 Medical Imaging

arXiv:2001.03831 [pdf, other]

A Comparative Study for Non-rigid Image Registration and Rigid Image Registration

Authors: Xiaoran Zhang, Hexiang Dong, Di Gao, Xiao Zhao

Abstract: Image registration algorithms can be generally categorized into two groups: non-rigid and rigid. Recently, many deep learning-based algorithms employ a neural net to characterize non-rigid image registration function. However, do they always perform better? In this study, we compare the state-of-art deep learning-based non-rigid registration approach with rigid registration approach. The data is g… ▽ More Image registration algorithms can be generally categorized into two groups: non-rigid and rigid. Recently, many deep learning-based algorithms employ a neural net to characterize non-rigid image registration function. However, do they always perform better? In this study, we compare the state-of-art deep learning-based non-rigid registration approach with rigid registration approach. The data is generated from Kaggle Dog vs Cat Competition \url{https://www.kaggle.com/c/dogs-vs-cats/} and we test the algorithms' performance on rigid transformation including translation, rotation, scaling, shearing and pixelwise non-rigid transformation. The Voxelmorph is trained on rigidset and nonrigidset separately for comparison and we also add a gaussian blur layer to its original architecture to improve registration performance. The best quantitative results in both root-mean-square error (RMSE) and mean absolute error (MAE) metrics for rigid registration are produced by SimpleElastix and non-rigid registration by Voxelmorph. We select representative samples for visual assessment. △ Less

Submitted 11 January, 2020; originally announced January 2020.

arXiv:1912.00420 [pdf]

doi 10.1117/1.JMI.6.4.044005

Stochastic tissue window normalization of deep learning on computed tomography

Authors: Yuankai Huo, Yucheng Tang, Yunqiang Chen, Dashan Gao, Shizhong Han, Shunxing Bao, Smita De, James G. Terry, Jeffrey J. Carr, Richard G. Abramson, Bennett A. Landman

Abstract: Tissue window filtering has been widely used in deep learning for computed tomography (CT) image analyses to improve training performance (e.g., soft tissue windows for abdominal CT). However, the effectiveness of tissue window normalization is questionable since the generalizability of the trained model might be further harmed, especially when such models are applied to new cohorts with different… ▽ More Tissue window filtering has been widely used in deep learning for computed tomography (CT) image analyses to improve training performance (e.g., soft tissue windows for abdominal CT). However, the effectiveness of tissue window normalization is questionable since the generalizability of the trained model might be further harmed, especially when such models are applied to new cohorts with different CT reconstruction kernels, contrast mechanisms, dynamic variations in the acquisition, and physiological changes. We evaluate the effectiveness of both with and without using soft tissue window normalization on multisite CT cohorts. Moreover, we propose a stochastic tissue window normalization (SWN) method to improve the generalizability of tissue window normalization. Different from the random sampling, the SWN method centers the randomization around the soft tissue window to maintain the specificity for abdominal organs. To evaluate the performance of different strategies, 80 training and 453 validation and testing scans from six datasets are employed to perform multi-organ segmentation using standard 2D U-Net. The six datasets cover the scenarios, where the training and testing scans are from (1) same scanner and same population, (2) same CT contrast but different pathology, and (3) different CT contrast and pathology. The traditional soft tissue window and nonwindowed approaches achieved better performance on (1). The proposed SWN achieved general superior performance on (2) and (3) with statistical analyses, which offers better generalizability for a trained model. △ Less

Submitted 1 December, 2019; originally announced December 2019.

Journal ref: Journal of Medical Imaging 6.4 (2019): 044005

arXiv:1911.06395 [pdf]

Contrast Phase Classification with a Generative Adversarial Network

Authors: Yucheng Tang, Ho Hin Lee, Yuchen Xu, Olivia Tang, Yunqiang Chen, Dashan Gao, Shizhong Han, Riqiang Gao, Camilo Bermudez, Michael R. Savona, Richard G. Abramson, Yuankai Huo, Bennett A. Landman

Abstract: Dynamic contrast enhanced computed tomography (CT) is an imaging technique that provides critical information on the relationship of vascular structure and dynamics in the context of underlying anatomy. A key challenge for image processing with contrast enhanced CT is that phase discrepancies are latent in different tissues due to contrast protocols, vascular dynamics, and metabolism variance. Pre… ▽ More Dynamic contrast enhanced computed tomography (CT) is an imaging technique that provides critical information on the relationship of vascular structure and dynamics in the context of underlying anatomy. A key challenge for image processing with contrast enhanced CT is that phase discrepancies are latent in different tissues due to contrast protocols, vascular dynamics, and metabolism variance. Previous studies with deep learning frameworks have been proposed for classifying contrast enhancement with networks inspired by computer vision. Here, we revisit the challenge in the context of whole abdomen contrast enhanced CTs. To capture and compensate for the complex contrast changes, we propose a novel discriminator in the form of a multi-domain disentangled representation learning network. The goal of this network is to learn an intermediate representation that separates contrast enhancement from anatomy and enables classification of images with varying contrast time. Briefly, our unpaired contrast disentangling GAN(CD-GAN) Discriminator follows the ResNet architecture to classify a CT scan from different enhancement phases. To evaluate the approach, we trained the enhancement phase classifier on 21060 slices from two clinical cohorts of 230 subjects. Testing was performed on 9100 slices from 30 independent subjects who had been imaged with CT scans from all contrast phases. Performance was quantified in terms of the multi-class normalized confusion matrix. The proposed network significantly improved correspondence over baseline UNet, ResNet50 and StarGAN performance of accuracy scores 0.54. 0.55, 0.62 and 0.91, respectively. The proposed discriminator from the disentangled network presents a promising technique that may allow deeper modeling of dynamic imaging against patient specific anatomies. △ Less

Submitted 14 November, 2019; originally announced November 2019.

Comments: 8 pages, 4 figures

Journal ref: SPIE2020

arXiv:1911.05113 [pdf]

Semi-Supervised Multi-Organ Segmentation through Quality Assurance Supervision

Authors: Ho Hin Lee, Yucheng Tang, Olivia Tang, Yuchen Xu, Yunqiang Chen, Dashan Gao, Shizhong Han, Riqiang Gao, Michael R. Savona, Richard G. Abramson, Yuankai Huo, Bennett A. Landman

Abstract: Human in-the-loop quality assurance (QA) is typically performed after medical image segmentation to ensure that the systems are performing as intended, as well as identifying and excluding outliers. By performing QA on large-scale, previously unlabeled testing data, categorical QA scores can be generatedIn this paper, we propose a semi-supervised multi-organ segmentation deep neural network consis… ▽ More Human in-the-loop quality assurance (QA) is typically performed after medical image segmentation to ensure that the systems are performing as intended, as well as identifying and excluding outliers. By performing QA on large-scale, previously unlabeled testing data, categorical QA scores can be generatedIn this paper, we propose a semi-supervised multi-organ segmentation deep neural network consisting of a traditional segmentation model generator and a QA involved discriminator. A large-scale dataset of 2027 volumes are used to train the generator, whose 2-D montage images and segmentation mask with QA scores are used to train the discriminator. To generate the QA scores, the 2-D montage images were reviewed manually and coded 0 (success), 1 (errors consistent with published performance), and 2 (gross failure). Then, the ResNet-18 network was trained with 1623 montage images in equal distribution of all three code labels and achieved an accuracy 94% for classification predictions with 404 montage images withheld for the test cohort. To assess the performance of using the QA supervision, the discriminator was used as a loss function in a multi-organ segmentation pipeline. The inclusion of QA-loss function boosted performance on the unlabeled test dataset from 714 patients to 951 patients over the baseline model. Additionally, the number of failures decreased from 606 (29.90%) to 402 (19.83%). The contributions of the proposed method are threefold: We show that (1) the QA scores can be used as a loss function to perform semi-supervised learning for unlabeled data, (2) the well trained discriminator is learnt by QA score rather than traditional true/false, and (3) the performance of multi-organ segmentation on unlabeled datasets can be fine-tuned with more robust and higher accuracy than the original baseline method. △ Less

Submitted 12 November, 2019; originally announced November 2019.

Comments: 7 pages, 5 figures, Accepted by SPIE 2020: Medical Imaging

arXiv:1909.05784 [pdf, other]

HHHFL: Hierarchical Heterogeneous Horizontal Federated Learning for Electroencephalography

Authors: Dashan Gao, Ce Ju, Xiguang Wei, Yang Liu, Tianjian Chen, Qiang Yang

Abstract: Electroencephalography (EEG) classification techniques have been widely studied for human behavior and emotion recognition tasks. But it is still a challenging issue since the data may vary from subject to subject, may change over time for the same subject, and maybe heterogeneous. Recent years, increasing privacy-preserving demands poses new challenges to this task. The data heterogeneity, as wel… ▽ More Electroencephalography (EEG) classification techniques have been widely studied for human behavior and emotion recognition tasks. But it is still a challenging issue since the data may vary from subject to subject, may change over time for the same subject, and maybe heterogeneous. Recent years, increasing privacy-preserving demands poses new challenges to this task. The data heterogeneity, as well as the privacy constraint of the EEG data, is not concerned in previous studies. To fill this gap, in this paper, we propose a heterogeneous federated learning approach to train machine learning models over heterogeneous EEG data, while preserving the data privacy of each party. To verify the effectiveness of our approach, we conduct experiments on a real-world EEG dataset, consisting of heterogeneous data collected from diverse devices. Our approach achieves consistent performance improvement on every task. △ Less

Submitted 10 September, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: 5 pages, 6 figures, Accepted for International Workshop on Federated Machine Learning for User Privacy and Data Confidentiality in Conjunction with IJCAI 2019 (FL-IJCAI'2019)

ACM Class: I.2.6; I.2.11

arXiv:1908.11486 [pdf, other]

Fast Scenario Reduction for Power Systems by Deep Learning

Authors: Qiao Li, David Wenzhong Gao

Abstract: Scenario reduction is an important topic in stochastic programming problems. Due to the random behavior of load and renewable energy, stochastic programming becomes a useful technique to optimize power systems. Thus, scenario reduction gets more attentions in recent years. Many scenario reduction methods have been proposed to reduce the scenario set in a fast speed. However, the speed of scenario… ▽ More Scenario reduction is an important topic in stochastic programming problems. Due to the random behavior of load and renewable energy, stochastic programming becomes a useful technique to optimize power systems. Thus, scenario reduction gets more attentions in recent years. Many scenario reduction methods have been proposed to reduce the scenario set in a fast speed. However, the speed of scenario reduction is still very slow, in which it takes at least several seconds to several minutes to finish the reduction. This limitation of speed prevents stochastic programming to be implemented in real-time optimal control problems. In this paper, a fast scenario reduction method based on deep learning is proposed to solve this problem. Inspired by the deep learning based image process, recognition and generation methods, the scenario data are transformed into a 2D image-like data and then to be fed into a deep convolutional neural network (DCNN). The output of the DCNN will be an "image" of the reduced scenario set. Since images can be processed in a very high speed by neural networks, the scenario reduction by neural network can also be very fast. The results of the simulation show that the scenario reduction with the proposed DCNN method can be completed in very high speed. △ Less

Submitted 29 August, 2019; originally announced August 2019.

Comments: 4 pages, 4 figures

arXiv:1904.04395 [pdf, other]

doi 10.1109/JSYST.2020.2978535

Extreme Learning Machine Based Non-Iterative and Iterative Nonlinearity Mitigation for LED Communications

Authors: Dawei Gao, Qinghua Guo, Jun Tong, Nan Wu, Jiangtao Xi, Yanguang Yu

Abstract: This work concerns receiver design for light emitting diode (LED) communications where the LED nonlinearity can severely degrade the performance of communications. We propose extreme learning machine (ELM) based non-iterative receivers and iterative receivers to effectively handle the LED nonlinearity and memory effects. For the iterative receiver design, we also develop a data-aided receiver, whe… ▽ More This work concerns receiver design for light emitting diode (LED) communications where the LED nonlinearity can severely degrade the performance of communications. We propose extreme learning machine (ELM) based non-iterative receivers and iterative receivers to effectively handle the LED nonlinearity and memory effects. For the iterative receiver design, we also develop a data-aided receiver, where data is used as virtual training sequence in ELM training. It is shown that the ELM based receivers significantly outperform conventional polynomial based receivers; iterative receivers can achieve huge performance gain compared to non-iterative receivers; and the data-aided receiver can reduce training overhead considerably. This work can also be extended to radio frequency communications, e.g., to deal with the nonlinearity of power amplifiers. △ Less

Submitted 20 April, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

arXiv:1904.00583 [pdf, other]

doi 10.1121/1.5126115

Sound source ranging using a feed-forward neural network with fitting-based early stop**

Authors: **g Chi, Xiaolei Li, Haozhong Wang, Dazhi Gao, Peter Gerstoft

Abstract: When a feed-forward neural network (FNN) is trained for source ranging in an ocean waveguide, it is difficult evaluating the range accuracy of the FNN on unlabeled test data. A fitting-based early stop** (FEAST) method is introduced to evaluate the range error of the FNN on test data where the distance of source is unknown. Based on FEAST, when the evaluated range error of the FNN reaches the mi… ▽ More When a feed-forward neural network (FNN) is trained for source ranging in an ocean waveguide, it is difficult evaluating the range accuracy of the FNN on unlabeled test data. A fitting-based early stop** (FEAST) method is introduced to evaluate the range error of the FNN on test data where the distance of source is unknown. Based on FEAST, when the evaluated range error of the FNN reaches the minimum on test data, stop** training, which will help to improve the ranging accuracy of the FNN on the test data. The FEAST is demonstrated on simulated and experimental data. △ Less

Submitted 1 April, 2019; originally announced April 2019.

arXiv:1903.01551 [pdf, ps, other]

Extreme Learning Machine-Based Receiver for MIMO LED Communications

Authors: Dawei Gao, Qinghua Guo

Abstract: This work concerns receiver design for light-emitting diode (LED) multiple input multiple output (MIMO) communications where the LED nonlinearity can severely degrade the performance of communications. In this paper, we propose an extreme learning machine (ELM) based receiver to jointly handle the LED nonlinearity and cross-LED interference, and a circulant input weight matrix is employed, which s… ▽ More This work concerns receiver design for light-emitting diode (LED) multiple input multiple output (MIMO) communications where the LED nonlinearity can severely degrade the performance of communications. In this paper, we propose an extreme learning machine (ELM) based receiver to jointly handle the LED nonlinearity and cross-LED interference, and a circulant input weight matrix is employed, which significantly reduces the complexity of the receiver with the fast Fourier transform (FFT). It is demonstrated that the proposed receiver can efficiently handle the LED nonlinearity and cross-LED interference. △ Less

Submitted 27 February, 2019; originally announced March 2019.

arXiv:1903.01128 [pdf, ps, other]

Fully Distributed DC Optimal Power Flow Based on Distributed Economic Dispatch and Distributed State Estimation

Authors: Qiao Li, David Wenzhong Gao, Lin Cheng, Fang Zhang, Weihang Yan

Abstract: Optimal power flow (OPF) is an important technique for power systems to achieve optimal operation while satisfying multiple constraints. The traditional OPF are mostly centralized methods which are executed in the centralized control center. This paper introduces a totally Distributed DC Optimal Power Flow (DDCOPF) method for future power systems which have more and more distributed generators. Th… ▽ More Optimal power flow (OPF) is an important technique for power systems to achieve optimal operation while satisfying multiple constraints. The traditional OPF are mostly centralized methods which are executed in the centralized control center. This paper introduces a totally Distributed DC Optimal Power Flow (DDCOPF) method for future power systems which have more and more distributed generators. The proposed method is based on the Distributed Economic Dispatch (DED) method and the Distributed State Estimation (DSE) method. In this proposed scheme, the DED method is used to achieve the optimal power dispatch with the lowest cost, and the DSE method provides power flow information of the power system to the proposed DDCOPF algorithm. In the proposed method, the Auto-Regressive (AR) model is used to predict the load variation so that the proposed algorithm can prevent overflow. In addition, a method called constraint algorithm is developed to correct the results of DED with the proposed correction algorithm and penalty term so that the constraints for the power system will not be violated. Different from existing research, the proposed method is completely distributed without need for any centralized facility. △ Less

Submitted 4 March, 2019; originally announced March 2019.

Comments: 8 pages, 8 figures, journal

arXiv:1902.07318 [pdf]

doi 10.1021/acsphotonics.9b01673

Self-learning photonic signal processor with an optical neural network chip

Authors: Hailong Zhou, Yuhe Zhao, Xu Wang, Dingshan Gao, Jianji Dong, Xinliang Zhang

Abstract: Photonic signal processing is essential in the optical communication and optical computing. Numerous photonic signal processors have been proposed, but most of them exhibit limited reconfigurability and automaticity. A feature of fully automatic implementation and intelligent response is highly desirable for the multipurpose photonic signal processors. Here, we report and experimentally demonstrat… ▽ More Photonic signal processing is essential in the optical communication and optical computing. Numerous photonic signal processors have been proposed, but most of them exhibit limited reconfigurability and automaticity. A feature of fully automatic implementation and intelligent response is highly desirable for the multipurpose photonic signal processors. Here, we report and experimentally demonstrate a fully self-learning and reconfigurable photonic signal processor based on an optical neural network chip. The proposed photonic signal processor is capable of performing various functions including multichannel optical switching, optical multiple-input-multiple-output descrambler and tunable optical filter. All the functions are achieved by complete self-learning. Our demonstration suggests great potential for chip-scale fully programmable optical signal processing with artificial intelligence. △ Less

Submitted 18 February, 2019; originally announced February 2019.

Journal ref: ACS Photonics 7 (2020) 792-799

Showing 1–41 of 41 results for author: Gao, D