Search | arXiv e-print repository

arXiv:2405.15438 [pdf, other]

Comparing remote sensing-based forest biomass map** approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China

Authors: Wenquan Dong, Edward T. A. Mitchard, Yuwei Chen, Man Chen, Congfeng Cao, Peilun Hu, Cong Xu, Steven Hancock

Abstract: Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbia… ▽ More Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbiased estimation of forest AGB at high resolution, particularly in dense and tall forests, where Synthetic Aperture Radar (SAR) and passive optical data exhibit saturation. However, GEDI is a sampling instrument, collecting dispersed footprints, and its data must be combined with that from other continuous cover satellites to create high-resolution maps, using local machine learning methods. In this study, we developed local models to estimate forest AGB from GEDI L2A data, as the models used to create GEDI L4 AGB data incorporated minimal field data from China. We then applied LightGBM and random forest regression to generate wall-to-wall AGB maps at 25 m resolution, using extensive GEDI footprints as well as Sentinel-1 data, ALOS-2 PALSAR-2 and Sentinel-2 optical data. Through a 5-fold cross-validation, LightGBM demonstrated a slightly better performance than Random Forest across two contrasting regions. However, in both regions, the computation speed of LightGBM is substantially faster than that of the random forest model, requiring roughly one-third of the time to compute on the same hardware. Through the validation against field data, the 25 m resolution AGB maps generated using the local models developed in this study exhibited higher accuracy compared to the GEDI L4B AGB data. We found in both regions an increase in error as slope increased. The trained models were tested on nearby but different regions and exhibited good performance. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2312.05256 [pdf, other]

Holistic Evaluation of GPT-4V for Biomedical Imaging

Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, **gyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang , et al. (25 additional authors not shown)

Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor… ▽ More In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications. △ Less

Submitted 10 November, 2023; originally announced December 2023.

arXiv:2311.03074 [pdf, other]

A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection

Authors: Wenxin Wang, Zhuo-Xu Cui, Guanxun Cheng, Chentao Cao, Xi Xu, Ziwei Liu, Haifeng Wang, Yulong Qi, Dong Liang, Yanjie Zhu

Abstract: Accurate detection and segmentation of brain tumors is critical for medical diagnosis. However, current supervised learning methods require extensively annotated images and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution. In this paper, we propose a novel framework Two-Stage Generative Model (TSGM) that combines Cyc… ▽ More Accurate detection and segmentation of brain tumors is critical for medical diagnosis. However, current supervised learning methods require extensively annotated images and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution. In this paper, we propose a novel framework Two-Stage Generative Model (TSGM) that combines Cycle Generative Adversarial Network (CycleGAN) and Variance Exploding stochastic differential equation using joint probability (VE-JP) to improve brain tumor detection and segmentation. The CycleGAN is trained on unpaired data to generate abnormal images from healthy images as data prior. Then VE-JP is implemented to reconstruct healthy images using synthetic paired abnormal images as a guide, which alters only pathological regions but not regions of healthy. Notably, our method directly learned the joint probability distribution for conditional generation. The residual between input and reconstructed images suggests the abnormalities and a thresholding method is subsequently applied to obtain segmentation results. Furthermore, the multimodal results are weighted with different weights to improve the segmentation accuracy further. We validated our method on three datasets, and compared with other unsupervised methods for anomaly detection and segmentation. The DSC score of 0.8590 in BraTs2020 dataset, 0.6226 in ITCS dataset and 0.7403 in In-house dataset show that our method achieves better segmentation performance and has better generalization. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: 11 pages,9 figures,3 tables

arXiv:2309.14372 [pdf, other]

Human Transcription Quality Improvement

Authors: Jian Gao, Hanbo Sun, Cheng Cao, Zheng Du

Abstract: High quality transcription data is crucial for training automatic speech recognition (ASR) systems. However, the existing industry-level data collection pipelines are expensive to researchers, while the quality of crowdsourced transcription is low. In this paper, we propose a reliable method to collect speech transcriptions. We introduce two mechanisms to improve transcription quality: confidence… ▽ More High quality transcription data is crucial for training automatic speech recognition (ASR) systems. However, the existing industry-level data collection pipelines are expensive to researchers, while the quality of crowdsourced transcription is low. In this paper, we propose a reliable method to collect speech transcriptions. We introduce two mechanisms to improve transcription quality: confidence estimation based reprocessing at labeling stage, and automatic word error correction at post-labeling stage. We collect and release LibriCrowd - a large-scale crowdsourced dataset of audio transcriptions on 100 hours of English speech. Experiment shows the Transcription WER is reduced by over 50%. We further investigate the impact of transcription error on ASR model performance and found a strong correlation. The transcription quality improvement provides over 10% relative WER reduction for ASR models. We release the dataset and code to benefit the research community. △ Less

Submitted 23 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures, 5 tables, INTERSPEECH 2023

MSC Class: 68T50 ACM Class: I.2.7

Journal ref: INTERSPEECH 2023

arXiv:2309.10089 [pdf, other]

HTEC: Human Transcription Error Correction

Authors: Hanbo Sun, Jian Gao, Xiaomin Wu, Anjie Fang, Cheng Cao, Zheng Du

Abstract: High-quality human transcription is essential for training and improving Automatic Speech Recognition (ASR) models. Recent study~\cite{libricrowd} has found that every 1% worse transcription Word Error Rate (WER) increases approximately 2% ASR WER by using the transcriptions to train ASR models. Transcription errors are inevitable for even highly-trained annotators. However, few studies have explo… ▽ More High-quality human transcription is essential for training and improving Automatic Speech Recognition (ASR) models. Recent study~\cite{libricrowd} has found that every 1% worse transcription Word Error Rate (WER) increases approximately 2% ASR WER by using the transcriptions to train ASR models. Transcription errors are inevitable for even highly-trained annotators. However, few studies have explored human transcription correction. Error correction methods for other problems, such as ASR error correction and grammatical error correction, do not perform sufficiently for this problem. Therefore, we propose HTEC for Human Transcription Error Correction. HTEC consists of two stages: Trans-Checker, an error detection model that predicts and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative model that fills masked positions. We propose a holistic list of correction operations, including four novel operations handling deletion errors. We further propose a variant of embeddings that incorporates phoneme information into the input of the transformer. HTEC outperforms other methods by a large margin and surpasses human annotators by 2.2% to 4.5% in WER. Finally, we deployed HTEC to assist human annotators and showed HTEC is particularly effective as a co-pilot, which improves transcription quality by 15.1% without sacrificing transcription velocity. △ Less

Submitted 18 September, 2023; originally announced September 2023.

Comments: 13 pages, 4 figures, 11 tables, AMLC 2023

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2306.01974 [pdf, other]

BEDRF: Bidirectional Edge Diffraction Response Function for Interactive Sound Propagation

Authors: Chunxiao Cao, Zili An, Zhong Ren, Dinesh Manocha, Kun Zhou

Abstract: We introduce bidirectional edge diffraction response function (BEDRF), a new approach to model wave diffraction around edges with path tracing. The diffraction part of the wave is expressed as an integration on path space, and the wave-edge interaction is expressed using only the localized information around points on the edge similar to a bidirectional scattering distribution function (BSDF) for… ▽ More We introduce bidirectional edge diffraction response function (BEDRF), a new approach to model wave diffraction around edges with path tracing. The diffraction part of the wave is expressed as an integration on path space, and the wave-edge interaction is expressed using only the localized information around points on the edge similar to a bidirectional scattering distribution function (BSDF) for visual rendering. For an infinite single wedge, our model generates the same result as the analytic solution. Our approach can be easily integrated into interactive geometric sound propagation algorithms that use path tracing to compute specular and diffuse reflections. Our resulting propagation algorithm can approximate complex wave propagation phenomena involving high-order diffraction, and is able to handle dynamic, deformable objects and moving sources and listeners. We highlight the performance of our approach in different scenarios to generate smooth auralization. △ Less

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.02509 [pdf, other]

Meta-Learning Enabled Score-Based Generative Model for 1.5T-Like Image Reconstruction from 0.5T MRI

Authors: Zhuo-Xu Cui, Congcong Liu, Chentao Cao, Yuanyuan Liu, **g Cheng, Qingyong Zhu, Yanjie Zhu, Haifeng Wang, Dong Liang

Abstract: Magnetic resonance imaging (MRI) is known to have reduced signal-to-noise ratios (SNR) at lower field strengths, leading to signal degradation when producing a low-field MRI image from a high-field one. Therefore, reconstructing a high-field-like image from a low-field MRI is a complex problem due to the ill-posed nature of the task. Additionally, obtaining paired low-field and high-field MR image… ▽ More Magnetic resonance imaging (MRI) is known to have reduced signal-to-noise ratios (SNR) at lower field strengths, leading to signal degradation when producing a low-field MRI image from a high-field one. Therefore, reconstructing a high-field-like image from a low-field MRI is a complex problem due to the ill-posed nature of the task. Additionally, obtaining paired low-field and high-field MR images is often not practical. We theoretically uncovered that the combination of these challenges renders conventional deep learning methods that directly learn the map** from a low-field MR image to a high-field MR image unsuitable. To overcome these challenges, we introduce a novel meta-learning approach that employs a teacher-student mechanism. Firstly, an optimal-transport-driven teacher learns the degradation process from high-field to low-field MR images and generates pseudo-paired high-field and low-field MRI images. Then, a score-based student solves the inverse problem of reconstructing a high-field-like MR image from a low-field MRI within the framework of iterative regularization, by learning the joint distribution of pseudo-paired images to act as a regularizer. Experimental results on real low-field MRI data demonstrate that our proposed method outperforms state-of-the-art unpaired learning methods. △ Less

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2303.07327 [pdf, other]

Unsupervised HDR Image and Video Tone Map** via Contrastive Learning

Authors: Cong Cao, Huan**g Yue, Xin Liu, **gyu Yang

Abstract: Capturing high dynamic range (HDR) images (videos) is attractive because it can reveal the details in both dark and bright regions. Since the mainstream screens only support low dynamic range (LDR) content, tone map** algorithm is required to compress the dynamic range of HDR images (videos). Although image tone map** has been widely explored, video tone map** is lagging behind, especially f… ▽ More Capturing high dynamic range (HDR) images (videos) is attractive because it can reveal the details in both dark and bright regions. Since the mainstream screens only support low dynamic range (LDR) content, tone map** algorithm is required to compress the dynamic range of HDR images (videos). Although image tone map** has been widely explored, video tone map** is lagging behind, especially for the deep-learning-based methods, due to the lack of HDR-LDR video pairs. In this work, we propose a unified framework (IVTMNet) for unsupervised image and video tone map**. To improve unsupervised training, we propose domain and instance based contrastive learning loss. Instead of using a universal feature extractor, such as VGG to extract the features for similarity measurement, we propose a novel latent code, which is an aggregation of the brightness and contrast of extracted features, to measure the similarity of different pairs. We totally construct two negative pairs and three positive pairs to constrain the latent codes of tone mapped results. For the network structure, we propose a spatial-feature-enhanced (SFE) module to enable information exchange and transformation of nonlocal regions. For video tone map**, we propose a temporal-feature-replaced (TFR) module to efficiently utilize the temporal correlation and improve the temporal consistency of video tone-mapped results. We construct a large-scale unpaired HDR-LDR video dataset to facilitate the unsupervised training process for video tone map**. Experimental results demonstrate that our method outperforms state-of-the-art image and video tone map** methods. Our code and dataset are available at https://github.com/cao-cong/UnCLTMO. △ Less

Submitted 26 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

arXiv:2212.11274 [pdf, other]

SPIRiT-Diffusion: SPIRiT-driven Score-Based Generative Modeling for Vessel Wall imaging

Authors: Chentao Cao, Zhuo-Xu Cui, **g Cheng, Sen Jia, Hairong Zheng, Dong Liang, Yanjie Zhu

Abstract: Diffusion model is the most advanced method in image generation and has been successfully applied to MRI reconstruction. However, the existing methods do not consider the characteristics of multi-coil acquisition of MRI data. Therefore, we give a new diffusion model, called SPIRiT-Diffusion, based on the SPIRiT iterative reconstruction algorithm. Specifically, SPIRiT-Diffusion characterizes the pr… ▽ More Diffusion model is the most advanced method in image generation and has been successfully applied to MRI reconstruction. However, the existing methods do not consider the characteristics of multi-coil acquisition of MRI data. Therefore, we give a new diffusion model, called SPIRiT-Diffusion, based on the SPIRiT iterative reconstruction algorithm. Specifically, SPIRiT-Diffusion characterizes the prior distribution of coil-by-coil images by score matching and characterizes the k-space redundant prior between coils based on self-consistency. With sufficient prior constraint utilized, we achieve superior reconstruction results on the joint Intracranial and Carotid Vessel Wall imaging dataset. △ Less

Submitted 13 December, 2022; originally announced December 2022.

Comments: submitted to ISMRM

arXiv:2212.04736 [pdf, other]

FPGA-Based In-Vivo Calcium Image Decoding for Closed-Loop Feedback Applications

Authors: Zhe Chen, Garrett J. Blair, Chengdi Cao, Jim Zhou, Daniel Aharoni, Peyman Golshani, Hugh T. Blair, Jason Cong

Abstract: Miniaturized calcium imaging is an emerging neural recording technique that has been widely used for monitoring neural activity on a large scale at a specific brain region of rats or mice. Most existing calcium-image analysis pipelines operate offline. This results in long processing latency, making it difficult to realize closed-loop feedback stimulation for brain research. In recent work, we hav… ▽ More Miniaturized calcium imaging is an emerging neural recording technique that has been widely used for monitoring neural activity on a large scale at a specific brain region of rats or mice. Most existing calcium-image analysis pipelines operate offline. This results in long processing latency, making it difficult to realize closed-loop feedback stimulation for brain research. In recent work, we have proposed an FPGA-based real-time calcium image processing pipeline for closed-loop feedback applications. It can perform real-time calcium image motion correction, enhancement, fast trace extraction, and real-time decoding from extracted traces. Here, we extend this work by proposing a variety of neural network based methods for real-time decoding and evaluate the tradeoff among these decoding methods and accelerator designs. We introduced the implementation of the neural network based decoders on the FPGA, and showed their speedup against the implementation on the ARM processor. Our FPGA implementation enables the real-time calcium image decoding with sub-ms processing latency for closed-loop feedback applications. △ Less

Submitted 16 April, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 11 pages, 15 figures

arXiv:2211.05309 [pdf]

Generic Cryo-CMOS Device Modeling and EDACompatible Platform for Reliable Cryogenic IC Design

Authors: Zhidong Tang, Zewei Wang, Yumeng Yuan, Chang He, Xin Luo, Ao Guo, Renhe Chen, Yongqi Hu, Longfei Yang, Chengwei Cao, Linlin Liu, Liujiang Yu, Ganbing Shang, Yongfeng Cao, Shoumian Chen, Yuhang Zhao, Shaojian Hu, Xufeng Kou

Abstract: This paper outlines the establishment of a generic cryogenic CMOS database in which key electrical parameters and transfer characteristics of the MOSFETs are quantified as functions of device size, temperature/frequency responses. Meanwhile, comprehensive device statistical study is conducted to evaluate the influence of variation and mismatch effects at low temperatures. Furthermore, by incorpora… ▽ More This paper outlines the establishment of a generic cryogenic CMOS database in which key electrical parameters and transfer characteristics of the MOSFETs are quantified as functions of device size, temperature/frequency responses. Meanwhile, comprehensive device statistical study is conducted to evaluate the influence of variation and mismatch effects at low temperatures. Furthermore, by incorporating the Cryo-CMOS compact model into the process design kit (PDK), the cryogenic 4 Kb SRAM, 5-bit flash ADC and 8-bit current steering DAC are designed, and their performance is readily investigated and optimized on the EDA-compatible platform, hence laying a solid foundation for large-scale cryogenic IC design. △ Less

Submitted 9 February, 2024; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.14321 [pdf, other]

Artificial ASMR: A Cyber-Psychological Approach

Authors: Zexin Fang, Bin Han, C. Clark Cao, Hans. D. Schotten

Abstract: The popularity of Autonomous Sensory Meridian Response (ASMR) has skyrockted over the past decade, but scientific studies on what exactly triggered ASMR effect remain few and immature, one most commonly acknowledged trigger is that ASMR clips typically provide rich semantic information. With our attention caught by the common acoustic patterns in ASMR audios, we investigate the correlation between… ▽ More The popularity of Autonomous Sensory Meridian Response (ASMR) has skyrockted over the past decade, but scientific studies on what exactly triggered ASMR effect remain few and immature, one most commonly acknowledged trigger is that ASMR clips typically provide rich semantic information. With our attention caught by the common acoustic patterns in ASMR audios, we investigate the correlation between the cyclic features of audio signals and their effectiveness in triggering ASMR effects. A cyber-psychological approach that combines signal processing, artificial intelligence, and experimental psychology is taken, with which we are able to quantize ASMR-related acoustic features, and therewith synthesize ASMR clips with random cyclic patterns but not delivering identifiably scenarios to the audience, which were proven to be effective in triggering ASMR effects. △ Less

Submitted 5 July, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: Accepted by IEEE MLSP 2023

arXiv:2209.00835 [pdf, other]

Self-Score: Self-Supervised Learning on Score-Based Models for MRI Reconstruction

Authors: Zhuo-Xu Cui, Chentao Cao, Shaonan Liu, Qingyong Zhu, **g Cheng, Haifeng Wang, Yanjie Zhu, Dong Liang

Abstract: Recently, score-based diffusion models have shown satisfactory performance in MRI reconstruction. Most of these methods require a large amount of fully sampled MRI data as a training set, which, sometimes, is difficult to acquire in practice. This paper proposes a fully-sampled-data-free score-based diffusion model for MRI reconstruction, which learns the fully sampled MR image prior in a self-sup… ▽ More Recently, score-based diffusion models have shown satisfactory performance in MRI reconstruction. Most of these methods require a large amount of fully sampled MRI data as a training set, which, sometimes, is difficult to acquire in practice. This paper proposes a fully-sampled-data-free score-based diffusion model for MRI reconstruction, which learns the fully sampled MR image prior in a self-supervised manner on undersampled data. Specifically, we first infer the fully sampled MR image distribution from the undersampled data by Bayesian deep learning, then perturb the data distribution and approximate their probability density gradient by training a score function. Leveraging the learned score function as a prior, we can reconstruct the MR image by performing conditioned Langevin Markov chain Monte Carlo (MCMC) sampling. Experiments on the public dataset show that the proposed method outperforms existing self-supervised MRI reconstruction methods and achieves comparable performances with the conventional (fully sampled data trained) score-based diffusion methods. △ Less

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.05481 [pdf, other]

doi 10.1109/TMI.2024.3351702

High-Frequency Space Diffusion Models for Accelerated MRI

Authors: Chentao Cao, Zhuo-Xu Cui, Yue Wang, Shaonan Liu, Tai** Chen, Hairong Zheng, Dong Liang, Yanjie Zhu

Abstract: Diffusion models with continuous stochastic differential equations (SDEs) have shown superior performances in image generation. It can serve as a deep generative prior to solving the inverse problem in magnetic resonance (MR) reconstruction. However, low-frequency regions of $k$-space data are typically fully sampled in fast MR imaging, while existing diffusion models are performed throughout the… ▽ More Diffusion models with continuous stochastic differential equations (SDEs) have shown superior performances in image generation. It can serve as a deep generative prior to solving the inverse problem in magnetic resonance (MR) reconstruction. However, low-frequency regions of $k$-space data are typically fully sampled in fast MR imaging, while existing diffusion models are performed throughout the entire image or $k$-space, inevitably introducing uncertainty in the reconstruction of low-frequency regions. Additionally, existing diffusion models often demand substantial iterations to converge, resulting in time-consuming reconstructions. To address these challenges, we propose a novel SDE tailored specifically for MR reconstruction with the diffusion process in high-frequency space (referred to as HFS-SDE). This approach ensures determinism in the fully sampled low-frequency regions and accelerates the sampling procedure of reverse diffusion. Experiments conducted on the publicly available fastMRI dataset demonstrate that the proposed HFS-SDE method outperforms traditional parallel imaging methods, supervised deep learning, and existing diffusion models in terms of reconstruction accuracy and stability. The fast convergence properties are also confirmed through theoretical and experimental validation. Our code and weights are available at https://github.com/Aboriginer/HFS-SDE. △ Less

Submitted 20 January, 2024; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: accepted for IEEE TMI

arXiv:2208.02466 [pdf, other]

Linear MIMO Precoders Design for Finite Alphabet Inputs via Model-Free Training

Authors: Chen Cao, Biqian Feng, Yongpeng Wu, Derrick Wing Kwan Ng, Wenjun Zhang

Abstract: This paper investigates a novel method for designing linear precoders with finite alphabet inputs based on autoencoders (AE) without the knowledge of the channel model. By model-free training of the autoencoder in a multiple-input multiple-output (MIMO) system, the proposed method can effectively solve the optimization problem to design the precoders that maximize the mutual information between th… ▽ More This paper investigates a novel method for designing linear precoders with finite alphabet inputs based on autoencoders (AE) without the knowledge of the channel model. By model-free training of the autoencoder in a multiple-input multiple-output (MIMO) system, the proposed method can effectively solve the optimization problem to design the precoders that maximize the mutual information between the channel inputs and outputs, when only the input-output information of the channel can be observed. Specifically, the proposed method regards the receiver and the precoder as two independent parameterized functions in the AE and alternately trains them using the exact and approximated gradient, respectively. Compared with previous precoders design methods, it alleviates the limitation of requiring the explicit channel model to be known. Simulation results show that the proposed method works as well as those methods under known channel models in terms of maximizing the mutual information and reducing the bit error rate. △ Less

Submitted 4 August, 2022; originally announced August 2022.

Comments: Accepted by GLOBECOM 2022

arXiv:2207.08117 [pdf, other]

Accelerating Magnetic Resonance T1\r{ho} Map** Using Simultaneously Spatial Patch-based and Parametric Group-based Low-rank Tensors (SMART)

Authors: Yuanyuan Liu, Dong Liang, Zhuo-Xu Cui, Yuxin Yang, Chentao Cao, Qingyong Zhu, **g Cheng, Caiyun Shi, Haifeng Wang, Yanjie Zhu

Abstract: Quantitative magnetic resonance (MR) T1\r{ho} map** is a promising approach for characterizing intrinsic tissue-dependent information. However, long scan time significantly hinders its widespread applications. Recently, low-rank tensor has been employed and demonstrated good performance in accelerating MR T1\r{ho} map**. In this study, we propose a novel method that uses spatial patch-based an… ▽ More Quantitative magnetic resonance (MR) T1\r{ho} map** is a promising approach for characterizing intrinsic tissue-dependent information. However, long scan time significantly hinders its widespread applications. Recently, low-rank tensor has been employed and demonstrated good performance in accelerating MR T1\r{ho} map**. In this study, we propose a novel method that uses spatial patch-based and parametric group-based low rank tensors simultaneously (SMART) to reconstruct images from highly undersampled k-space data. The spatial patch-based low-rank tensor exploits the high local and nonlocal redundancies and similarities between the contrast images in T1\r{ho} map**. The parametric group based low-rank tensor, which integrates similar exponential behavior of the image signals, is jointly used to enforce the multidimensional low-rankness in the reconstruction process. In vivo brain datasets were used to demonstrate the validity of the proposed method. Experimental results have demonstrated that the proposed method achieves 11.7-fold and 13.21-fold accelerations in two-dimensional and three-dimensional acquisitions, respectively, with more accurate reconstructed images and maps than several state-of-the-art methods. Prospective reconstruction results further demonstrate the capability of the SMART method in accelerating MR T1\r{ho} imaging. △ Less

Submitted 28 January, 2023; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: 22 pages, 19 figures

arXiv:2205.04073 [pdf, other]

PS-Net: Learned Partially Separable Model for Dynamic MR Imaging

Authors: Chentao Cao, Zhuo-Xu Cui, Qingyong Zhu, Congcong Liu, Dong Liang, Yanjie Zhu

Abstract: Deep learning methods driven by the low-rank regularization have achieved attractive performance in dynamic magnetic resonance (MR) imaging. However, most of these methods represent low-rank prior by hand-crafted nuclear norm, which cannot accurately approximate the low-rank prior over the entire dataset through a fixed regularization parameter. In this paper, we propose a learned low-rank method… ▽ More Deep learning methods driven by the low-rank regularization have achieved attractive performance in dynamic magnetic resonance (MR) imaging. However, most of these methods represent low-rank prior by hand-crafted nuclear norm, which cannot accurately approximate the low-rank prior over the entire dataset through a fixed regularization parameter. In this paper, we propose a learned low-rank method for dynamic MR imaging. In particular, we unrolled the semi-quadratic splitting method (HQS) algorithm for the partially separable (PS) model to a network, in which the low-rank is adaptively characterized by a learnable null-space transform. Experiments on the cardiac cine dataset show that the proposed model outperforms the state-of-the-art compressed sensing (CS) methods and existing deep learning methods both quantitatively and qualitatively. △ Less

Submitted 9 August, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: journal

arXiv:2203.04659 [pdf, other]

Single-pixel imaging based on weight sort of the Hadamard basis

Authors: Wen-Kai Yu, Chong Cao, Ying Yang, Ning Wei, Shuo-Fei Wang, Chen-Xi Zhu

Abstract: Single-pixel imaging (SPI) is very popular in subsampling applications, but the random measurement matrices it typically uses will lead to measurement blindness as well as difficulties in calculation and storage, and will also limit the further reduction in sampling rate. The deterministic Hadamard basis has become an alternative choice due to its orthogonality and structural characteristics. Ther… ▽ More Single-pixel imaging (SPI) is very popular in subsampling applications, but the random measurement matrices it typically uses will lead to measurement blindness as well as difficulties in calculation and storage, and will also limit the further reduction in sampling rate. The deterministic Hadamard basis has become an alternative choice due to its orthogonality and structural characteristics. There is evidence that sorting the Hadamard basis is beneficial to further reduce the sampling rate, thus many orderings have emerged, but their relations remain unclear and lack a unified theory. Given this, here we specially propose a concept named selection history, which can record the Hadamard spatial folding process, and build a model based on it to reveal the formation mechanisms of different orderings and to deduce the mutual conversion relationship among them. Then, a weight ordering of the Hadamard basis is proposed. Both numerical simulation and experimental results have demonstrated that with this weight sort technique, the sampling rate, reconstruction time and matrix memory consumption are greatly reduced in comparison to traditional sorting methods. Therefore, we believe that this method may pave the way for real-time single-pixel imaging. △ Less

Submitted 9 March, 2022; originally announced March 2022.

Comments: 19 pages, 13 figures

arXiv:2202.01582 [pdf, other]

A Psychoacoustic Quality Criterion for Path-Traced Sound Propagation

Authors: Chunxiao Cao, Zili An, Zhong Ren, Dinesh Manocha, Kun Zhou

Abstract: In develo** virtual acoustic environments, it is important to understand the relationship between the computation cost and the perceptual significance of the resultant numerical error. In this paper, we propose a quality criterion that evaluates the error significance of path-tracing-based sound propagation simulators. We present an analytical formula that estimates the error signal power spectr… ▽ More In develo** virtual acoustic environments, it is important to understand the relationship between the computation cost and the perceptual significance of the resultant numerical error. In this paper, we propose a quality criterion that evaluates the error significance of path-tracing-based sound propagation simulators. We present an analytical formula that estimates the error signal power spectrum. With this spectrum estimation, we can use a modified Zwicker's loudness model to calculate the relative loudness of the error signal masked by the ideal output. Our experimental results show that the proposed criterion can explain the human perception of simulation error in a variety of cases. △ Less

Submitted 8 October, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: 12 pages, 10 figures. To be published in IEEE TVCG

arXiv:2111.10633 [pdf, other]

Sparse Tensor-based Multiscale Representation for Point Cloud Geometry Compression

Authors: Jianqiang Wang, Dandan Ding, Zhu Li, Xiaoxing Feng, Chuntong Cao, Zhan Ma

Abstract: This study develops a unified Point Cloud Geometry (PCG) compression method through the processing of multiscale sparse tensor-based voxelized PCG. We call this compression method SparsePCGC. The proposed SparsePCGC is a low complexity solution because it only performs the convolutions on sparsely-distributed Most-Probable Positively-Occupied Voxels (MP-POV). The multiscale representation also all… ▽ More This study develops a unified Point Cloud Geometry (PCG) compression method through the processing of multiscale sparse tensor-based voxelized PCG. We call this compression method SparsePCGC. The proposed SparsePCGC is a low complexity solution because it only performs the convolutions on sparsely-distributed Most-Probable Positively-Occupied Voxels (MP-POV). The multiscale representation also allows us to compress scale-wise MP-POVs by exploiting cross-scale and same-scale correlations extensively and flexibly. The overall compression efficiency highly depends on the accuracy of estimated occupancy probability for each MP-POV. Thus, we first design the Sparse Convolution-based Neural Network (SparseCNN) which stacks sparse convolutions and voxel sampling to best characterize and embed spatial correlations. We then develop the SparseCNN-based Occupancy Probability Approximation (SOPA) model to estimate the occupancy probability either in a single-stage manner only using the cross-scale correlation, or in a multi-stage manner by exploiting stage-wise correlation among same-scale neighbors. Besides, we also suggest the SparseCNN based Local Neighborhood Embedding (SLNE) to aggregate local variations as spatial priors in feature attribute to improve the SOPA. Our unified approach not only shows state-of-the-art performance in both lossless and lossy compression modes across a variety of datasets including the dense object PCGs (8iVFB, Owlii, MUVB) and sparse LiDAR PCGs (KITTI, Ford) when compared with standardized MPEG G-PCC and other prevalent learning-based schemes, but also has low complexity which is attractive to practical applications. △ Less

Submitted 21 October, 2022; v1 submitted 20 November, 2021; originally announced November 2021.

Comments: 17 pages, 15 figures

arXiv:2111.06707 [pdf, other]

Transformer-based Image Compression

Authors: Ming Lu, Peiyao Guo, Huiqing Shi, Chuntong Cao, Zhan Ma

Abstract: A Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders. Both main and hyper encoders are comprised of a sequence of neural transformation units (NTUs) to analyse and aggregate important information for more compact representation of input image, while the decoders mirror the… ▽ More A Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders. Both main and hyper encoders are comprised of a sequence of neural transformation units (NTUs) to analyse and aggregate important information for more compact representation of input image, while the decoders mirror the encoder-side operations to generate pixel-domain image reconstruction from the compressed bitstream. Each NTU is consist of a Swin Transformer Block (STB) and a convolutional layer (Conv) to best embed both long-range and short-range information; In the meantime, a casual attention module (CAM) is devised for adaptive context modeling of latent features to utilize both hyper and autoregressive priors. The TIC rivals with state-of-the-art approaches including deep convolutional neural networks (CNNs) based learnt image coding (LIC) methods and handcrafted rules-based intra profile of recently-approved Versatile Video Coding (VVC) standard, and requires much less model parameters, e.g., up to 45% reduction to leading-performance LIC. △ Less

Submitted 12 November, 2021; originally announced November 2021.

arXiv:2106.15054 [pdf]

Time-Domain Doppler Biomotion Detections Immune to Unavoidable DC Offsets

Authors: Qinyi Lv, Lingtong Min, Congqi Cao, Shigang Zhou, Deyun Zhou, Chengkai Zhu, Yun Li, Zhongbo Zhu, Xiaojun Li, Lixin Ran

Abstract: In the past decades, continuous Doppler radar sensor-based bio-signal detections have attracted many research interests. A typical example is the Doppler heartbeat detection. While significant progresses have been achieved, reliable, time-domain accurate demodulation of bio-signals in the presence of unavoidable DC offsets remains a technical challenge. Aiming to overcome this difficulty, we propo… ▽ More In the past decades, continuous Doppler radar sensor-based bio-signal detections have attracted many research interests. A typical example is the Doppler heartbeat detection. While significant progresses have been achieved, reliable, time-domain accurate demodulation of bio-signals in the presence of unavoidable DC offsets remains a technical challenge. Aiming to overcome this difficulty, we propose in this paper a novel demodulation algorithm that does not need to trace and eliminate dynamic DC offsets based on approximating segmented arcs in a quadrature constellation of sampling data to directional chords. Assisted by the principal component analysis, such chords and their directions can be deterministically determined. Simulations and experimental validations showed fully recovery of micron-level pendulum movements and strongly noised human heartbeats, verifying the effectiveness and accuracy of the proposed approach. △ Less

Submitted 29 October, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: Accepted by IEEE Transactions on Instrumentation & Measurement

arXiv:2106.02514 [pdf, other]

The Image Local Autoregressive Transformer

Authors: Chenjie Cao, Yuxin Hong, Xiang Li, Chengrong Wang, Chengming Xu, XiangYang Xue, Yanwei Fu

Abstract: Recently, AutoRegressive (AR) models for the whole image generation empowered by transformers have achieved comparable or even better performance to Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit/change local image regions, may suffer from the problems of missing global information, slow inference speed, and information leakage of local guidance. To… ▽ More Recently, AutoRegressive (AR) models for the whole image generation empowered by transformers have achieved comparable or even better performance to Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit/change local image regions, may suffer from the problems of missing global information, slow inference speed, and information leakage of local guidance. To address these limitations, we propose a novel model -- image Local Autoregressive Transformer (iLAT), to better facilitate the locally guided image synthesis. Our iLAT learns the novel local discrete representations, by the newly proposed local autoregressive (LA) transformer of the attention mask and convolution mechanism. Thus iLAT can efficiently synthesize the local image regions by key guidance information. Our iLAT is evaluated on various locally guided image syntheses, such as pose-guided person image synthesis and face editing. Both the quantitative and qualitative results show the efficacy of our model. △ Less

Submitted 18 October, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: Accepted by NeurIPS2021

arXiv:2103.15876 [pdf, other]

High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation

Authors: Lele Chen, Chen Cao, Fernando De la Torre, Jason Saragih, Chenliang Xu, Yaser Sheikh

Abstract: 3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR. Best 3D photo-realistic AR/VR avatars driven by video, that can minimize uncanny effects, rely on person-specific models. However, existing person-specific photo-realistic 3D models are not robust to lighting, hence their results typically miss subtle facial behav… ▽ More 3D video avatars can empower virtual communications by providing compression, privacy, entertainment, and a sense of presence in AR/VR. Best 3D photo-realistic AR/VR avatars driven by video, that can minimize uncanny effects, rely on person-specific models. However, existing person-specific photo-realistic 3D models are not robust to lighting, hence their results typically miss subtle facial behaviors and cause artifacts in the avatar. This is a major drawback for the scalability of these models in communication systems (e.g., Messenger, Skype, FaceTime) and AR/VR. This paper addresses previous limitations by learning a deep learning lighting model, that in combination with a high-quality 3D face tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar. Extensive experimental validation and comparisons to other state-of-the-art methods demonstrate the effectiveness of the proposed framework in real-world scenarios with variability in pose, expression, and illumination. Please visit https://www.youtube.com/watch?v=dtz1LgZR8cc for more results. Our project page can be found at https://www.cs.rochester.edu/u/lchen63. △ Less

Submitted 29 March, 2021; originally announced March 2021.

Comments: The paper is accepted to CVPR 2021

arXiv:2010.04600 [pdf, other]

Robust Adaptive Control of Linear Parameter-Varying Systems with Unmatched Uncertainties

Authors: Pan Zhao, Steven Snyder, Naira Hovakimyana, Chengyu Cao

Abstract: In controlling systems with large operating envelopes, it is often necessary to adjust the desired dynamics according to operating conditions. This paper presents a robust adaptive control architecture for linear parameter-varying (LPV) systems that allows for the desired dynamics to be systematically scheduled, while being able to handle a broad class of uncertainties, both matched and unmatched,… ▽ More In controlling systems with large operating envelopes, it is often necessary to adjust the desired dynamics according to operating conditions. This paper presents a robust adaptive control architecture for linear parameter-varying (LPV) systems that allows for the desired dynamics to be systematically scheduled, while being able to handle a broad class of uncertainties, both matched and unmatched, which can depend on both time and states. The proposed controller adopts an L1 adaptive control architecture for designing the adaptive control law and peak-to-peak gain (PPG) minimization for designing the robust control law to mitigate the effect of unmatched uncertainties. Leveraging the PPG bound of an LPV system, we derive transient and steady-state performance bounds in terms of the input and output signals of the actual closed-loop system as compared to the same signals of a nominal system. The efficacy of the proposed method is validated by extensive simulations using the short-period dynamics of an F-16 aircraft operating in a large envelope. △ Less

Submitted 21 September, 2023; v1 submitted 9 October, 2020; originally announced October 2020.

Comments: 28 pages, 9 figures

arXiv:2008.00901 [pdf, other]

Automated Segmentation of Brain Gray Matter Nuclei on Quantitative Susceptibility Map** Using Deep Convolutional Neural Network

Authors: Chao Chai, Pengchong Qiao, Bin Zhao, Huiying Wang, Guohua Liu, Hong Wu, E Mark Haacke, Wen Shen, Chen Cao, Xinchen Ye, Zhiyang Liu, Shuang Xia

Abstract: Abnormal iron accumulation in the brain subcortical nuclei has been reported to be correlated to various neurodegenerative diseases, which can be measured through the magnetic susceptibility from the quantitative susceptibility map** (QSM). To quantitively measure the magnetic susceptibility, the nuclei should be accurately segmented, which is a tedious task for clinicians. In this paper, we pro… ▽ More Abnormal iron accumulation in the brain subcortical nuclei has been reported to be correlated to various neurodegenerative diseases, which can be measured through the magnetic susceptibility from the quantitative susceptibility map** (QSM). To quantitively measure the magnetic susceptibility, the nuclei should be accurately segmented, which is a tedious task for clinicians. In this paper, we proposed a double-branch residual-structured U-Net (DB-ResUNet) based on 3D convolutional neural network (CNN) to automatically segment such brain gray matter nuclei. To better tradeoff between segmentation accuracy and the memory efficiency, the proposed DB-ResUNet fed image patches with high resolution and the patches with low resolution but larger field of view into the local and global branches, respectively. Experimental results revealed that by jointly using QSM and T$_\text{1}$ weighted imaging (T$_\text{1}$WI) as inputs, the proposed method was able to achieve better segmentation accuracy over its single-branch counterpart, as well as the conventional atlas-based method and the classical 3D-UNet structure. The susceptibility values and the volumes were also measured, which indicated that the measurements from the proposed DB-ResUNet are able to present high correlation with values from the manually annotated regions of interest. △ Less

Submitted 3 August, 2020; originally announced August 2020.

Comments: submitted to IEEE Transactions on Medical Imaging

arXiv:2004.00152 [pdf, other]

L1-Adaptive MPPI Architecture for Robust and Agile Control of Multirotors

Authors: **tasit Pravitra, Kasey A. Ackerman, Chengyu Cao, Naira Hovakimyan, Evangelos A. Theodorou

Abstract: This paper presents a multirotor control architecture, where Model Predictive Path Integral Control (MPPI) and L1 adaptive control are combined to achieve both fast model predictive trajectory planning and robust trajectory tracking. MPPI provides a framework to solve nonlinear MPC with complex cost functions in real-time. However, it often lacks robustness, especially when the simulated dynamics… ▽ More This paper presents a multirotor control architecture, where Model Predictive Path Integral Control (MPPI) and L1 adaptive control are combined to achieve both fast model predictive trajectory planning and robust trajectory tracking. MPPI provides a framework to solve nonlinear MPC with complex cost functions in real-time. However, it often lacks robustness, especially when the simulated dynamics are different from the true dynamics. We show that the L1 adaptive controller robustifies the architecture, allowing the overall system to behave similar to the nominal system simulated with MPPI. The architecture is validated in a simulated multirotor racing environment. △ Less

Submitted 31 March, 2020; originally announced April 2020.

Comments: Submitted to 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems

arXiv:2003.14013 [pdf, other]

Supervised Raw Video Denoising with a Benchmark Dataset on Dynamic Scenes

Authors: Huan**g Yue, Cong Cao, Lei Liao, Ronghe Chu, **gyu Yang

Abstract: In recent years, the supervised learning strategy for real noisy image denoising has been emerging and has achieved promising results. In contrast, realistic noise removal for raw noisy videos is rarely studied due to the lack of noisy-clean pairs for dynamic scenes. Clean video frames for dynamic scenes cannot be captured with a long-exposure shutter or averaging multi-shots as was done for stati… ▽ More In recent years, the supervised learning strategy for real noisy image denoising has been emerging and has achieved promising results. In contrast, realistic noise removal for raw noisy videos is rarely studied due to the lack of noisy-clean pairs for dynamic scenes. Clean video frames for dynamic scenes cannot be captured with a long-exposure shutter or averaging multi-shots as was done for static images. In this paper, we solve this problem by creating motions for controllable objects, such as toys, and capturing each static moment for multiple times to generate clean video frames. In this way, we construct a dataset with 55 groups of noisy-clean videos with ISO values ranging from 1600 to 25600. To our knowledge, this is the first dynamic video dataset with noisy-clean pairs. Correspondingly, we propose a raw video denoising network (RViDeNet) by exploring the temporal, spatial, and channel correlations of video frames. Since the raw video has Bayer patterns, we pack it into four sub-sequences, i.e RGBG sequences, which are denoised by the proposed RViDeNet separately and finally fused into a clean video. In addition, our network not only outputs a raw denoising result, but also the sRGB result by going through an image signal processing (ISP) module, which enables users to generate the sRGB result with their favourite ISPs. Experimental results demonstrate that our method outperforms state-of-the-art video and raw image denoising algorithms on both indoor and outdoor videos. △ Less

Submitted 31 March, 2020; originally announced March 2020.

Comments: CVPR2020 accepted paper

arXiv:1908.03735 [pdf]

Automatic acute ischemic stroke lesion segmentation using semi-supervised learning

Authors: Bin Zhao, Shuxue Ding, Hong Wu, Guohua Liu, Chen Cao, Song **, Zhiyang Liu

Abstract: Ischemic stroke is a common disease in the elderly population, which can cause long-term disability and even death. However, the time window for treatment of ischemic stroke in its acute stage is very short. To fast localize and quantitively evaluate the acute ischemic stroke (AIS) lesions, many deep-learning-based lesion segmentation methods have been proposed in the literature, where a deep conv… ▽ More Ischemic stroke is a common disease in the elderly population, which can cause long-term disability and even death. However, the time window for treatment of ischemic stroke in its acute stage is very short. To fast localize and quantitively evaluate the acute ischemic stroke (AIS) lesions, many deep-learning-based lesion segmentation methods have been proposed in the literature, where a deep convolutional neural network (CNN) was trained on hundreds of fully labeled subjects with accurate annotations of AIS lesions. Despite that high segmentation accuracy can be achieved, the accurate labels should be annotated by experienced clinicians, and it is therefore very time-consuming to obtain a large number of fully labeled subjects. In this paper, we propose a semi-supervised method to automatically segment AIS lesions in diffusion weighted images and apparent diffusion coefficient maps. By using a large number of weakly labeled subjects and a small number of fully labeled subjects, our proposed method is able to accurately detect and segment the AIS lesions. In particular, our proposed method consists of three parts: 1) a double-path classification net (DPC-Net) trained in a weakly-supervised way is used to detect the suspicious regions of AIS lesions; 2) a pixel-level K-Means clustering algorithm is used to identify the hyperintensive regions on the DWIs; and 3) a region-growing algorithm combines the outputs of the DPC-Net and the K-Means to obtain the final precise lesion segmentation. In our experiment, we use 460 weakly labeled subjects and 15 fully labeled subjects to train and fine-tune the proposed method. By evaluating on a clinical dataset with 150 fully labeled subjects, our proposed method achieves a mean dice coefficient of 0.642, and a lesion-wise F1 score of 0.822. △ Less

Submitted 20 September, 2020; v1 submitted 10 August, 2019; originally announced August 2019.

arXiv:1809.01859 [pdf, ps, other]

Deep Learning-Based Decoding for Constrained Sequence Codes

Authors: Congzhe Cao, Duanshun Li, Ivan Fair

Abstract: Constrained sequence codes have been widely used in modern communication and data storage systems. Sequences encoded with constrained sequence codes satisfy constraints imposed by the physical channel, hence enabling efficient and reliable transmission of coded symbols. Traditional encoding and decoding of constrained sequence codes rely on table look-up, which is prone to errors that occur during… ▽ More Constrained sequence codes have been widely used in modern communication and data storage systems. Sequences encoded with constrained sequence codes satisfy constraints imposed by the physical channel, hence enabling efficient and reliable transmission of coded symbols. Traditional encoding and decoding of constrained sequence codes rely on table look-up, which is prone to errors that occur during transmission. In this paper, we introduce constrained sequence decoding based on deep learning. With multiple layer perception (MLP) networks and convolutional neural networks (CNNs), we are able to achieve low bit error rates that are close to maximum a posteriori probability (MAP) decoding as well as improve the system throughput. Moreover, implementation of capacity-achieving fixed-length codes, where the complexity is prohibitively high with table look-up decoding, becomes practical with deep learning-based decoding. △ Less

Submitted 6 September, 2018; originally announced September 2018.

Comments: 7 pages, 6 figures, accepted by IEEE Global Communications Conference Workshop - Machine learning for communications

Showing 1–30 of 30 results for author: Cao, C