-
Lightweight Deep Learning Based Channel Estimation for Extremely Large-Scale Massive MIMO Systems
Authors:
Shen Gao,
Peihao Dong,
Zhiwen Pan,
Xiaohu You
Abstract:
Extremely large-scale massive multiple-input multiple-output (XL-MIMO) systems introduce much higher channel dimensionality and incur an additional near-field propagation effect, aggravating the computational load and the difficulty of acquiring prior knowledge for channel estimation. In this article, an XL-MIMO channel network (XLCNet) is developed to estimate the high-dimensional channel, which is a universal solution for both near-field and far-field users with different channel statistics. Furthermore, a compressed XLCNet (C-XLCNet) is designed via weight pruning and quantization to accelerate model inference as well as to facilitate model storage and transmission. Simulation results show the performance superiority and universality of XLCNet. Compared to XLCNet, C-XLCNet incurs limited performance loss while reducing the computational complexity and model size by about $10 \times$ and $36 \times$, respectively.
Submitted 13 February, 2024;
originally announced February 2024.
-
Holistic Evaluation of GPT-4V for Biomedical Imaging
Authors:
Zhengliang Liu,
Hanqi Jiang,
Tianyang Zhong,
Zihao Wu,
Chong Ma,
Yiwei Li,
Xiaowei Yu,
Yutong Zhang,
Yi Pan,
Peng Shu,
Yanjun Lyu,
Lu Zhang,
Junjie Yao,
Peixin Dong,
Chao Cao,
Zhenxiang Xiao,
Jiaqi Wang,
Huan Zhao,
Shaochen Xu,
Yaonai Wei,
Jingyuan Chen,
Haixing Dai,
Peilong Wang,
Hao He,
Zewei Wang
, et al. (25 additional authors not shown)
Abstract:
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.
Submitted 10 November, 2023;
originally announced December 2023.
-
CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming
Authors:
Qihua Zhou,
Ruibin Li,
Song Guo,
Peiran Dong,
Yi Liu,
Jingcai Guo,
Zhenda Xu
Abstract:
Recent years have witnessed the dramatic growth of Internet video traffic, where the video bitstreams are often compressed and delivered in low quality to fit the streamer's uplink bandwidth. To alleviate the quality degradation, Neural-enhanced Video Streaming (NVS) has emerged, which shows great prospects for recovering low-quality videos by mostly deploying neural super-resolution (SR) on the media server. Despite its benefit, we reveal that current mainstream works with SR enhancement have not achieved the desired rate-distortion trade-off between bitrate saving and quality restoration, due to: (1) overemphasizing the enhancement on the decoder side while omitting the co-design of the encoder, (2) limited generative capacity to recover high-fidelity perceptual details, and (3) optimizing the compression-and-restoration pipeline solely from the resolution perspective, without considering color bit-depth. Aiming at overcoming these limitations, we are the first to conduct an encoder-decoder (i.e., codec) synergy by leveraging the inherent visual-generative property of diffusion models. Specifically, we present Codec-aware Diffusion Modeling (CaDM), a novel NVS paradigm that significantly reduces streaming delivery bitrates while offering considerably higher restoration capacity than existing methods. First, CaDM improves the encoder's compression efficiency by simultaneously reducing the resolution and color bit-depth of video frames. Second, CaDM empowers the decoder with high-quality enhancement by making the denoising diffusion restoration aware of the encoder's resolution-color conditions. Evaluation on public cloud services with OpenMMLab benchmarks shows that CaDM reduces bitrates by up to 5.12 to 21.44 times based on common video standards and achieves much better recovery quality (e.g., FID of 0.61) than state-of-the-art neural-enhancing methods.
Submitted 8 March, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
A Hybrid Labeled Multi-Bernoulli Filter With Amplitude For Tracking Fluctuating Targets
Authors:
Weizhen Ma,
Zhongliang Jing,
Peng Dong,
Henry Leung
Abstract:
The amplitude information of target returns has been incorporated into many tracking algorithms for performance improvements. One of the limitations of employing the amplitude feature is that the signal-to-noise ratio (SNR) of the target, i.e., the parameter of the amplitude likelihood, is usually assumed to be known and constant. In practice, the target SNR is always unknown and depends on the aspect angle; hence it fluctuates. In this paper, we propose a hybrid labeled multi-Bernoulli (LMB) filter that introduces the signal amplitude into the LMB filter for tracking targets with unknown and fluctuating SNR. The fluctuation of target SNR is modeled by an autoregressive gamma process and amplitude likelihoods for Swerling 1 and 3 targets are considered. Under Rao-Blackwell decomposition, an approximate Gamma estimator based on Laplace transform and Markov Chain Monte Carlo method is proposed to estimate the target SNR, and the kinematic state is estimated by a Gaussian mixture filter conditioned on the target SNR. The performance of the proposed hybrid filter is analyzed via a tracking scenario including three crossing targets. Simulation results verify the efficacy of the proposed SNR estimator and quantify the benefits of incorporating amplitude information for multi-target tracking.
Submitted 19 September, 2022;
originally announced September 2022.
-
AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results
Authors:
Ren Yang,
Radu Timofte,
Xin Li,
Qi Zhang,
Lin Zhang,
Fanglong Liu,
Dongliang He,
Fu Li,
He Zheng,
Weihang Yuan,
Pavel Ostyakov,
Dmitry Vyal,
Magauiya Zhussip,
Xueyi Zou,
Youliang Yan,
Lei Li,
Jingzhu Tang,
Ming Chen,
Shijie Zhao,
Yu Zhu,
Xiaoran Qin,
Chenghua Li,
Cong Leng,
Jian Cheng,
Claudio Rota
, et al. (28 additional authors not shown)
Abstract:
This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, there are 12 teams and 2 teams that submitted the final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.
Submitted 25 August, 2022; v1 submitted 23 August, 2022;
originally announced August 2022.
-
A Multi-tasking Model of Speaker-Keyword Classification for Keeping Human in the Loop of Drone-assisted Inspection
Authors:
Yu Li,
Anisha Parsan,
Bill Wang,
Penghao Dong,
Shanshan Yao,
Ruwen Qin
Abstract:
Audio commands are a preferred communication medium to keep inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model must be developed cost-effectively for the group and easily adapted when the group changes. This paper is motivated to build a multi-tasking deep learning model that possesses a Share-Split-Collaborate architecture. This architecture allows the two classification tasks to share the feature extractor and then split subject-specific and keyword-specific features intertwined in the extracted features through feature projection and collaborative training. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected by this study. The model achieved a 95.3% or higher mean accuracy in classifying the keywords of any authorized inspectors. Its mean accuracy in speaker classification is 99.2%. Due to the richer keyword representations that the model learns from the pooled training data, adapting the base model to a new inspector requires only a little training data from that inspector, like five utterances per keyword. Using the speaker classification scores for inspector verification can achieve a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger-size groups on a public dataset. This paper provides a solution to addressing challenges facing AI-assisted human-robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity.
Submitted 31 October, 2022; v1 submitted 8 July, 2022;
originally announced July 2022.
-
Edge Semantic Cognitive Intelligence for 6G Networks: Novel Theoretical Models, Enabling Framework, and Typical Applications
Authors:
Peihao Dong,
Qihui Wu,
Xiaofei Zhang,
Guoru Ding
Abstract:
Edge intelligence is anticipated to underlay the pathway to connected intelligence for 6G networks, but the organic confluence of edge computing and artificial intelligence still needs to be carefully treated. To this end, this article discusses the concepts of edge intelligence from the semantic cognitive perspective. Two instructive theoretical models for edge semantic cognitive intelligence (ESCI) are first established. Afterwards, the ESCI framework orchestrating deep learning with semantic communication is discussed. Two representative applications are presented to shed light on the prospect of ESCI in 6G networks. Some open problems are finally listed to elicit the future research directions of ESCI.
Submitted 9 July, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Prediction of stent under-expansion in calcified coronary arteries using machine-learning on intravascular optical coherence tomography
Authors:
Yazan Gharaibeh,
Juhwan Lee,
Vladislav N. Zimin,
Chaitanya Kolluru,
Luis A. P. Dallan,
Gabriel T. R. Pereira,
Armando Vergara-Martel,
Justin N. Kim,
Ammar Hoori,
Pengfei Dong,
Peshala T. Gamage,
Linxia Gu,
Hiram G. Bezerra,
Sadeer Al-Kindi,
David L. Wilson
Abstract:
BACKGROUND Careful evaluation of the risk of stent under-expansions before the intervention will aid treatment planning, including the application of a pre-stent plaque modification strategy.
OBJECTIVES It remains challenging to achieve a proper stent expansion in the presence of severely calcified coronary lesions. Building on our work in deep learning segmentation, we created an automated machine learning approach that uses lesion attributes to predict stent under-expansion from pre-stent images, suggesting the need for plaque modification.
METHODS Pre- and post-stent intravascular optical coherence tomography image data were obtained from 110 coronary lesions. Lumen and calcifications in pre-stent images were segmented using deep learning, and numerous features per lesion were extracted. We analyzed stent expansion along the lesion, enabling frame, segmental, and whole-lesion analyses. We trained regression models to predict the post-stent lumen area and then to compute the stent expansion index (SEI). Stents with an SEI < 80% or ≥ 80% were classified as "under-expanded" and "well-expanded," respectively.
RESULTS Best performance (root-mean-square-error = 0.04 ± 0.02 mm², r = 0.94 ± 0.04, p < 0.0001) was achieved when we used features from both the lumen and calcification to train a Gaussian regression model for a segmental analysis over a segment length of 31 frames. Under-expansion classification results (AUC = 0.85 ± 0.02) were significantly improved over other approaches.
CONCLUSIONS We used calcifications and lumen features to identify lesions at risk of stent under-expansion. Results suggest that the use of pre-stent images can inform physicians of the need to apply plaque modification approaches.
Submitted 16 May, 2022;
originally announced May 2022.
-
Latency Guarantee for Ubiquitous Intelligence in 6G: A Network Calculus Approach
Authors:
Lianming Zhang,
Qian Wang,
Pingping Dong,
Yehua Wei,
Jing Mei
Abstract:
With the gradual deployment of 5G and the continuous popularization of edge intelligence (EI), the explosive growth of data on the edge of the network has promoted the rapid development of 6G and ubiquitous intelligence (UbiI). This article aims to explore a new method for modeling latency guarantees for UbiI in 6G given 6G's extremely stochastic nature in terahertz (THz) environments, THz channel tail behavior, and delay distribution tail characteristics generated by the UbiI random component, and to find the optimal solution that minimizes the end-to-end (E2E) delay of UbiI. The arrival curve and service curve of network calculus can characterize the stochastic nature of wireless channels and the tail behavior of wireless systems, and the E2E service curve of network calculus can model the tail characteristic of the delay distribution in UbiI. Specifically, we first propose demands and challenges facing 6G, edge computing (EC), edge deep learning (DL), and UbiI. Then, we propose the hierarchical architecture, the network model, and the service delay model of the UbiI system based on network calculus. In addition, two case studies demonstrate the usefulness and effectiveness of the network calculus approach in analyzing and modeling the latency guarantee for UbiI in 6G. Finally, future open research issues regarding the latency guarantee for UbiI in 6G are outlined.
Submitted 6 May, 2022;
originally announced May 2022.
-
PNC Enabled IIoT: A General Framework for Channel-Coded Asymmetric Physical-Layer Network Coding
Authors:
Zhaorui Wang,
Ling Liu,
Shengli Zhang,
Pengpeng Dong,
Qing Yang,
Taotao Wang
Abstract:
This paper investigates the application of physical-layer network coding (PNC) to Industrial Internet-of-Things (IIoT) where a controller and a robot are out of each other's transmission range, and they exchange messages with the assistance of a relay. We particularly focus on a scenario where the controller has more information to transmit, and the channel of the controller is stronger than that of the robot. To reduce the communication latency, we propose an asymmetric transmission scheme where the controller and robot simultaneously transmit different amounts of information in the uplink of PNC. To achieve this, the controller chooses a higher order modulation. In addition, both users apply channel codes to guarantee reliability. A problem is that a superimposed symbol at the relay contains different amounts of source information from the two end users. It is thus hard for the relay to deduce meaningful network-coded messages by applying the current PNC decoding techniques, which require the end users to transmit the same amount of information. To solve this problem, we propose a lattice-based scheme where the two users encode-and-modulate their information in lattices with different lattice construction levels. Our design is versatile in that the two end users can freely choose their modulation orders based on their channel power, and the design is applicable to arbitrary channel codes.
Submitted 8 June, 2022; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Deep Multi-Stage CSI Acquisition for Reconfigurable Intelligent Surface Aided MIMO Systems
Authors:
Shen Gao,
Peihao Dong,
Zhiwen Pan,
Geoffrey Ye Li
Abstract:
This article aims to reduce the huge pilot overhead when estimating the reconfigurable intelligent surface (RIS) relayed wireless channel. Motivated by the compelling grasp of deep learning in tackling nonlinear mapping problems, the proposed approach only activates a part of RIS elements and utilizes the corresponding cascaded channel estimate to predict another part. Through a synthetic deep neural network (DNN), the direct channel and active cascaded channel are first estimated sequentially, followed by the channel prediction for the inactive RIS elements. A three-stage training strategy is developed for this synthetic DNN. Simulation results show that the proposed deep learning based approach is effective in reducing the pilot overhead and guaranteeing reliable estimation accuracy.
Submitted 23 April, 2021;
originally announced April 2021.
-
Blind deblurring for microscopic pathology images using deep learning networks
Authors:
Cheng Jiang,
Jun Liao,
Pei Dong,
Zhaoxuan Ma,
De Cai,
Guoan Zheng,
Yueping Liu,
Hong Bu,
Jianhua Yao
Abstract:
Artificial Intelligence (AI)-powered pathology is a revolutionary step in the world of digital pathology and shows great promise to increase both diagnosis accuracy and efficiency. However, defocus and motion blur can obscure tissue or cell characteristics, hence compromising AI algorithms' accuracy and robustness in analyzing the images. In this paper, we demonstrate a deep-learning-based approach that can alleviate the defocus and motion blur of a microscopic image and output a sharper and cleaner image with retrieved fine details without prior knowledge of the blur type, blur extent and pathological stain. In this approach, a deep learning classifier is first trained to identify the image blur type. Then, two encoder-decoder networks are trained and used alone or in combination to deblur the input image. It is an end-to-end approach and introduces no corrugated artifacts as traditional blind deconvolution methods do. We test our approach on different types of pathology specimens and demonstrate great performance on image blur correction and the subsequent improvement on the diagnosis outcome of AI algorithms.
Submitted 23 November, 2020;
originally announced November 2020.
-
Experimental Demonstration of 4,294,967,296-QAM Based Y-00 Quantum Stream Cipher Carrying 160-Gb/s 16-QAM Signals
Authors:
Xi Chen,
Ken Tanizawa,
Peter Winzer,
Po Dong,
Junho Cho,
Fumio Futami,
Kentaro Kato,
Argishti Melikyan,
Kw Kim
Abstract:
We demonstrate a 4,294,967,296-ary quadrature amplitude modulation (QAM) based Y-00 quantum stream cipher system carrying 160-Gb/s 16-QAM signal transmitted over 320-km SSMF. The ultra-dense QAM cipher template is realized by an integrated two-segment silicon photonics I/Q modulator.
Submitted 23 September, 2020;
originally announced September 2020.
-
Microscope Based HER2 Scoring System
Authors:
Jun Zhang,
Kuan Tian,
Pei Dong,
Haocheng Shen,
Kezhou Yan,
Jianhua Yao,
Junzhou Huang,
Xiao Han
Abstract:
The overexpression of human epidermal growth factor receptor 2 (HER2) has been established as a therapeutic target in multiple types of cancers, such as breast and gastric cancers. Immunohistochemistry (IHC) is employed as a basic HER2 test to identify the HER2-positive, borderline, and HER2-negative patients. However, the reliability and accuracy of HER2 scoring are affected by many factors, such as pathologists' experience. Recently, artificial intelligence (AI) has been used in the diagnosis of various diseases to improve diagnostic accuracy and reliability, but the interpretation of diagnosis results is still an open problem. In this paper, we propose a real-time HER2 scoring system, which follows the HER2 scoring guidelines to complete the diagnosis, and thus each step is explainable. Unlike the previous scoring systems based on whole-slide imaging, our HER2 scoring system is integrated into an augmented reality (AR) microscope that can feed back AI results to the pathologists while they read the slide. The pathologists can help select informative fields of view (FOVs), avoiding the confounding regions, such as DCIS. Importantly, we illustrate the intermediate results with membrane staining condition and cell classification results, making it possible to evaluate the reliability of the diagnostic results. Also, we support the interactive modification of selecting regions-of-interest, making our system more flexible in clinical practice. The collaboration of AI and pathologists can significantly improve the robustness of our system. We evaluate our system with 285 breast IHC HER2 slides, and the classification accuracy of 95% shows the effectiveness of our HER2 scoring system.
Submitted 14 September, 2020;
originally announced September 2020.
-
Acquisition of Channel State Information for mmWave Massive MIMO: Traditional and Machine Learning-based Approaches
Authors:
Chenhao Qi,
Peihao Dong,
Wenyan Ma,
Hua Zhang,
Zaichen Zhang,
Geoffrey Ye Li
Abstract:
The accuracy of channel state information (CSI) acquisition directly affects the performance of millimeter wave (mmWave) communications. In this article, we provide an overview on CSI acquisition, including beam training and channel estimation for mmWave massive multiple-input multiple-output systems. The beam training can avoid the estimation of a high-dimension channel matrix while the channel estimation can flexibly exploit advanced signal processing techniques. In addition to introducing the traditional and machine learning-based approaches in this article, we also compare different approaches in terms of spectral efficiency, computational complexity, and overhead.
Submitted 12 March, 2022; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Framework on Deep Learning Based Joint Hybrid Processing for mmWave Massive MIMO Systems
Authors:
Peihao Dong,
Hua Zhang,
Geoffrey Ye Li
Abstract:
For millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems, hybrid processing architecture is essential to significantly reduce the complexity and cost but is quite challenging to jointly optimize over the transmitter and receiver. In this paper, deep learning (DL) is applied to design a novel joint hybrid processing framework (JHPF) that allows end-to-end optimization by using back propagation. The proposed framework includes three parts: hybrid processing designer, signal flow simulator, and signal demodulator, which output the hybrid processing matrices for the transceiver by using neural networks (NNs), simulate the signal transmission over the air, and map the detected symbols to the original bits by using the NN, respectively. By minimizing the cross-entropy loss between the recovered and original bits, the proposed framework optimizes the analog and digital processing matrices at the transceiver jointly and implicitly instead of approximating pre-designed label matrices, and its trainability is proved theoretically. It can also be directly applied to orthogonal frequency division multiplexing systems by simply modifying the structure of the training data. Simulation results show that the proposed DL-JHPF outperforms the existing hybrid processing schemes and is robust to mismatched channel state information and channel scenarios, with significantly reduced runtime.
Submitted 4 June, 2020;
originally announced June 2020.
-
CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks
Authors:
Runbin Shi,
Peiyan Dong,
Tong Geng,
Yuhao Ding,
Xiaolong Ma,
Hayden K. -H. So,
Martin Herbordt,
Ang Li,
Yanzhi Wang
Abstract:
Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from heavy computational workload as the model often comes with large weight matrices. Pruning schemes have been proposed for RNNs to eliminate the redundant (close-to-zero) weight values. On one hand, the non-structured pruning methods achieve a high pruning rate but introduce computation irregularity (random sparsity), which is unfriendly to parallel hardware. On the other hand, hardware-oriented structured pruning suffers from a low pruning rate due to restricted constraints on allowable pruning structure. This paper presents CSB-RNN, an optimized full-stack RNN framework with a novel compressed structured block (CSB) pruning technique. The CSB pruned RNN model comes with both fine pruning granularity that facilitates a high pruning rate and regular structure that benefits the hardware parallelism. To address the challenges in parallelizing the CSB pruned model inference with fine-grained structural sparsity, we propose a novel hardware architecture with a dedicated compiler. Gaining from the architecture-compilation co-design, the hardware not only supports various RNN cell types, but is also able to address the challenging workload imbalance issue and therefore significantly improves the hardware efficiency.
Submitted 11 May, 2020;
originally announced May 2020.
-
RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition
Authors:
Peiyan Dong,
Siyue Wang,
Wei Niu,
Chengming Zhang,
Sheng Lin,
Zhengang Li,
Yifan Gong,
Bin Ren,
Xue Lin,
Yanzhi Wang,
Dingwen Tao
Abstract:
Automatic speech recognition based on recurrent neural networks (RNNs) has become prevalent on mobile devices such as smartphones. However, previous RNN compression techniques suffer either from hardware performance overhead due to irregularity or from significant accuracy loss due to the regularity preserved for hardware friendliness. In this work, we propose RTMobile, which leverages both a novel block-based pruning approach and compiler optimizations to accelerate RNN inference on mobile devices. Our proposed RTMobile is the first work that can achieve real-time RNN inference on mobile platforms. Experimental results demonstrate that RTMobile can significantly outperform existing RNN hardware acceleration methods in terms of inference accuracy and time. Compared with prior work on FPGA, RTMobile using the Adreno 640 embedded GPU on GRU can improve the energy efficiency by about 40$\times$ while maintaining the same inference time.
Submitted 18 February, 2020;
originally announced February 2020.
-
Deep Learning based Channel Estimation for Massive MIMO with Mixed-Resolution ADCs
Authors:
Shen Gao,
Peihao Dong,
Zhiwen Pan,
Geoffrey Ye Li
Abstract:
In this article, deep learning is applied to estimate the uplink channels for mixed analog-to-digital converters (ADCs) massive multiple-input multiple-output (MIMO) systems, where a portion of antennas are equipped with high-resolution ADCs while others employ low-resolution ones at the base station. A direct-input deep neural network (DI-DNN) is first proposed to estimate channels by using the received signals of all antennas. To eliminate the adverse impact of the coarsely quantized signals, a selective-input prediction DNN (SIP-DNN) is developed, where only the signals received by the high-resolution ADC antennas are exploited to predict the channels of other antennas as well as to estimate their own channels. Numerical results show the superiority of the proposed DNN based approaches over the existing methods, especially with mixed one-bit ADCs, and the effectiveness of the proposed approaches on different ADC resolution patterns.
Submitted 17 August, 2019;
originally announced August 2019.
-
D-UNet: a dimension-fusion U shape network for chronic stroke lesion segmentation
Authors:
Yong** Zhou,
Weijian Huang,
Pei Dong,
Yong Xia,
Shanshan Wang
Abstract:
Assessing the location and extent of lesions caused by chronic stroke is critical for medical diagnosis, surgical planning, and prognosis. In recent years, with the rapid development of 2D and 3D convolutional neural networks (CNNs), the encoder-decoder structure has shown great potential in the field of medical image segmentation. However, the 2D CNN ignores the 3D information of medical images, while the 3D CNN suffers from high computational resource demands. This paper proposes a new architecture called dimension-fusion-UNet (D-UNet), which combines 2D and 3D convolution in the encoding stage. The proposed architecture achieves better segmentation performance than 2D networks, while requiring significantly less computation time than 3D networks. Furthermore, to alleviate the data imbalance between positive and negative samples during network training, we propose a new loss function called Enhance Mixing Loss (EML). This function adds a weighted focal coefficient and combines two traditional loss functions. The proposed method has been tested on the ATLAS dataset and compared to three state-of-the-art methods. The results demonstrate that the proposed method achieves the best performance, with DSC = 0.5349 ± 0.2763 and precision = 0.6331 ± 0.295.
Submitted 14 August, 2019;
originally announced August 2019.
-
A Performance Analysis Model of TCP over Multiple Heterogeneous Paths for 5G Mobile Services
Authors:
Jiayang Song,
** Dong,
Huachun Zhou,
Tao Zheng,
Xiaojiang Du,
Mohsen Guizani
Abstract:
Driven by the primary requirements of emerging 5G mobile services, the demand for concurrent multipath transfer (CMT) is still prominent. Yet multipath transport protocols are not widely adopted, and TCP-based CMT schemes will remain dominant in 5G. However, the performance of a TCP flow transferred over multiple heterogeneous paths is sensitive to link quality asymmetry, the extent of which was revealed to be significant by our field investigation. In this paper, we present a performance analysis model for TCP over multiple heterogeneous paths in 5G scenarios, where both bandwidth and delay asymmetry are taken into consideration. An evaluation using parameters from the field investigation shows that the proposed model can achieve high accuracy in practical environments. Some interesting inferences can be drawn from the proposed model, such as the dominant factors that affect the performance of TCP over heterogeneous networks, and the criteria for determining the appropriate number of links to use under different degrees of path heterogeneity. Thus, the proposed model can provide guidance for the design of TCP-based CMT solutions for 5G mobile services.
Submitted 6 April, 2018;
originally announced April 2018.