Search | arXiv e-print repository

CMC-Bench: Towards a New Paradigm of Visual Signal Compression

Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% or even lower, which gives it strong application potential. However, CMC has certain defects in consistency with the original image and in perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. This benchmark covers 18,000 and 40,000 images respectively to verify 6 mainstream I2T and 12 T2I models, including 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper proves that the combination of some I2T and T2I models has surpassed the most advanced visual signal codecs; meanwhile, it highlights where LMMs can be further optimized toward the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.

Submitted 13 June, 2024; originally announced June 2024.
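To make the "0.1% or even lower" figure concrete, the sketch below computes the bits per pixel of a caption-only bitstream. It assumes, hypothetically, that the caption is stored as uncompressed 8-bit characters and compares it with a raw 24-bit RGB image; neither assumption comes from CMC-Bench, and real CMC pipelines may entropy-code the text or transmit extra side information.

```python
def text_bpp(num_chars: int, width: int, height: int, bits_per_char: int = 8) -> float:
    """Bits per pixel when an image is represented only by a caption of num_chars characters."""
    return num_chars * bits_per_char / (width * height)


def size_ratio(bpp: float, raw_bpp: float = 24.0) -> float:
    """Compressed size as a fraction of an uncompressed image stored at raw_bpp (assumed 24-bit RGB)."""
    return bpp / raw_bpp


# Hypothetical example: a 100-character caption describing a 512x512 RGB image.
bpp = text_bpp(num_chars=100, width=512, height=512)
ratio = size_ratio(bpp)
print(f"{bpp:.5f} bpp, {ratio:.4%} of the raw size")  # prints: 0.00305 bpp, 0.0127% of the raw size
```

Even under the uncompressed-text assumption, the caption occupies well under 0.1% of the raw image size.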

arXiv:2406.00123 [pdf]

Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration

Authors: Mingyuan Meng, Dagan Feng, Lei Bi, Jinman Kim

Abstract: Deformable image registration is a fundamental step for medical image analysis. Recently, transformers have been used for registration and outperformed Convolutional Neural Networks (CNNs). Transformers can capture long-range dependence among image features, which has been shown to be beneficial for registration. However, due to the high computation/memory loads of self-attention, transformers are typically used at downsampled feature resolutions and cannot capture fine-grained long-range dependence at the full image resolution. This limits deformable registration, which requires precise dense correspondence for each image pixel. Multi-layer Perceptrons (MLPs) without self-attention are efficient in computation/memory usage, making it feasible to capture fine-grained long-range dependence at full resolution. Nevertheless, MLPs have not been extensively explored for image registration and lack consideration of the inductive bias crucial for medical registration tasks. In this study, we propose the first correlation-aware MLP-based registration network (CorrMLP) for deformable medical image registration. Our CorrMLP introduces a correlation-aware multi-window MLP block in a novel coarse-to-fine registration architecture, which captures fine-grained multi-range dependence to perform correlation-aware coarse-to-fine registration. Extensive experiments with seven public medical datasets show that our CorrMLP outperforms state-of-the-art deformable registration methods.

Submitted 12 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

Comments: Accepted at CVPR 2024 as an Oral Presentation and Best Paper Candidate

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9645-9654
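The claim that self-attention is too costly at full resolution can be checked with simple arithmetic: a dense attention matrix over N tokens has N² entries. The sketch below assumes a hypothetical 160×192×224 volume, one voxel per token, a single attention head, and 4-byte floats (none of these values come from the paper), and compares full resolution with 4× downsampling along each axis.

```python
def attention_matrix_bytes(shape, bytes_per_value=4):
    """Memory for one dense N x N attention matrix, where N is the number of voxels in shape."""
    n_tokens = 1
    for dim in shape:
        n_tokens *= dim
    return n_tokens * n_tokens * bytes_per_value


full = (160, 192, 224)               # hypothetical full-resolution 3D volume
down = tuple(d // 4 for d in full)   # 4x downsampling along each axis -> (40, 48, 56)

full_bytes = attention_matrix_bytes(full)
down_bytes = attention_matrix_bytes(down)
print(f"full resolution: {full_bytes / 2**40:.1f} TiB per attention matrix")
print(f"4x downsampled:  {down_bytes / 2**30:.1f} GiB per attention matrix")
print(f"ratio: {full_bytes // down_bytes}x")  # 4**6 = 4096
```

Because the matrix grows with N², downsampling each of the three axes by 4 shrinks it by 4⁶ = 4096, yet the downsampled matrix is still tens of gigabytes. This is consistent with the abstract's motivation for using MLPs to model long-range dependence at full resolution.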

arXiv:2404.18105 [pdf, other]

Tightly-Coupled VLP/INS Integrated Navigation by Inclination Estimation and Blockage Handling

Authors: Xiao Sun, Yuan Zhuang, Xiansheng Yang, Jianzhu Huai, Tianming Huang, Daquan Feng

Abstract: Visible Light Positioning (VLP) has emerged as a promising technology capable of delivering indoor localization with high accuracy. In VLP systems that use Photodiodes (PDs) as light receivers, the Received Signal Strength (RSS) is affected by the incidence angle of light, making the inclination of PDs a critical parameter in the positioning model. Currently, most studies assume the inclination to be constant, limiting the applications and positioning accuracy. Additionally, light blockages may severely interfere with the RSS measurements, but the literature has not explored blockage detection in real-world experiments. To address these problems, we propose a tightly coupled VLP/INS (Inertial Navigation System) integrated navigation system that uses graph optimization to account for varying PD inclinations and VLP blockages. We also discuss the possibility of simultaneously estimating the robot's pose and the locations of some unknown LEDs. Simulations and two groups of real-world experiments demonstrate the efficiency of our approach, achieving an average positioning accuracy of 10 cm during movement and inclination accuracy within 1 degree despite inclination changes and blockages.

Submitted 28 April, 2024; originally announced April 2024.
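The abstract notes that RSS depends on the incidence angle of light, which is why PD inclination matters. A minimal sketch, assuming the widely used line-of-sight Lambertian channel model (not necessarily the exact model used in the paper), shows how tilting the PD changes received power. Optical filter and concentrator gains and the receiver field of view are omitted, and all positions and parameter values are hypothetical.

```python
import math


def lambertian_order(half_power_angle_deg):
    """Lambertian order m = -ln 2 / ln(cos(semi-angle at half power))."""
    return -math.log(2) / math.log(math.cos(math.radians(half_power_angle_deg)))


def received_power(led_pos, pd_pos, pd_normal, p_tx=1.0, area=1e-4, half_angle_deg=60.0):
    """Line-of-sight Lambertian received power for a downward-facing LED.

    P_r = P_t * (m + 1) * A / (2 * pi * d^2) * cos^m(phi) * cos(psi),
    where phi is the irradiance angle at the LED and psi the incidence angle at the PD.
    """
    v = [led_c - pd_c for led_c, pd_c in zip(led_pos, pd_pos)]   # vector from PD to LED
    d = math.sqrt(sum(c * c for c in v))
    n_len = math.sqrt(sum(c * c for c in pd_normal))
    cos_phi = v[2] / d                                           # LED normal assumed to be -z
    cos_psi = sum(a * b for a, b in zip(v, pd_normal)) / (d * n_len)
    if cos_phi <= 0 or cos_psi <= 0:                             # light cannot reach the PD face
        return 0.0
    m = lambertian_order(half_angle_deg)
    return p_tx * (m + 1) * area / (2 * math.pi * d * d) * cos_phi ** m * cos_psi


# Hypothetical setup: LED 3 m above the PD plane, PD 1 m off-axis, level vs. tilted 10 degrees away.
led, pd = (0.0, 0.0, 3.0), (1.0, 0.0, 0.0)
tilt = math.radians(10)
level = received_power(led, pd, (0.0, 0.0, 1.0))
tilted_away = received_power(led, pd, (math.sin(tilt), 0.0, math.cos(tilt)))
print(f"level: {level:.3e} W, tilted 10 deg away from LED: {tilted_away:.3e} W")
```

Under this model only the cos(psi) factor changes with inclination, so a PD whose inclination is wrongly assumed constant biases the distance inferred from RSS.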

arXiv:2402.16749 [pdf, other]

MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

Authors: Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang

Abstract: With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To solve this problem, this paper proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image based on the above information. Experimental results show that our proposed MISC is suitable for compressing both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs). It can achieve optimal consistency and perception results while saving 50% of the bitrate, which gives it strong potential for applications in next-generation storage and communication. The code will be released at https://github.com/lcysyzxdxc/MISC.

Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 13 pages, 11 figures, 4 tables

arXiv:2311.16707 [pdf]

Full-resolution MLPs Empower Medical Dense Prediction

Authors: Mingyuan Meng, Yuxin Xue, Dagan Feng, Lei Bi, Jinman Kim

Abstract: Dense prediction is a fundamental requirement for many medical vision tasks such as medical image restoration, registration, and segmentation. The most popular vision model, Convolutional Neural Networks (CNNs), has reached bottlenecks due to the intrinsic locality of convolution operations. Recently, transformers have been widely adopted for dense prediction for their capability to capture long-range visual dependence. However, due to the high computational complexity and large memory consumption of self-attention operations, transformers are usually used at downsampled feature resolutions. Such usage cannot effectively leverage the tissue-level textural information available only at the full image resolution. This textural information is crucial for medical dense prediction as it can differentiate the subtle human anatomy in medical images. In this study, we hypothesize that Multi-layer Perceptrons (MLPs) are superior alternatives to transformers in medical dense prediction where tissue-level details dominate the performance, as MLPs enable long-range dependence at the full image resolution. To validate our hypothesis, we develop a full-resolution hierarchical MLP framework that uses MLPs beginning from the full image resolution. We evaluate this framework with various MLP blocks on a wide range of medical dense prediction tasks including restoration, registration, and segmentation.
Extensive experiments on six public well-benchmarked datasets show that, by simply using MLPs at full resolution, our framework outperforms its CNN and transformer counterparts and achieves state-of-the-art performance on various medical dense prediction tasks.

Submitted 28 November, 2023; originally announced November 2023.

Comments: Under Review

arXiv:2310.15550 [pdf]

PET Synthesis via Self-supervised Adaptive Residual Estimation Generative Adversarial Network

Authors: Yuxin Xue, Lei Bi, Yige Peng, Michael Fulham, David Dagan Feng, Jinman Kim

Abstract: Positron emission tomography (PET) is a widely used, highly sensitive molecular imaging modality in clinical diagnosis. There is interest in reducing the radiation exposure from PET while maintaining adequate image quality. Recent methods using convolutional neural networks (CNNs) to generate synthesized high-quality PET images from low-dose counterparts have been reported as state-of-the-art for low-to-high image recovery. However, these methods are prone to exhibiting discrepancies in texture and structure between synthesized and real images. Furthermore, the distribution shift between low-dose PET and standard PET has not been fully investigated. To address these issues, we developed a self-supervised adaptive residual estimation generative adversarial network (SS-AEGAN). We introduce (1) an adaptive residual estimation mapping mechanism, AE-Net, designed to dynamically rectify the preliminary synthesized PET images by taking the residual map between the low-dose PET and synthesized output as the input, and (2) a self-supervised pre-training strategy to enhance the feature representation of the coarse generator. Our experiments with a public benchmark dataset of total-body PET images show that SS-AEGAN consistently outperformed the state-of-the-art synthesis methods with various dose reduction factors.

Submitted 24 October, 2023; originally announced October 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2309.05271 [pdf]

AutoFuse: Automatic Fusion Networks for Deformable Medical Image Registration

Authors: Mingyuan Meng, Michael Fulham, Dagan Feng, Lei Bi, Jinman Kim

Abstract: Deformable image registration aims to find a dense non-linear spatial correspondence between a pair of images, which is a crucial step for many medical tasks such as tumor growth monitoring and population analysis. Recently, Deep Neural Networks (DNNs) have been widely recognized for their ability to perform fast end-to-end registration. However, DNN-based registration needs to explore the spatial information of each image and fuse this information to characterize spatial correspondence. This raises an essential question: what is the optimal fusion strategy to characterize spatial correspondence? Existing fusion strategies (e.g., early fusion, late fusion) were empirically designed to fuse information based on manually defined prior knowledge, which inevitably constrains the registration performance within the limits of empirical designs. In this study, we depart from existing empirically-designed fusion strategies and develop a data-driven fusion strategy for deformable image registration. To achieve this, we propose an Automatic Fusion network (AutoFuse) that provides flexibility to fuse information at many potential locations within the network. A Fusion Gate (FG) module is also proposed to control how to fuse information at each potential network location based on training data. Our AutoFuse can automatically optimize its fusion strategy during training and generalizes to both unsupervised registration (without any labels) and semi-supervised registration (with weak labels provided for partial training data).
Extensive experiments on two well-benchmarked medical registration tasks (inter- and intra-patient registration) with eight public datasets show that our AutoFuse outperforms state-of-the-art unsupervised and semi-supervised registration methods.

Submitted 11 September, 2023; originally announced September 2023.

Comments: Under Review
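One common way to let training data decide how much of each feature stream to fuse is a sigmoid gate with learnable logits. The sketch below illustrates only that general idea; it is a hypothetical stand-in, and the paper's Fusion Gate (FG) module may be designed differently. The function names and feature values are illustrative assumptions.

```python
import math


def sigmoid(x):
    # Note: math.exp(-x) overflows for x below about -709; fine for moderate logits.
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(feat_a, feat_b, gate_logits):
    """Fuse two feature vectors channel by channel with learnable gate logits.

    A gate value g = sigmoid(z) near 1 keeps feat_a, near 0 keeps feat_b,
    and values in between mix both streams.
    """
    fused = []
    for a, b, z in zip(feat_a, feat_b, gate_logits):
        g = sigmoid(z)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# Hypothetical features from the moving- and fixed-image streams at one network location.
moving = [1.0, 2.0, 5.0]
fixed = [3.0, 4.0, -1.0]
logits = [0.0, 0.0, 50.0]   # during training these would be learned parameters
print(gated_fusion(moving, fixed, logits))  # -> [2.0, 3.0, 5.0]
```

Because the gate logits are trainable, the data rather than a fixed early- or late-fusion design determine where, and how strongly, the two streams are mixed.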

arXiv:2307.03427 [pdf]

doi 10.1007/978-3-031-43987-2_39

Merging-Diverging Hybrid Transformer Networks for Survival Prediction in Head and Neck Cancer

Authors: Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, Jinman Kim

Abstract: Survival prediction is crucial for cancer patients as it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance for survival prediction. However, existing deep survival models are not well developed in utilizing multi-modality images (e.g., PET-CT) and in extracting region-specific information (e.g., the prognostic information in Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions). In view of this, we propose a merging-diverging learning framework for survival prediction from multi-modality images. This framework has a merging encoder to fuse multi-modality information and a diverging decoder to extract region-specific information. In the merging encoder, we propose a Hybrid Parallel Cross-Attention (HPCA) block to effectively fuse multi-modality features via parallel convolutional layers and cross-attention transformers. In the diverging decoder, we propose a Region-specific Attention Gate (RAG) block to screen out the features related to lesion regions. Our framework is demonstrated on survival prediction from PET-CT images in Head and Neck (H&N) cancer, by designing an X-shape merging-diverging hybrid transformer network (named XSurv). Our XSurv combines the complementary information in PET and CT images and extracts the region-specific prognostic information in PT and MLN regions.
Extensive experiments on the public dataset of HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022) demonstrate that our XSurv outperforms state-of-the-art survival prediction methods.

Submitted 7 July, 2023; originally announced July 2023.

Comments: Early Accepted at International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023)

Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 400-410, 2023

arXiv:2307.03421 [pdf]

doi 10.1007/978-3-031-43999-5_71

Non-iterative Coarse-to-fine Transformer Networks for Joint Affine and Deformable Image Registration

Authors: Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, Jinman Kim

Abstract: Image registration is a fundamental requirement for medical image analysis. Deep registration methods based on deep learning have been widely recognized for their capabilities to perform fast end-to-end registration. Many deep registration methods achieved state-of-the-art performance by performing coarse-to-fine registration, where multiple registration steps were iterated with cascaded networks. Recently, Non-Iterative Coarse-to-finE (NICE) registration methods have been proposed to perform coarse-to-fine registration in a single network and showed advantages in both registration accuracy and runtime. However, existing NICE registration methods mainly focus on deformable registration, while affine registration, a common prerequisite, is still reliant on time-consuming traditional optimization-based methods or extra affine registration networks. In addition, existing NICE registration methods are limited by the intrinsic locality of convolution operations. Transformers may address this limitation for their capabilities to capture long-range dependency, but the benefits of using transformers for NICE registration have not been explored. In this study, we propose a Non-Iterative Coarse-to-finE Transformer network (NICE-Trans) for image registration. Our NICE-Trans is the first deep registration method that (i) performs joint affine and deformable coarse-to-fine registration within a single network, and (ii) embeds transformers into a NICE registration framework to model long-range relevance between images.
Extensive experiments with seven public datasets show that our NICE-Trans outperforms state-of-the-art registration methods on both registration accuracy and runtime.

Submitted 7 July, 2023; originally announced July 2023.

Comments: Accepted at International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023)

Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 750-760, 2023
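Joint affine and deformable registration ultimately yields a single mapping from one image's coordinates to the other's. The sketch below does not model the NICE-Trans network; it only illustrates, under the assumption that the deformable displacement u is defined in the affinely aligned space, how a coarse affine step and a fine displacement step compose for a 2D point. All names and values are hypothetical.

```python
def apply_affine(point, matrix, translation):
    """Coarse step: map a 2D point x to A x + t."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y + translation[0], c * x + d * y + translation[1])


def joint_transform(point, matrix, translation, displacement):
    """Affine step followed by a deformable step: y = A x + t, then y + u(y)."""
    y = apply_affine(point, matrix, translation)
    ux, uy = displacement(y)
    return (y[0] + ux, y[1] + uy)


def shear_field(p):
    """Hypothetical smooth displacement field whose y-component grows with x."""
    return (0.0, 0.5 * p[0])


identity = ((1.0, 0.0), (0.0, 1.0))
shift = (2.0, 0.0)
print(joint_transform((1.0, 1.0), identity, shift, shear_field))  # -> (3.0, 2.5)
```

In a dense registration network the displacement would be a sampled field on a grid rather than a Python function, but the order of composition is the same.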

arXiv:2305.09946 [pdf]

AdaMSS: Adaptive Multi-Modality Segmentation-to-Survival Learning for Survival Outcome Prediction from PET/CT Images

Authors: Mingyuan Meng, Bingxin Gu, Michael Fulham, Shaoli Song, Dagan Feng, Lei Bi, Jinman Kim

Abstract: Survival prediction is a major concern for cancer management. Deep survival models based on deep learning have been widely adopted to perform end-to-end survival prediction from medical images. Recent deep survival models achieved promising performance by jointly performing tumor segmentation with survival prediction, where the models were guided to extract tumor-related information through Multi-Task Learning (MTL). However, these deep survival models have difficulties in exploring out-of-tumor prognostic information. In addition, existing deep survival models are unable to effectively leverage multi-modality images. Empirically-designed fusion strategies were commonly adopted to fuse multi-modality information via task-specific manually-designed networks, thus limiting the adaptability to different scenarios. In this study, we propose an Adaptive Multi-modality Segmentation-to-Survival model (AdaMSS) for survival prediction from PET/CT images. Instead of adopting MTL, we propose a novel Segmentation-to-Survival Learning (SSL) strategy, where our AdaMSS is trained for tumor segmentation and survival prediction sequentially in two stages. This strategy enables the AdaMSS to focus on tumor regions in the first stage and gradually expand its focus to include other prognosis-related regions in the second stage. We also propose a data-driven strategy to fuse multi-modality information, which realizes adaptive optimization of fusion strategies based on training data during training.
With the SSL and data-driven fusion strategies, our AdaMSS is designed as an adaptive model that can self-adapt its focus regions and fusion strategy for different training stages. Extensive experiments with two large clinical datasets show that our AdaMSS outperforms state-of-the-art survival prediction methods.

Submitted 19 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Under Review

arXiv:2305.07584 [pdf, other]

Proactive Content Caching Scheme in Urban Vehicular Networks

Authors: Biqian Feng, Chenyuan Feng, Daquan Feng, Yongpeng Wu, Xiang-Gen Xia

Abstract: Streaming media content caching is a key enabling technology to promote the value chain of future urban vehicular networks. Nevertheless, the high mobility of vehicles, intermittency of information transmissions, high dynamics of user requests, limited caching capacities and extreme complexity of business scenarios pose an enormous challenge to content caching and distribution in vehicular networks. To tackle this problem, this paper aims to design a novel edge-computing-enabled hierarchical cooperative caching framework. Firstly, we analyze in depth the spatio-temporal correlation between the historical vehicle trajectory of user requests and construct the system model to predict the vehicle trajectory and content popularity, which lays a foundation for mobility-aware content caching and dispatching. Meanwhile, we probe into privacy protection strategies to realize a privacy-preserving prediction model. Furthermore, based on trajectory and popular content prediction results, a content caching strategy is studied, and adaptive and dynamic resource management schemes are proposed for hierarchical cooperative caching networks. Finally, simulations are provided to verify the superiority of our proposed scheme and algorithms. The results show that the proposed algorithms effectively improve the performance of the considered system in terms of hit ratio and average delay, and narrow the gap to the optimal caching scheme compared with traditional schemes.

Submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted by IEEE Transactions on Communications
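Hit ratio, one of the metrics reported above, is the fraction of requests served from the cache. A minimal sketch, assuming Zipf-distributed content popularity and a cache that simply stores the C most popular items (a common baseline, not the paper's hierarchical cooperative scheme), shows how hit ratio grows with cache capacity. The catalog size and Zipf exponent are hypothetical.

```python
def zipf_popularity(num_items, alpha=0.8):
    """Request probabilities p_k proportional to 1 / k**alpha for popularity ranks k = 1..num_items."""
    weights = [1.0 / k ** alpha for k in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_c_hit_ratio(popularity, capacity):
    """Expected hit ratio when a cache stores the `capacity` most popular items."""
    ranked = sorted(popularity, reverse=True)
    return sum(ranked[:capacity])


# Hypothetical catalog of 1,000 items; compare a few cache capacities.
pop = zipf_popularity(1000)
for capacity in (10, 50, 100):
    print(capacity, round(top_c_hit_ratio(pop, capacity), 3))
```

In this toy model the hit ratio depends only on which items are cached. The paper's framework additionally predicts vehicle trajectories and content popularity and coordinates multiple cache tiers, none of which this sketch captures.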

arXiv:2304.00725 [pdf]

CG-3DSRGAN: A classification guided 3D generative adversarial network for image quality recovery from low-dose PET images

Authors: Yuxin Xue, Yige Peng, Lei Bi, Dagan Feng, Jinman Kim

Abstract: Positron emission tomography (PET) is the most sensitive molecular imaging modality routinely applied in modern healthcare. High radioactivity caused by the injected tracer dose is a major concern in PET imaging and limits its clinical applications. However, reducing the dose leads to inadequate image quality for diagnostic practice. Motivated by the need to produce high-quality images with the minimum dose, Convolutional Neural Network (CNN)-based methods have been developed for high-quality PET synthesis from low-dose counterparts. Previous CNN-based studies usually map low-dose PET directly into feature space without considering different dose reduction levels. In this study, a novel approach named CG-3DSRGAN (Classification-Guided Generative Adversarial Network with Super Resolution Refinement) is presented. Specifically, a multi-tasking coarse generator, guided by a classification head, allows for a more comprehensive understanding of the noise-level features present in the low-dose data, resulting in improved image synthesis. Moreover, to recover spatial details of standard PET, an auxiliary super-resolution network, Contextual-Net, is proposed for second-stage training to narrow the gap between coarse prediction and standard PET. We compared our method to the state-of-the-art methods on whole-body PET with different dose reduction factors (DRFs). Experiments demonstrate that our method outperforms the others at all DRFs.

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2302.11993 [pdf, other]

doi 10.1109/TWC.2023.3319442

xURLLC-Aware Service Provisioning in Vehicular Networks: A Semantic Communication Perspective

Authors: Le Xia, Yao Sun, Dusit Niyato, Daquan Feng, Lei Feng, Muhammad Ali Imran

Abstract: Semantic communication (SemCom), as an emerging paradigm focusing on meaning delivery, has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to wireless vehicular networks, which normally consume a tremendous amount of resources to meet stringent reliability and latency requirements. Unfortunately, the unique background knowledge matching mechanism in SemCom makes it challenging to simultaneously realize efficient service provisioning for multiple users in vehicle-to-vehicle networks. To this end, this paper identifies and jointly addresses two fundamental problems of knowledge base construction (KBC) and vehicle service pairing (VSP) inherently existing in SemCom-enabled vehicular networks in alignment with the next-generation ultra-reliable and low-latency communication (xURLLC) requirements. Concretely, we first derive the knowledge matching based queuing latency specific for semantic data packets, and then formulate a latency-minimization problem subject to several KBC and VSP related reliability constraints. Afterward, a SemCom-empowered Service Supplying Solution (S$^{\text{4}}$) is proposed along with the theoretical analysis of its optimality guarantee and computational complexity. Numerical results demonstrate the superiority of S$^{\text{4}}$ in terms of average queuing latency, semantic data packet throughput, user knowledge matching degree and knowledge preference satisfaction compared with two benchmarks.

Submitted 23 September, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: This paper has been accepted for publication by IEEE Transactions on Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2301.01732 [pdf, ps, other]

UNAEN: Unsupervised Abnormality Extraction Network for MRI Motion Artifact Reduction

Authors: Yusheng Zhou, Hao Li, Jianan Liu, Zhengmin Kong, Tao Huang, Euijoon Ahn, Zhihan Lv, Jinman Kim, David Dagan Feng

Abstract: Motion artifacts compromise the quality of magnetic resonance imaging (MRI) and pose challenges to achieving diagnostic outcomes and image-guided therapies. In recent years, supervised deep learning approaches have emerged as successful solutions for motion artifact reduction (MAR). One disadvantage of these methods is their dependency on acquiring paired sets of motion artifact-corrupted (MA-corrupted) and motion artifact-free (MA-free) MR images for training purposes. Obtaining such image pairs is difficult and therefore limits the application of supervised training. In this paper, we propose a novel UNsupervised Abnormality Extraction Network (UNAEN) to alleviate this problem. Our network is capable of working with unpaired MA-corrupted and MA-free images. It converts the MA-corrupted images to MA-reduced images by extracting abnormalities from the MA-corrupted images using a proposed artifact extractor, which intercepts the residual artifact maps from the MA-corrupted MR images explicitly, and a reconstructor to restore the original input from the MA-reduced images. The performance of UNAEN was assessed by experimenting on various publicly available MRI datasets and comparing it with state-of-the-art methods. The quantitative evaluation demonstrates the superiority of UNAEN over alternative MAR methods, and its outputs visually exhibit fewer residual artifacts.
Our results substantiate the potential of UNAEN as a promising solution applicable in real-world clinical environments, with the capability to enhance diagnostic accuracy and facilitate image-guided therapies.

Submitted 11 August, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

arXiv:2212.05808 [pdf]

Z-SSMNet: A Zonal-aware Self-Supervised Mesh Network for Prostate Cancer Detection and Diagnosis in bpMRI

Authors: Yuan Yuan, Euijoon Ahn, Dagan Feng, Mohamad Khadra, Jinman Kim

Abstract: Prostate cancer (PCa) is one of the most prevalent cancers in men, and many people around the world die from clinically significant PCa (csPCa). Early diagnosis of csPCa in bi-parametric MRI (bpMRI), which is non-invasive, cost-effective, and more efficient compared to multiparametric MRI (mpMRI), can contribute to precision care for PCa. The rapid rise in artificial intelligence (AI) algorithms is enabling unprecedented improvements in decision support systems that can aid in csPCa diagnosis and understanding. However, existing state-of-the-art AI algorithms based on deep learning are often limited to 2D images, which fail to capture inter-slice correlations in 3D volumetric images. The use of 3D convolutional neural networks (CNNs) partly overcomes this limitation, but it does not adapt to the anisotropy of the images, resulting in sub-optimal semantic representation and poor generalization. Furthermore, due to the limited amount of labelled bpMRI data and the difficulty of labelling, existing CNNs are built on relatively small datasets, leading to poor performance. To address the limitations identified above, we propose a new Zonal-aware Self-supervised Mesh Network (Z-SSMNet) that adaptively fuses multiple 2D, 2.5D and 3D CNNs to effectively balance representation for sparse inter-slice information and dense intra-slice information in bpMRI. A self-supervised learning (SSL) technique is further introduced to pre-train our network using unlabelled data to learn generalizable image features.
Furthermore, we constrained our network to understand zone-specific domain knowledge to improve the diagnosis precision of csPCa. Experiments on the PI-CAI Challenge dataset demonstrate that our proposed method achieves better performance for csPCa detection and diagnosis in bpMRI.

Submitted 12 December, 2022; originally announced December 2022.

Comments: 8 pages, 1 figure, PI-CAI challenge

arXiv:2211.05409 [pdf]

doi 10.1007/978-3-031-27420-6_14

Radiomics-enhanced Deep Multi-task Learning for Outcome Prediction in Head and Neck Cancer

Authors: Mingyuan Meng, Lei Bi, Dagan Feng, Jinman Kim

Abstract: Outcome prediction is crucial for head and neck cancer patients as it can provide prognostic information for early treatment planning. Radiomics methods have been widely used for outcome prediction from medical images. However, these methods are limited by their reliance on intractable manual segmentation of tumor regions. Recently, deep learning methods have been proposed to perform end-to-end outcome prediction so as to remove the reliance on manual segmentation. Unfortunately, without segmentation masks, these methods take the whole image as input, which makes it difficult for them to focus on tumor regions and may prevent them from fully leveraging the prognostic information within the tumor regions. In this study, we propose a radiomics-enhanced deep multi-task framework for outcome prediction from PET/CT images, in the context of the HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022). The novelty of our framework is to incorporate radiomics as an enhancement to our recently proposed Deep Multi-task Survival model (DeepMTS). The DeepMTS jointly learns to predict the survival risk scores of patients and the segmentation masks of tumor regions. Radiomics features are extracted from the predicted tumor regions and combined with the predicted survival risk scores for final outcome prediction, through which the prognostic information in tumor regions can be further leveraged. Our method achieved a C-index of 0.681 on the testing set, placing 2nd on the leaderboard, only 0.00068 lower in C-index than 1st place.
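The C-index quoted above measures how well predicted risk scores order patients by observed survival time. The following is a minimal pure-Python sketch of Harrell's C-index, assuming right-censored data given as plain lists; the function name and the choice to skip pairs with tied times are illustrative assumptions, not the HECKTOR 2022 evaluation code.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the event (e.g. progression) was observed, 0 if censored
    risks:  predicted risk scores (higher means expected to fail sooner)
    Simplification (assumption): pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i is the patient with the shorter time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if the earlier time is an observed event.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0
        elif risks[i] == risks[j]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four patients, the second one censored.
c = harrell_c_index([5, 8, 12, 20], [1, 0, 1, 1], [0.9, 0.4, 0.1, 0.6])  # → 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 0.681 result and the 0.00068 gap should be read.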

Submitted 10 November, 2022; originally announced November 2022.

Comments: HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022)

Journal ref: Head and Neck Tumor Segmentation and Outcome Prediction (HECKTOR 2022), pp.135-143

arXiv:2211.01241 [pdf, other]

doi 10.1109/MWC.004.2200393

WiserVR: Semantic Communication Enabled Wireless Virtual Reality Delivery

Authors: Le Xia, Yao Sun, Chengsi Liang, Daquan Feng, Runze Cheng, Yang Yang, Muhammad Ali Imran

Abstract: Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume, along with stringent requirements on latency and reliability under limited bandwidth resources, makes untethered wireless VR delivery increasingly challenging. Such bottlenecks therefore motivate this work to explore the potential of semantic communication, a new paradigm that promises to significantly ease the resource pressure, for efficient VR delivery. To this end, we propose a novel framework, namely WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360° video frames to VR users. Specifically, multiple deep learning-based modules are designed for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery. Among them, we develop the concept of a semantic location graph and leverage a joint semantic-channel coding method with knowledge sharing, not only to substantially reduce communication latency but also to guarantee adequate transmission reliability and resilience under various channel states. Moreover, an implementation of WiserVR is presented, followed by initial simulations that evaluate its performance against benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of WiserVR.

Submitted 13 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: This magazine article has been accepted for publication by IEEE Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2210.15808 [pdf]

Hyper-Connected Transformer Network for Multi-Modality PET-CT Segmentation

Authors: Lei Bi, Michael Fulham, Shaoli Song, David Dagan Feng, Jinman Kim

Abstract: [18F]-Fluorodeoxyglucose (FDG) positron emission tomography-computed tomography (PET-CT) has become the imaging modality of choice for diagnosing many cancers. Co-learning complementary PET-CT imaging features is a fundamental requirement for automatic tumor segmentation and for developing computer-aided cancer diagnosis systems. In this study, we propose a hyper-connected transformer (HCT) network that integrates a transformer network (TN) with a hyper-connected fusion for multi-modality PET-CT images. The TN was leveraged for its ability to provide global dependencies in image feature learning, which was achieved by using image patch embeddings with a self-attention mechanism to capture image-wide contextual information. We extended the single-modality definition of TN with multiple TN-based branches to separately extract image features. We also introduced a hyper-connected fusion to fuse the contextual and complementary image features across multiple transformers in an iterative manner. Our results with two clinical datasets show that HCT achieved higher segmentation accuracy than existing methods.
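The transformer branch described above embeds image patches and relates them through self-attention. Below is a minimal pure-Python sketch of that generic mechanism (patch embedding followed by single-head scaled dot-product attention). The toy 8 x 8 image, the dimensions, and the random weights standing in for learned parameters are all assumptions, and HCT's multiple per-modality branches and hyper-connected fusion are not modeled.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def patchify(image, p):
    """Split an H x W image into flattened, non-overlapping p x p patches (row-major)."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(p) for j in range(p)]
        for r in range(0, h - h % p, p)
        for c in range(0, w - w % p, p)
    ]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(wk[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]  # each patch attends to every patch
    return matmul(weights, v), weights


# Toy usage: random weights stand in for learned parameters (hypothetical).
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


image = [[rng.random() for _ in range(8)] for _ in range(8)]  # an 8 x 8 toy "slice"
tokens = matmul(patchify(image, 4), rand_matrix(16, 8))      # 4 patch embeddings of dim 8
out, attn = self_attention(tokens, rand_matrix(8, 8), rand_matrix(8, 8), rand_matrix(8, 8))
```

Because every row of the attention matrix spans all patches, each output token can draw on image-wide context, which is the global-dependency property the abstract attributes to the TN.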

Submitted 7 August, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: EMBC 2023

arXiv:2209.07705 [pdf, other]

Automatic Tumor Segmentation via False Positive Reduction Network for Whole-Body Multi-Modal PET/CT Images

Authors: Yige Peng, Jinman Kim, Dagan Feng, Lei Bi

Abstract: Multi-modality Fluorodeoxyglucose (FDG) positron emission tomography / computed tomography (PET/CT) has been routinely used in the assessment of common cancers, such as lung cancer, lymphoma, and melanoma. This is mainly because PET/CT combines the high sensitivity of PET for tumor detection with the anatomical information from CT. In PET/CT image assessment, automatic tumor segmentation is an important step, and in recent years, deep learning-based methods have become the state of the art. Unfortunately, existing methods tend to over-segment the tumor regions and include regions such as normal high-uptake organs, inflammation, and other infections. In this study, we introduce a false positive reduction network to overcome this limitation. We first introduced a global segmentation module, built on a self-supervised pre-trained encoder, to coarsely delineate the candidate tumor regions. The candidate tumor regions were then refined by removing false positives via a local refinement module. Our experiments with the MICCAI 2022 Automated Lesion Segmentation in Whole-Body FDG-PET/CT (AutoPET) challenge dataset showed that our method achieved a Dice score of 0.9324 on the preliminary testing data and ranked 1st in Dice on the leaderboard. Our method was also ranked among the top 7 methods on the final testing data; the final ranking will be announced during the 2022 MICCAI AutoPET workshop. Our code is available at: https://github.com/YigePeng/AutoPET_False_Positive_Reduction.
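The Dice score quoted above measures voxel overlap between a predicted mask and the reference mask, and it is exactly the quantity that over-segmentation pulls down. A minimal sketch on flattened binary masks follows; treating two empty masks as a perfect match is an assumed convention, not necessarily how the AutoPET challenge scores such cases, and the toy masks are hypothetical.

```python
def dice_score(pred, target):
    """Dice similarity coefficient 2*|P & T| / (|P| + |T|) for two binary masks.

    Masks are flat sequences of 0/1 (or booleans) with one entry per voxel.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    p = [bool(v) for v in pred]
    t = [bool(v) for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Extra false-positive voxels (e.g. a high-uptake organ) lower Dice.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
clean_pred = [1, 1, 1, 1, 0, 0, 0, 0]
over_seg = [1, 1, 1, 1, 1, 1, 0, 0]
```

Here the over-segmented prediction keeps every true voxel yet drops from 1.0 to 0.8, which is why a dedicated false positive reduction step can raise the score.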

Submitted 16 September, 2022; originally announced September 2022.

Comments: Pre-print paper for 2022 MICCAI AutoPET Challenge

arXiv:2205.06891 [pdf, ps, other]

doi 10.1109/TAI.2024.3397292

Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation

Authors: Jianan Liu, Hao Li, Tao Huang, Euijoon Ahn, Kang Han, Adeel Razi, Wei Xiang, Jinman Kim, David Dagan Feng

Abstract: High-resolution (HR) magnetic resonance imaging is critical in aiding doctors in their diagnoses and image-guided treatments. However, acquiring HR images can be time-consuming and costly. Consequently, deep learning-based super-resolution reconstruction (SRR) has emerged as a promising solution for generating super-resolution (SR) images from low-resolution (LR) images. Unfortunately, training such neural networks requires aligned authentic HR and LR image pairs, which are challenging to obtain due to patient movements during and between image acquisitions. While rigid movements of hard tissues can be corrected with image registration, aligning deformed soft tissues is complex, making it impractical to train neural networks with authentic HR and LR image pairs. Previous studies have focused on SRR using authentic HR images and down-sampled synthetic LR images. However, the difference in degradation representations between synthetic and authentic LR images degrades the quality of SR images reconstructed from authentic LR images. To address this issue, we propose a novel Unsupervised Degradation Adaptation Network (UDEAN). Our network consists of a degradation learning network and an SRR network. The degradation learning network downsamples the HR images using the degradation representation learned from the misaligned or unpaired LR images. The SRR network then learns the mapping from the down-sampled HR images to the original ones.
Experimental results show that our method outperforms state-of-the-art networks and is a promising solution to the challenges in clinical settings.

Submitted 24 April, 2024; v1 submitted 13 May, 2022; originally announced May 2022.

Comments: Accepted by IEEE Transactions on Artificial Intelligence

arXiv:2203.02384 [pdf, other]

AutoMO-Mixer: An automated multi-objective Mixer model for balanced, safe and robust prediction in medicine

Authors: Xi Chen, Jiahuan Lv, Dehua Feng, Xuanqin Mou, Ling Bai, Shu Zhang, Zhiguo Zhou

Abstract: Accurately identifying a patient's status through medical images plays an important role in diagnosis and treatment. Artificial intelligence (AI), especially deep learning, has achieved great success in many fields. However, more reliable AI models are needed for image-guided diagnosis and therapy. To achieve this goal, developing a balanced, safe and robust model within a unified framework is desirable. In this study, a new unified model, termed the automated multi-objective Mixer (AutoMO-Mixer), was proposed, using the recently introduced multilayer perceptron Mixer (MLP-Mixer) as its base. To build a balanced model, sensitivity and specificity were simultaneously used as objective functions in the training stage. Meanwhile, a new entropy-based evidential reasoning method was developed to achieve a safe and robust model in the testing stage. Experiments on an optical coherence tomography dataset demonstrated that AutoMO-Mixer obtains safer, more balanced, and more robust results than MLP-Mixer and other available models.
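Treating sensitivity and specificity as simultaneous objectives means that candidate models are compared by a pair of scores rather than a single number. The sketch below, assuming binary labels with 1 as the positive class, computes both metrics and keeps the Pareto-optimal (non-dominated) candidates. The Pareto filter is a generic multi-objective illustration, not AutoMO-Mixer's training procedure, and the candidate names and scores are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def pareto_front(scores):
    """Names whose (sensitivity, specificity) pair no other candidate dominates."""
    front = []
    for name, (sens, spec) in scores.items():
        dominated = any(
            s >= sens and p >= spec and (s > sens or p > spec)
            for other, (s, p) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: "c" is worse than "a" on both objectives.
candidates = {"a": (0.90, 0.60), "b": (0.70, 0.80), "c": (0.60, 0.55)}
```

Both "a" and "b" survive because each trades one objective against the other; choosing a "balanced" point among them is the part a multi-objective method has to decide.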

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2112.12424 [pdf, other]

Complexity-Oriented Per-shot Video Coding Optimization

Authors: Hongcheng Zhong, Jun Xu, Chen Zhu, Donghui Feng, Li Song

Abstract: Current per-shot encoding schemes aim to improve compression efficiency through shot-level optimization: they split a source video sequence into shots and apply an optimal set of encoding parameters to each shot. Per-shot encoding achieves approximately 20% bitrate savings over baseline fixed-QP encoding at the expense of pre-processing complexity. However, the adjustable parameter space of current per-shot encoding schemes includes only spatial resolution and QP/CRF, resulting in a lack of encoding flexibility. In this paper, we extend the per-shot encoding framework in the complexity dimension. We believe that per-shot encoding with flexible complexity will help in deploying user-generated content. We propose a rate-distortion-complexity optimization process for encoders and a methodology to determine the coding parameters under complexity constraints and bitrate ladders. Experimental results show that our proposed method densely covers complexity constraints ranging from 100% down to 3% of the slowest per-shot anchor. At complexities similar to those of the per-shot scheme fixed to specific presets, our proposed method achieves BD-rate gains of up to -19.17%.

Submitted 23 December, 2021; originally announced December 2021.

arXiv:2112.06979 [pdf, other]

The Brain Tumor Sequence Registration (BraTS-Reg) Challenge: Establishing Correspondence Between Pre-Operative and Follow-up MRI Scans of Diffuse Glioma Patients

Authors: Bhakti Baheti, Satrajit Chakrabarty, Hamed Akbari, Michel Bilello, Benedikt Wiestler, Julian Schwarting, Evan Calabrese, Jeffrey Rudie, Syed Abidi, Mina Mousa, Javier Villanueva-Meyer, Brandon K. K. Fields, Florian Kofler, Russell Takeshi Shinohara, Juan Eugenio Iglesias, Tony C. W. Mok, Albert C. S. Chung, Marek Wodzinski, Artur Jurgas, Niccolo Marini, Manfredo Atzori, Henning Muller, Christoph Grobroehmer, Hanna Siebert, Lasse Hansen, et al. (48 additional authors not shown)

Abstract: Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in developing general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registration (BraTS-Reg) challenge, the first public benchmark environment for deformable registration algorithms focusing on estimating correspondences between pre-operative and follow-up scans of the same patient diagnosed with a diffuse brain glioma. The BraTS-Reg data comprise de-identified multi-institutional multi-parametric MRI (mpMRI) scans, curated for size and resolution according to a canonical anatomical template, and divided into training, validation, and testing sets. Clinical experts annotated ground truth (GT) landmark points of anatomical locations distinct across the temporal domain. Quantitative evaluation and ranking were based on the Median Euclidean Error (MEE), Robustness, and the determinant of the Jacobian of the displacement field. The top-ranked methodologies yielded similar performance across all evaluation metrics and shared several methodological commonalities, including pre-alignment, deep neural networks, inverse consistency analysis, and per-case test-time instance optimization as a post-processing step.
The top-ranked method attained the MEE at or below that of the inter-rater variability for approximately 60% of the evaluated landmarks, underscoring the scope for further accuracy and robustness improvements, especially relative to human experts. The aim of BraTS-Reg is to continue to serve as an active resource for research, with the data and online evaluation tools accessible at https://bratsreg.github.io/.
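Two of the BraTS-Reg evaluation quantities are easy to state concretely: the median Euclidean error between warped and expert-annotated landmarks, and the Jacobian determinant of the displacement field, where non-positive values indicate folding. The sketch below assumes landmarks are coordinate tuples in physical units (for example millimetres) and that the displacement field is available as a Python function u(x, y, z); the challenge evaluates determinants over a voxel grid, and its exact scoring, including the Robustness metric, is not reproduced here.

```python
import math
import statistics


def median_euclidean_error(warped, reference):
    """Median distance between corresponding landmarks (same units as the input)."""
    return statistics.median(math.dist(p, q) for p, q in zip(warped, reference))


def jacobian_determinant(u, point, h=1e-3):
    """det(I + grad u) at `point`, using central differences of u(x, y, z) -> (ux, uy, uz)."""
    jac = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        fwd = list(point)
        bwd = list(point)
        fwd[j] += h
        bwd[j] -= h
        u_fwd, u_bwd = u(*fwd), u(*bwd)
        for i in range(3):
            # Entry (i, j) of the mapping x -> x + u(x): delta_ij + du_i/dx_j.
            jac[i][j] = (1.0 if i == j else 0.0) + (u_fwd[i] - u_bwd[i]) / (2 * h)
    (a, b, c), (d, e, f), (g, k, m) = jac
    return a * (e * m - f * k) - b * (d * m - f * g) + c * (d * k - e * g)


def expand(x, y, z):
    """Hypothetical field: a uniform 10% expansion (determinant about 1.1 ** 3)."""
    return (0.1 * x, 0.1 * y, 0.1 * z)
```

A determinant of 1 means locally volume-preserving, values above 1 mean local expansion, and values at or below 0 mean the transformation folds space, which is physically implausible for brain tissue.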

Submitted 17 April, 2024; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2111.10635 [pdf, other]

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments

Authors: Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou

Abstract: Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high Input/Output (IO) cost, while some layers are compute-intensive. The training process generally exploits distributed computing resources to reduce training time. In addition, heterogeneous computing resources, e.g., CPUs and GPUs of multiple types, are available for the distributed training process. Thus, scheduling multiple layers onto diverse computing resources is critical for the training process. To efficiently train a DNN model using heterogeneous computing resources, we propose a distributed framework, Paddle-Heterogeneous Parameter Server (Paddle-HeterPS), composed of a distributed architecture and a Reinforcement Learning (RL)-based scheduling method. The advantages of Paddle-HeterPS are threefold compared with existing frameworks. First, Paddle-HeterPS enables efficient training of diverse workloads with heterogeneous computing resources. Second, Paddle-HeterPS exploits an RL-based method to efficiently schedule the workload of each layer to appropriate computing resources to minimize the cost while satisfying throughput constraints. Third, Paddle-HeterPS manages data storage and data communication among distributed computing resources.
We carry out extensive experiments to show that Paddle-HeterPS significantly outperforms state-of-the-art approaches in terms of throughput (14.5 times higher) and monetary cost (312.3% smaller). The code of the framework is publicly available at: https://github.com/PaddlePaddle/Paddle.

Submitted 7 June, 2023; v1 submitted 20 November, 2021; originally announced November 2021.

Comments: 14 pages, 11 figures, 2 tables; To appear in Future Generation Computer Systems (FGCS)

arXiv:2109.14805 [pdf, other]

Unsupervised Landmark Detection Based Spatiotemporal Motion Estimation for 4D Dynamic Medical Images

Authors: Yuyu Guo, Lei Bi, Dongming Wei, Liyun Chen, Zhengbin Zhu, Dagan Feng, Ruiyan Zhang, Qian Wang, Jinman Kim

Abstract: Motion estimation is a fundamental step in dynamic medical image processing for the assessment of target organ anatomy and function. However, existing image-based motion estimation methods, which optimize the motion field by evaluating local image similarity, are prone to producing implausible estimates, especially in the presence of large motion. In this study, we propose a novel Dense-Sparse-Dense (DSD) motion estimation framework, which comprises two stages. In the first stage, we process the raw dense image to extract sparse landmarks that represent the anatomical topology of the target organ and discard redundant information that is unnecessary for motion estimation. For this purpose, we introduce an unsupervised 3D landmark detection network to extract spatially sparse but representative landmarks for target organ motion estimation. In the second stage, we derive the sparse motion displacement from the sparse landmarks extracted from two images acquired at different time points. Then, we present a motion reconstruction network that constructs the motion field by projecting the sparse landmark displacements back into the dense image domain. Furthermore, we use the motion field estimated by our two-stage DSD framework as initialization and boost the motion estimation quality with a lightweight yet effective iterative optimization. We evaluate our method on two dynamic medical imaging tasks, modeling cardiac motion and lung respiratory motion, respectively. Our method produced superior motion estimation accuracy compared to existing methods.
In addition, extensive experimental results demonstrate that our solution can extract representative anatomical landmarks without requiring any manual annotation. Our code is publicly available online.
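The second DSD stage turns landmark correspondences into sparse displacements and then back into a dense motion field. The paper performs the sparse-to-dense step with a learned motion reconstruction network; the minimal sketch below swaps in plain inverse-distance weighting (IDW), a classical interpolation method, only to make the idea concrete. The function names, the `power` parameter, and the toy coordinates are assumptions.

```python
import math


def landmark_displacements(fixed, moving):
    """Sparse displacement of each landmark between two time points."""
    return [tuple(m - f for f, m in zip(p, q)) for p, q in zip(fixed, moving)]


def idw_dense_field(landmarks, displacements, points, power=2.0, eps=1e-12):
    """Interpolate sparse displacements onto query points by inverse-distance weighting."""
    field = []
    for x in points:
        weighted, exact = [], None
        for lm, disp in zip(landmarks, displacements):
            dist = math.dist(x, lm)
            if dist < eps:  # query point sits on a landmark: use its displacement
                exact = disp
                break
            weighted.append((dist ** -power, disp))
        if exact is not None:
            field.append(tuple(exact))
            continue
        total = sum(w for w, _ in weighted)
        field.append(tuple(sum(w * d[i] for w, d in weighted) / total for i in range(len(x))))
    return field


# Toy 3D example: two landmarks tracked from time t0 to time t1.
t0 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
t1 = [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
disp = landmark_displacements(t0, t1)  # [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
dense = idw_dense_field(t0, disp, [(5.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
```

IDW smoothly blends nearby landmark motion but knows nothing about anatomy, which is one reason a learned reconstruction followed by iterative refinement, as described in the abstract, can do better.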

Submitted 7 November, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: accepted by IEEE Transactions on Cybernetics

arXiv:2109.07711 [pdf]

doi 10.1109/JBHI.2022.3181791

DeepMTS: Deep Multi-task Learning for Survival Prediction in Patients with Advanced Nasopharyngeal Carcinoma using Pretreatment PET/CT

Authors: Mingyuan Meng, Bingxin Gu, Lei Bi, Shaoli Song, David Dagan Feng, Jinman Kim

Abstract: Nasopharyngeal Carcinoma (NPC) is a malignant epithelial cancer arising from the nasopharynx. Survival prediction is a major concern for NPC patients, as it provides early prognostic information to plan treatments. Recently, deep survival models based on deep learning have demonstrated the potential to outperform traditional radiomics-based survival prediction models. Deep survival models usually use image patches covering the whole target regions (e.g., nasopharynx for NPC) or containing only segmented tumor regions as the input. However, the models using the whole target regions will also include non-relevant background information, while the models using segmented tumor regions will disregard potentially prognostic information existing outside the primary tumors (e.g., local lymph node metastasis and adjacent tissue invasion). In this study, we propose a 3D end-to-end Deep Multi-Task Survival model (DeepMTS) for joint survival prediction and tumor segmentation in advanced NPC from pretreatment PET/CT. Our novelty is the introduction of a hard-sharing segmentation backbone to guide the extraction of local features related to the primary tumors, which reduces the interference from non-relevant background information. In addition, we introduce a cascaded survival network to capture the prognostic information existing outside the primary tumors and further leverage the global tumor information (e.g., tumor size, shape, and locations) derived from the segmentation backbone.
Our experiments with two clinical datasets demonstrate that our DeepMTS can consistently outperform traditional radiomics-based survival prediction models and existing deep survival models.

Submitted 7 June, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: Accepted at IEEE Journal of Biomedical and Health Informatics (JBHI)

Journal ref: IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 9, pp. 4497-4507, 2022

arXiv:2107.06170 [pdf]

Robust Blind Source Separation by Soft Decision-Directed Non-Unitary Joint Diagonalization

Authors: Wenjuan Liu, Dazheng Feng, Bingnan Pei, Mengdao Xing, Xinhong Meng, Qianru Wei

Abstract: Approximate joint diagonalization of a set of matrices provides a powerful framework for numerous statistical signal processing applications. For non-unitary joint diagonalization (NUJD) based on the least-squares (LS) criterion, outliers, also referred to as anomalous or discordant observations, degrade performance, since squaring the residuals magnifies their effects. To solve this problem, we propose a novel cost function that incorporates a soft decision-directed scheme into the least-squares algorithm, and we develop an efficient algorithm for it. The influence of the outliers is mitigated by applying decision-directed weights that are associated with the residual error at each iteration. Specifically, the mixing matrix is estimated by a modified stationary point method, in which the update direction is determined from a linear approximation of the gradient function. Simulation results demonstrate that the proposed algorithm outperforms conventional non-unitary diagonalization algorithms in terms of both convergence performance and robustness to outliers.

Submitted 28 June, 2021; originally announced July 2021.

Comments: 19 pages, 9 figures

arXiv:2104.11416 [pdf]

doi 10.1088/1361-6560/ac3d17

Predicting Distant Metastases in Soft-Tissue Sarcomas from PET-CT scans using Constrained Hierarchical Multi-Modality Feature Learning

Authors: Yige Peng, Lei Bi, Ashnil Kumar, Michael Fulham, Dagan Feng, Jinman Kim

Abstract: Distant metastases (DM) refer to the dissemination of tumors, usually beyond the organ where the tumor originated. They are the leading cause of death in patients with soft-tissue sarcomas (STSs). Positron emission tomography-computed tomography (PET-CT) is regarded as the imaging modality of choice for the management of STSs. It is difficult to determine from imaging studies which STS patients will develop metastases. 'Radiomics' refers to the extraction and analysis of quantitative features from medical images, and it has been employed to help identify such tumors. The state of the art in radiomics is based on convolutional neural networks (CNNs). Most CNNs are designed for single-modality imaging data (CT or PET alone) and do not exploit the information embedded in PET-CT, which combines an anatomical and a functional imaging modality. Furthermore, most radiomic methods rely on manual input from imaging specialists for tumor delineation and for the definition and selection of radiomic features. This approach, however, may not scale to tumors with complex boundaries or to cases with multiple other sites of disease. We outline a new 3D CNN to help predict DM in STS patients from PET-CT data. The 3D CNN uses a constrained feature learning module and a hierarchical multi-modality feature learning module that leverages the complementary information from the modalities to focus on semantically important regions.
Our results on a public PET-CT dataset of STS patients show that multi-modal information improves the ability to identify those patients who develop DM. Further our method outperformed all other related state-of-the-art methods. △ Less

Submitted 23 April, 2021; originally announced April 2021.

Comments: Under Review

arXiv:2103.05220 [pdf]

doi 10.3389/fonc.2022.899351

Prediction of 5-year Progression-Free Survival in Advanced Nasopharyngeal Carcinoma with Pretreatment PET/CT using Multi-Modality Deep Learning-based Radiomics

Authors: Bingxin Gu, Mingyuan Meng, Lei Bi, **man Kim, David Dagan Feng, Shaoli Song

Abstract: Objective: Deep Learning-based Radiomics (DLR) has achieved great success in medical image analysis and has been considered a replacement for conventional radiomics, which relies on handcrafted features. In this study, we aimed to explore the capability of DLR to predict 5-year Progression-Free Survival (PFS) in Nasopharyngeal Carcinoma (NPC) using pretreatment PET/CT. Methods: A total of 257 patients (170/87 in internal/external cohorts) with advanced NPC (TNM stage III or IVa) were enrolled. We developed an end-to-end multi-modality DLR model, in which a 3D convolutional neural network was optimized to extract deep features from pretreatment PET/CT images and predict the probability of 5-year PFS. TNM stage, as a high-level clinical feature, could be integrated into our DLR model to further improve the prognostic performance. To compare conventional radiomics and DLR, 1456 handcrafted features were extracted, and optimal conventional radiomics methods were selected from 54 cross-combinations of 6 feature selection methods and 9 classification methods. In addition, risk group stratification was performed with the clinical signature, the conventional radiomics signature, and the DLR signature. Results: Our multi-modality DLR model using both PET and CT achieved higher prognostic performance than the optimal conventional radiomics method. Furthermore, the multi-modality DLR model outperformed single-modality DLR models using only PET or only CT.
For risk group stratification, the conventional radiomics signature and the DLR signature both yielded significant differences between the high- and low-risk patient groups in both the internal and external cohorts, while the clinical signature failed in the external cohort. Conclusion: Our study identified potential prognostic tools for survival prediction in advanced NPC, suggesting that DLR could provide complementary value to the current TNM staging.

Submitted 4 July, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Accepted at Frontiers in Oncology

Journal ref: Frontiers in Oncology, vol. 12, pp. 899352, 2022

arXiv:2103.05213 [pdf]

doi 10.1016/j.neuroimage.2022.119444

Enhancing Medical Image Registration via Appearance Adjustment Networks

Authors: Mingyuan Meng, Lei Bi, Michael Fulham, David Dagan Feng, **man Kim

Abstract: Deformable image registration is fundamental for many medical image analyses. A key obstacle to accurate image registration lies in image appearance variations, such as variations in texture, intensities, and noise. These variations are readily apparent in medical images, especially in brain images, where registration is frequently used. Recently, deep learning-based registration methods (DLRs), which use deep neural networks, have proven several orders of magnitude faster than traditional optimization-based registration methods (ORs). DLRs rely on a globally optimized network that is trained with a set of training samples to achieve faster registration. DLRs tend, however, to disregard the target-pair-specific optimization inherent in ORs and thus adapt less well to variations in testing samples. This limitation is severe for registering medical images with large appearance variations, especially since few existing DLRs explicitly take appearance variations into account. In this study, we propose an Appearance Adjustment Network (AAN) to enhance the adaptability of DLRs to appearance variations. Our AAN, when integrated into a DLR, provides appearance transformations to reduce the appearance variations during registration. In addition, we propose an anatomy-constrained loss function through which our AAN generates anatomy-preserving transformations. Our AAN has been purposely designed to be readily inserted into a wide range of DLRs and can be trained cooperatively in an unsupervised and end-to-end manner.
We evaluated our AAN with three state-of-the-art DLRs on three well-established public datasets of 3D brain magnetic resonance imaging (MRI). The results show that our AAN consistently improved existing DLRs and outperformed state-of-the-art ORs in registration accuracy, while adding only a small computational load to existing DLRs.

Submitted 3 July, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Published at NeuroImage

Journal ref: NeuroImage, vol. 259, pp. 119444, 2022

arXiv:2102.02998 [pdf, other]

Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Authors: Hangting Chen, Yang Yi, Dang Feng, Pengyuan Zhang

Abstract: The time-domain audio separation network (TasNet) has achieved remarkable performance in blind source separation (BSS). Classic multi-channel speech processing frameworks employ signal estimation and beamforming. For example, Beam-TasNet links multi-channel convolutional TasNet (MC-Conv-TasNet) with minimum variance distortionless response (MVDR) beamforming, which leverages the strong modeling ability of the data-driven network and boosts the performance of beamforming with an accurate estimation of speech statistics. Such integration can be viewed as a directed acyclic graph that accepts multi-channel input and generates multi-source output. In this paper, we design a "multi-channel input, multi-channel multi-source output" (MIMMO) speech separation system entitled "Beam-Guided TasNet", in which MC-Conv-TasNet and MVDR interact and promote each other more compactly under a directed cyclic flow. Specifically, the first stage uses Beam-TasNet to generate estimated single-speaker signals, which favors the separation in the second stage. The proposed framework facilitates iterative signal refinement under the guidance of beamforming and seeks to reach the upper bound of MVDR-based methods. Experimental results on the spatialized WSJ0-2MIX dataset demonstrate that Beam-Guided TasNet has achieved an SDR of 21.5 dB, exceeding the baseline Beam-TasNet by 4.1 dB under the same model size and narrowing the gap with the oracle signal-based MVDR to 2 dB.
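In Beam-TasNet-style pipelines, the separation network's estimates are used to compute the spatial statistics that drive the MVDR beamformer. One widely used reference-channel MVDR form (Souden et al.) is shown below as a sketch. The notation is ours: $\mathbf{y}_{t,f}$ is the multichannel mixture STFT, $\hat{\mathbf{s}}_{t,f}$ the network's multichannel estimate of the target speaker, $\hat{\mathbf{n}}_{t,f} = \mathbf{y}_{t,f} - \hat{\mathbf{s}}_{t,f}$ the remaining interference plus noise, $\mathbf{u}$ a one-hot vector selecting the reference microphone, and $T$ the number of frames. The exact estimator used in Beam-Guided TasNet may differ.

```latex
\hat{\boldsymbol{\Phi}}_{S,f} = \frac{1}{T}\sum_{t=1}^{T}\hat{\mathbf{s}}_{t,f}\,\hat{\mathbf{s}}_{t,f}^{\mathsf{H}},
\qquad
\hat{\boldsymbol{\Phi}}_{N,f} = \frac{1}{T}\sum_{t=1}^{T}\hat{\mathbf{n}}_{t,f}\,\hat{\mathbf{n}}_{t,f}^{\mathsf{H}},
\\[4pt]
\mathbf{w}_f = \frac{\hat{\boldsymbol{\Phi}}_{N,f}^{-1}\hat{\boldsymbol{\Phi}}_{S,f}}
                    {\operatorname{tr}\!\left(\hat{\boldsymbol{\Phi}}_{N,f}^{-1}\hat{\boldsymbol{\Phi}}_{S,f}\right)}\,\mathbf{u},
\qquad
\hat{x}_{t,f} = \mathbf{w}_f^{\mathsf{H}}\,\mathbf{y}_{t,f}
```

Better network estimates give cleaner covariance estimates and hence a better beamformer. That is the coupling the iterative "beam-guided" refinement exploits.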

Submitted 12 April, 2022; v1 submitted 4 February, 2021; originally announced February 2021.

Comments: Submitted to Interspeech 2022

arXiv:2012.12472 [pdf, ps, other]

Understanding Age of Information in Large-Scale Wireless Networks

Authors: Howard H. Yang, Chao Xu, Xijun Wang, Daquan Feng, Tony Q. S. Quek

Abstract: The notion of age of information (AoI) is investigated in the context of large-scale wireless networks, in which transmitters need to send a sequence of information packets, generated as independent Bernoulli processes, to their intended receivers over a shared spectrum. Due to interference, the rate of packet depletion at any given node is entangled with both the spatial configurations of the other transmitters, which determine the path loss, and their temporal dynamics, which influence their active states. As a result, the queues interact with each other in both space and time over the entire network. Consequently, variations in the packet update frequency affect not just the inter-arrival time but also the departure process, and the impact of such phenomena on the AoI is not well understood. In this paper, we establish a theoretical framework to characterize the AoI performance in this setting. In particular, tractable expressions are derived for both the peak and average AoI under two different transmission protocols, namely first-come-first-served (FCFS) and preemptive last-come-first-served (LCFS-PR).
Based on the theoretical outcomes, we find that: i) networks operating under LCFS-PR are able to attain smaller values of peak and average AoI than those under FCFS, and the gain is more pronounced when the infrastructure is densely deployed; ii) in sparsely deployed networks, ALOHA with a universally designed channel access probability is not instrumental in reducing the AoI, thus calling for more advanced channel access approaches; and iii) when the infrastructure is densely rolled out, there exists a non-trivial ALOHA channel access probability that minimizes the peak and average AoI under both FCFS and LCFS-PR.
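To make the FCFS versus LCFS-PR distinction concrete, here is a minimal discrete-time simulation of a single link. It assumes Bernoulli packet generation with probability `xi` and a per-slot transmission success probability `p`. It deliberately ignores the spatial interference model that is the paper's actual subject. The function `average_aoi` and its parameters are hypothetical and only illustrate how the queueing discipline changes the receiver's age.

```python
import random
from collections import deque


def average_aoi(xi, p, policy, slots=200_000, seed=1):
    """Time-average AoI of one interference-free link (illustrative sketch).

    xi:     per-slot Bernoulli packet-generation probability (assumed).
    p:      per-slot probability that a transmission succeeds (assumed).
    policy: "FCFS" (unbounded FIFO buffer) or "LCFS-PR" (newest packet
            preempts any older one waiting or in service).
    """
    if policy not in ("FCFS", "LCFS-PR"):
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    fifo = deque()     # generation times of queued packets (FCFS only)
    newest = None      # the single buffered packet (LCFS-PR only)
    last_gen = 0       # generation time of the freshest delivered packet
    total_age = 0
    for t in range(slots):
        arrival = rng.random() < xi   # both events are drawn every slot, so the
        success = rng.random() < p    # two policies see identical randomness
        if arrival:
            if policy == "FCFS":
                fifo.append(t)
            else:
                newest = t            # preemption: the older packet is dropped
        if success:
            if policy == "FCFS" and fifo:
                last_gen = fifo.popleft()
            elif policy == "LCFS-PR" and newest is not None:
                last_gen, newest = newest, None
        total_age += t + 1 - last_gen  # receiver's age at the end of slot t
    return total_age / slots


fcfs = average_aoi(0.5, 0.6, "FCFS")
lcfs = average_aoi(0.5, 0.6, "LCFS-PR")
```

With the same random draws, LCFS-PR always delivers a packet at least as fresh as FCFS does. In this toy setting its average age is therefore never larger, consistent with finding i) above.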

Submitted 22 December, 2020; originally announced December 2020.

arXiv:2007.06002 [pdf, other]

Multi-Modality Information Fusion for Radiomics-based Neural Architecture Search

Authors: Yige Peng, Lei Bi, Michael Fulham, Dagan Feng, **man Kim

Abstract: 'Radiomics' is a method that extracts mineable quantitative features from radiographic images. These features can then be used to determine prognosis, for example, to predict the development of distant metastases (DM). Existing radiomics methods, however, require complex manual effort, including the design of hand-crafted radiomic features and their extraction and selection. Recent radiomics methods based on convolutional neural networks (CNNs) also require manual input in network architecture design and hyper-parameter tuning. Radiomic complexity is further compounded when there are multiple imaging modalities, for example, combined positron emission tomography - computed tomography (PET-CT), where PET provides functional information and computed tomography (CT) provides complementary anatomical localization information. Existing multi-modality radiomics methods manually fuse the data that are extracted separately. Reliance on manual fusion often results in sub-optimal fusion because it depends on an expert's understanding of medical images. In this study, we propose a multi-modality neural architecture search method (MM-NAS) to automatically derive optimal multi-modality image features for radiomics and thus negate the dependence on a manual process. We evaluated our MM-NAS on its ability to predict DM using a public PET-CT dataset of patients with soft-tissue sarcomas (STSs). Our results show that our MM-NAS had a higher prediction accuracy than state-of-the-art radiomics methods.

Submitted 12 July, 2020; originally announced July 2020.

Comments: Accepted by MICCAI 2020

arXiv:2002.12680 [pdf, other]

A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image

Authors: Yuyu Guo, Lei Bi, Euijoon Ahn, Dagan Feng, Qian Wang, **man Kim

Abstract: Dynamic medical imaging is usually limited in application due to the large radiation doses and long image scanning and reconstruction times. Existing methods attempt to reduce the dynamic sequence by interpolating the volumes between the acquired image volumes. However, these methods are limited to 2D images, unable to support large variations in the motion between the image volume sequences, or both. In this paper, we present a spatiotemporal volumetric interpolation network (SVIN) designed for 4D dynamic medical images. SVIN introduces dual networks. The first is the spatiotemporal motion network, which leverages a 3D convolutional neural network (CNN) for unsupervised parametric volumetric registration to derive a spatiotemporal motion field from two image volumes. The second is the sequential volumetric interpolation network, which uses the derived motion field to interpolate image volumes, together with a new regression-based module to characterize the periodic motion cycles in functional organ structures. We also introduce an adaptive multi-scale architecture to capture large volumetric anatomical motions. Experimental results demonstrated that our SVIN outperformed state-of-the-art temporal medical interpolation methods and natural video interpolation methods that have been extended to support volumetric images. Our ablation study further showed that our motion network represented large functional motion better than state-of-the-art unsupervised medical registration methods.

Submitted 24 April, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

Comments: 10 pages, 8 figures, Conference on Computer Vision and Pattern Recognition (CVPR) 2020

arXiv:1911.10468 [pdf]

Extending the dynamic strain sensing range of phase-OTDR with frequency modulation pulse and frequency interrogation

Authors: **gdong Zhang, Haoting Wu, **gsheng Huang, Hua Zheng, Danqi Feng, Guolu Yin, Tao Zhu

Abstract: We propose and experimentally demonstrate a technique, based on frequency interrogation, to extend the dynamic sensing range of a phase-sensitive optical time-domain reflectometry (phase-OTDR) system. Benefiting from the range-Doppler coupling feature, the frequency modulation pulse is capable of measuring the frequency shift induced by the dynamic strain, so that large dynamic strain can be recovered. The performance of the proposed method is experimentally evaluated by comparing it with phase unwrapping. The strain sensing range can be increased by at least a factor of several hundred, and a fast dynamic strain with a peak-to-peak amplitude of 130 microstrain at a vibration frequency of 20 kHz is measured.

Submitted 24 November, 2019; originally announced November 2019.

arXiv:1909.00971 [pdf]

Load Forecasting Model and Day-ahead Operation Strategy for City-located EV Quick Charge Stations

Authors: Zeyu Liu, Yaxin Xie, Donghan Feng, Yun Zhou, Shanshan Shi, Chen Fang

Abstract: Charging demands of electric vehicles (EVs) are increasing sharply due to the rapid development of EVs. Hence, reliable and convenient quick charge stations are required to respond to the needs of EV drivers. Due to the uncertainty of EV charging loads, load forecasting is vital for quick charge station operators to formulate the day-ahead plan. In this paper, based on trip chain theory and EV user behaviour, an EV charging load forecasting model is established for quick charge station operators. This model, which applies the Monte Carlo simulation method, is capable of forecasting the charging demand of a city-located quick charge station for the next day. Furthermore, based on the forecasting model, a day-ahead profit-oriented operation strategy for such stations is derived. The simulation results support the effectiveness of the forecasting model and the operation strategy. The conclusions of this paper are as follows: 1) The charging load forecasting model enables operators to grasp the features of the next day's charging load. 2) The revenue of the quick charge station can be dramatically increased by applying the proposed day-ahead operation strategy.
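The Monte Carlo idea behind such a forecast can be sketched as follows: sample many possible days of EV arrivals, simulate each charging session, and average the energy delivered in each hour. Every distribution and parameter below is a hypothetical placeholder and not the paper's trip-chain model:

- arrival time ~ Normal(18 h, 3 h)
- energy demand ~ Uniform(10, 30) kWh
- a 50 kW charger

The function name `forecast_hourly_load` is ours.

```python
import math
import random


def forecast_hourly_load(n_evs, n_trials=500, power_kw=50.0, seed=0):
    """Monte Carlo estimate of a quick-charge station's hourly load (sketch).

    All distributions and parameters are hypothetical placeholders.
    Returns the expected energy delivered in each hour of the day (kWh),
    which equals the average power (kW) over that hour.
    """
    rng = random.Random(seed)
    load = [0.0] * 24
    for _ in range(n_trials):                    # one simulated day per trial
        for _ in range(n_evs):
            t = rng.gauss(18.0, 3.0) % 24        # arrival time (h), assumed
            remaining = rng.uniform(10.0, 30.0)  # energy to charge (kWh), assumed
            while remaining > 1e-9:
                hour_end = math.floor(t) + 1
                # Charge until the hour ends or the demand is met.
                dt = min(hour_end - t, remaining / power_kw)
                load[math.floor(t) % 24] += power_kw * dt
                remaining -= power_kw * dt
                t += dt
    return [energy / n_trials for energy in load]


hourly = forecast_hourly_load(n_evs=10)
```

Averaging over many sampled days turns the random sessions into an expected load profile. An operator could then use such a profile as input to a day-ahead plan.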

Submitted 3 September, 2019; originally announced September 2019.

Comments: This article has been accepted in the 2019 International Conference on Renewable Power Generation (RPG 2019), Shanghai, China, October 24-25, 2019

arXiv:1906.08497 [pdf]

Optimal Decision Making Model of Battery Energy Storage-Assisted Electric Vehicle Charging Station Considering Incentive Demand Response

Authors: Bishal Upadhaya, Donghan Feng, Yun Zhou, Qiang Gui, Xiao** Zhao, Dan Wu

Abstract: With the large-scale adoption of electric vehicles (EVs), public EV charging stations serve as 'fuel tanks' for EVs, meeting the need for longer travel distances and overcoming the shortage of private charging piles. The allocation of local battery energy storage (BES) can enhance the flexibility of the EV charging station. This paper proposes an optimal decision making model of the BES-assisted EV charging station considering incentive demand response. Firstly, detailed models of the BES-assisted EV charging station are presented. Secondly, the emergency demand response (EDR) model is introduced as a representative incentive demand response. Thirdly, based on the charging load forecast data, an optimal decision making model of the BES-assisted EV charging station considering the EDR is established to maximize the charging station's operating profit. Finally, the feasibility of the proposed method is verified through case studies. The conclusions of this paper are as follows: 1) Through the optimal decision making model, correct and profitable EDR participation decisions can be made effectively for the BES-assisted EV charging station. 2) Local BES in the EV charging station can improve the charging station's ability to participate in the EDR.

Submitted 20 June, 2019; originally announced June 2019.

arXiv:1906.08411 [pdf]

A novel linear battery energy storage system (BESS) life loss calculation model for BESS-integrated wind farm in scheduled power tracking

Authors: Qiang Gui, Hao Su, Donghan Feng, Yun Zhou, Ran Xu, Ting Lei

Abstract: Recently, the rapid development of battery technology has made it feasible to integrate renewable generation with battery energy storage systems (BESS). Considering BESS life loss in different BESS application scenarios is an economic imperative. In this paper, a novel linear BESS life loss calculation model for a BESS-integrated wind farm in scheduled power tracking is proposed. Firstly, based on the relation curve between cycle life (life cycle times) and depth of discharge (DOD), the BESS life loss coefficient for unit throughput energy at different states of charge (SOC) can be determined directly from the fitting function of this curve. Secondly, since the SOC varies unidirectionally within a single time step, the BESS life loss can be calculated by integrating the life loss coefficient-SOC relation function. A linear BESS life loss calculation model is established through self-optimal piecewise linearization of the primitive function of the life loss coefficient-SOC relation function. Thirdly, the proposed life loss calculation model is incorporated into the scheduled power tracking optimization of the BESS-integrated wind farm. Case studies demonstrate that with the proposed method, the BESS life loss term can be incorporated into the optimization model effectively, and the scheduled power tracking cost of the BESS-integrated wind farm can be determined and optimized more comprehensively.
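The calculation chain described above can be sketched as follows:

1. Fit a cycle-life curve.
2. Obtain a primitive F(SOC) of the life-loss coefficient.
3. Evaluate the loss of a monotone SOC step as |F(end) - F(start)|.
4. Replace F by a piecewise-linear approximation so the model stays linear.

Purely for illustration, we assume a power-law fit N(D) = a·D^(-b) with hypothetical a = 3000 and b = 1.5, and we assume F(SOC) = 1/N(1 - SOC). Both assumptions are ours, not the paper's. We also use uniform breakpoints rather than the paper's self-optimal ones.

```python
def cycle_life(dod, a=3000.0, b=1.5):
    """Hypothetical power-law fit N(D) = a * D**(-b) of cycle life vs. DOD."""
    return a * dod ** (-b)


def life_loss_primitive(soc, a=3000.0, b=1.5):
    """Assumed primitive F(SOC) = 1 / N(1 - SOC); F(1) = 0 at full charge."""
    dod = 1.0 - soc
    return dod ** b / a   # algebraically 1 / cycle_life(dod), but safe at dod = 0


def piecewise_linear(f, n_segments):
    """Linear interpolation of f between uniform breakpoints on [0, 1]."""
    xs = [i / n_segments for i in range(n_segments + 1)]
    ys = [f(x) for x in xs]

    def approx(s):  # s is an SOC value in [0, 1]
        i = min(int(s * n_segments), n_segments - 1)   # segment index
        w = (s - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + w * (ys[i + 1] - ys[i])

    return approx


def step_life_loss(soc_start, soc_end, primitive):
    """Life loss of one time step whose SOC changes monotonically."""
    return abs(primitive(soc_end) - primitive(soc_start))


F_lin = piecewise_linear(life_loss_primitive, n_segments=10)
exact = step_life_loss(0.9, 0.45, life_loss_primitive)
approx = step_life_loss(0.9, 0.45, F_lin)
```

Because the assumed F is convex in SOC, each chord lies slightly above the curve. More breakpoints shrink the error at the cost of more segments in an optimization model.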

Submitted 27 October, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: This article has been accepted in the 2019 International Conference on Renewable Power Generation (RPG 2019), Shanghai, China, October 24-25, 2019

arXiv:1906.02679 [pdf]

doi 10.1007/s11760-020-01844-8

A Natural Language-Inspired Multi-label Video Streaming Traffic Classification Method Based on Deep Neural Networks

Authors: Yan Shi, Dezhi Feng, Subir Biswas

Abstract: This paper presents a deep learning-based traffic classification method for identifying multiple streaming video sources simultaneously within an encrypted tunnel. The work defines a novel feature inspired by Natural Language Processing (NLP) that allows existing NLP techniques to help with traffic classification. The feature extraction method is described, and a large dataset containing video streaming and web traffic is created to verify its effectiveness. Results obtained by applying several NLP methods show that the proposed method performs well on both binary and multi-label traffic classification problems. We also show that the proposed method can achieve zero-shot learning.

Submitted 3 June, 2019; originally announced June 2019.

arXiv:1903.03913 [pdf, ps, other]

Towards Ultra-Reliable Low-Latency Communications: Typical Scenarios, Possible Solutions, and Open Issues

Authors: Daquan Feng, Changyang She, Kai Ying, Lifeng Lai, Zhanwei Hou, Tony Q. S. Quek, Yonghui Li, Branka Vucetic

Abstract: Ultra-reliable low-latency communications (URLLC) is considered one of the three new application scenarios in the 5th Generation (5G) New Radio (NR), for which the physical layer design aspects have been specified. With 5G NR, reliability and latency can be guaranteed in radio access networks. However, for communication scenarios where the transmission involves both radio access and wide area core networks, the delay in radio access networks contributes only part of the end-to-end (E2E) delay. In this paper, we outline the delay components and packet loss probabilities in typical communication scenarios of URLLC, and formulate the constraints on E2E delay and overall packet loss probability. Then, we summarize possible solutions in the physical layer, the link layer, the network layer, and the cross-layer design, respectively. Finally, we discuss the open issues in prediction and communication co-design for URLLC in wide-area large-scale networks.
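The two E2E constraints mentioned above can be checked with a short sketch. It rests on two simplifying assumptions of ours: component delays simply add up, and packet losses at different components are independent. Under these assumptions, the overall loss is 1 - ∏(1 - p_i). The function name and the example budget (1 ms, 1e-5) are hypothetical illustration values.

```python
import math


def check_e2e_constraints(delays_ms, loss_probs, max_delay_ms, max_loss):
    """Check E2E delay and overall packet-loss constraints (illustrative sketch).

    delays_ms:  delay of each component on the path (e.g. radio access,
                backhaul, core network), in milliseconds.
    loss_probs: packet-loss probability of each component.
    Assumes delays add up and losses are independent events.
    """
    total_delay = sum(delays_ms)
    # A packet survives only if every component delivers it.
    overall_loss = 1.0 - math.prod(1.0 - p for p in loss_probs)
    ok = total_delay <= max_delay_ms and overall_loss <= max_loss
    return ok, total_delay, overall_loss


# Hypothetical budget: 1 ms E2E delay and 1e-5 overall packet loss.
ok, delay, loss = check_e2e_constraints(
    [0.5, 0.25, 0.25], [1e-6, 1e-7, 1e-7], max_delay_ms=1.0, max_loss=1e-5)
```

For small per-component losses, the overall loss is close to their sum (here about 1.2e-6). The delay budget therefore often binds before the reliability budget does once core-network delay is added.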

Submitted 9 March, 2019; originally announced March 2019.

Comments: 8 pages, 7 figures. Accepted by IEEE Vehicular Technology Magazine

Journal ref: IEEE Vehicular Technology Magazine, 2019

arXiv:1903.00888 [pdf, other]

Denoising convolutional autoencoder based B-mode ultrasound tongue image feature extraction

Authors: Bo Li, Kele Xu, Dawei Feng, Haibo Mi, Huaimin Wang, Jian Zhu

Abstract: B-mode ultrasound tongue imaging is widely used in the speech production field. However, efficient interpretation of the tongue image sequences is greatly needed. Inspired by the recent success of unsupervised deep learning approaches, we explore an unsupervised convolutional network architecture for feature extraction from ultrasound tongue images, which can be helpful for clinical linguistics and phonetics. In a quantitative comparison of different unsupervised feature extraction approaches, the denoising convolutional autoencoder (DCAE)-based method outperforms the other feature extraction methods on the reconstruction task and on the 2010 silent speech interface challenge. A Word Error Rate of 6.17% is obtained with DCAE, compared with the state-of-the-art value of 6.45% using the discrete cosine transform as the feature extractor. Our code is available at https://github.com/DeePBluE666/Source-code1.
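The Word Error Rate quoted above is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level edit distance. The sketch below implements that standard definition. The challenge's exact scoring details (tokenization, text normalization) are not given here and may differ. The function name is ours.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Uses a word-level Levenshtein (edit-distance) table.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here one substitution ("sat" → "sit") and one deletion ("the") over six reference words give a WER of 2/6.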

Submitted 3 March, 2019; originally announced March 2019.

Comments: Accepted by ICASSP 2019

arXiv:1812.11356 [pdf]

doi 10.35833/MPCE.2018.000801

A Multi-Agent-Based Rolling Optimization Method for Restoration Scheduling of Electrical Distribution Systems with Distributed Generation

Authors: Donghan Feng, Fan Wu, Yun Zhou, Usama Rahman, Xiao** Zhao, Chen Fang

Abstract: Resilience against major disasters is the most essential characteristic of future electrical distribution systems (EDS). A multi-agent-based rolling optimization method for EDS restoration scheduling is proposed in this paper. When a blackout occurs, the centralized authority may be lost due to the failure of the common core communication network. Considering this risk, the agents available after disasters or cyber-attacks identify the communication-connected parts (CCPs) in the EDS through distributed communication. A multi-time-interval optimization model is formulated and solved by the agents for the restoration scheduling of a CCP. A rolling optimization process for the entire EDS restoration is proposed. During the scheduling/rescheduling in the rolling process, the CCPs in the EDS are re-identified and the restoration schedules for the CCPs are updated. Through decentralized decision-making and rolling optimization, EDS restoration scheduling can automatically start and periodically update itself, providing effective solutions for EDS restoration scheduling in a blackout event. A modified IEEE 123-bus EDS is used to demonstrate the effectiveness of the proposed method.
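Identifying communication-connected parts amounts to finding the connected components of the surviving communication graph. The sketch below does this with a centralized breadth-first search. It is a stand-in for the distributed identification the agents perform in the paper, and the function and argument names are hypothetical.

```python
from collections import defaultdict, deque


def communication_connected_parts(agents, links, failed_links=()):
    """Split surviving agents into communication-connected parts (sketch).

    agents:       agents still operational after the disaster.
    links:        communication links as (agent_a, agent_b) pairs.
    failed_links: links known to be down; they are ignored.
    """
    alive = set(agents)
    failed = {frozenset(link) for link in failed_links}
    adjacency = defaultdict(list)
    for a, b in links:
        # Keep a link only if it still works and both endpoints survived.
        if a in alive and b in alive and frozenset((a, b)) not in failed:
            adjacency[a].append(b)
            adjacency[b].append(a)

    seen, parts = set(), []
    for start in agents:
        if start in seen:
            continue
        seen.add(start)
        part, queue = [], deque([start])
        while queue:                      # breadth-first search
            node = queue.popleft()
            part.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        parts.append(sorted(part))
    return parts


parts = communication_connected_parts(
    [1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 5)], failed_links=[(3, 4)])
```

Re-running the same routine after each change in link or agent status mirrors the re-identification of CCPs at every rolling step.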

Submitted 11 March, 2020; v1 submitted 29 December, 2018; originally announced December 2018.

Comments: This version is accepted by Journal of Modern Power Systems and Clean Energy (MPCE). The final version will be published in MPCE

Journal ref: MPCE, vol.8, no.4, pp.737-749, July 2020

arXiv:1410.3899 [pdf]

Optimal Scheduling of Electric Vehicles Charging in Low-Voltage Distribution Systems

Authors: Shaolun Xu, Liang Zhang, Zheng Yan, Donghan Feng, Gang Wang, Xiaobo Zhao

Abstract: Uncoordinated charging of large-scale electric vehicles (EVs) will have a negative impact on the secure and economic operation of the power system, especially at the distribution level. Given that the charging load of EVs can be controlled to some extent, optimal charging control of EVs has been researched extensively. In this paper, two possible smart charging scenarios in China are studied: centralized optimal charging operated by an aggregator and decentralized optimal charging managed by individual users. Under the assumption that the aggregators and individual users are concerned only with economic benefits, new load peaks will arise under time-of-use (TOU) pricing, which is extensively employed in China. To solve this problem, a simple incentive mechanism is proposed for centralized optimal charging, while a rolling-update pricing scheme is devised for decentralized optimal charging. The original optimal charging models are modified to account for the developed schemes. Simulated tests corroborate the efficacy of optimal scheduling for charging EVs in various scenarios.
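The reason purely price-driven charging creates new peaks under TOU tariffs is easy to see in a toy model. Every EV independently picks the cheapest periods, so the whole fleet lands in the same valley. The base load, prices, fleet size, and charger power below are hypothetical, and the sketch does not implement the paper's incentive or rolling-update pricing remedies.

```python
def greedy_tou_load(base_load_kw, prices, n_evs, ev_power_kw, periods_needed):
    """Aggregate load when every EV independently picks the cheapest periods.

    Illustrative sketch of purely price-driven charging under TOU tariffs;
    ties between equally cheap periods are broken by the earliest period.
    """
    cheapest = sorted(range(len(prices)), key=lambda h: (prices[h], h))[:periods_needed]
    load = list(base_load_kw)
    for h in cheapest:
        load[h] += n_evs * ev_power_kw   # all EVs respond identically
    return load


# Hypothetical six-period day: base load in kW and a TOU price per period.
base = [60_000, 50_000, 70_000, 90_000, 100_000, 80_000]
tou = [0.3, 0.3, 0.6, 1.0, 1.0, 0.6]
with_evs = greedy_tou_load(base, tou, n_evs=10_000, ev_power_kw=7, periods_needed=2)
```

The original peak of 100,000 kW in period 4 is overtaken by a new 130,000 kW peak in the formerly cheap period 0. Schemes that spread charging across periods are designed to avoid exactly this effect.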

Submitted 19 January, 2016; v1 submitted 14 October, 2014; originally announced October 2014.

Comments: Preparation of Final Manuscripts Accepted for Journal of Electrical Engineering & Technology

Showing 1–43 of 43 results for author: Feng, D