-
Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images
Authors:
Songhan Jiang,
Zhengyu Gan,
Linghan Cai,
Yifeng Wang,
Yongbing Zhang
Abstract:
Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tu…
▽ More
Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tumor microenvironment (TME). (2) Existing multimodal methods often rely on alignment strategies to integrate complementary information, which may lead to information loss due to the inherent heterogeneity between pathology and genes. In this paper, we propose a Multimodal Cross-Task Interaction (MCTI) framework to explore the intrinsic correlations between subtype classification and survival analysis tasks. Specifically, to capture TME-related features in WSIs, we leverage the subtype classification task to mine tumor regions. Simultaneously, multi-head attention mechanisms are applied in genomic feature extraction, adaptively performing genes grou** to obtain task-related genomic embedding. With the joint representation of pathological images and genomic data, we further introduce a Transport-Guided Attention (TGA) module that uses optimal transport theory to model the correlation between subtype classification and survival analysis tasks, effectively transferring potential information. Extensive experiments demonstrate the superiority of our approaches, with MCTI outperforming state-of-the-art frameworks on three public benchmarks. \href{https://github.com/jsh0792/MCTI}{https://github.com/jsh0792/MCTI}.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding
Authors:
Trang Le,
Daniel Lazar,
Suyoun Kim,
Shan Jiang,
Duc Le,
Adithya Sagar,
Aleksandr Livshits,
Ahmed Aly,
Akshat Shrivastava
Abstract:
Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no…
▽ More
Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a novel method leveraging a Connectionist Temporal Classification-based decoding strategy as well as a denoising objective to train robust non-autoregressive deliberation models. We show that PRoDeliberation achieves the latency reduction of parallel decoding (2-10x improvement over autoregressive models) while retaining the ability to correct Automatic Speech Recognition (ASR) mistranscriptions of autoregressive deliberation systems. We further show that the design of the denoising training allows PRoDeliberation to overcome the limitations of small ASR devices, and we provide analysis on the necessity of each component of the system.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey
Authors:
Marcos V. Conde,
Florin-Alexandru Vasluianu,
Radu Timofte,
Jianxing Zhang,
Jia Li,
Fan Wang,
Xiaopeng Li,
Zikun Liu,
Hyunhee Park,
Sejun Song,
Changho Kim,
Zhijuan Huang,
Hongyuan Yu,
Cheng Wan,
Wending Xiang,
Jiamin Lin,
Hang Zhong,
Qiaosong Zhang,
Yue Sun,
Xuanwu Yin,
Kunlong Zuo,
Senyan Xu,
Siyuan Jiang,
Zhi**g Sun,
Jiaying Zhu
, et al. (10 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…
▽ More
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during thee challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.
△ Less
Submitted 24 April, 2024;
originally announced April 2024.
-
H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images
Authors:
**peng Lu,
**gyun Chen,
Linghan Cai,
Songhan Jiang,
Yongbing Zhang
Abstract:
Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effec…
▽ More
Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effectively model the non-linear dependencies between PET and CT modalities. Recent studies have investigated various approaches to optimize the fusion of modality-specific features for enhancing joint representations. However, modality-specific encoders used in these methods operate independently, inadequately leveraging the synergistic relationships inherent in PET and CT modalities, for example, the complementarity between semantics and structure. To address these issues, we propose a Hierarchical Adaptive Interaction and Weighting Network termed H2ASeg to explore the intrinsic cross-modal correlations and transfer potential complementary information. Specifically, we design a Modality-Cooperative Spatial Attention (MCSA) module that performs intra- and inter-modal interactions globally and locally. Additionally, a Target-Aware Modality Weighting (TAMW) module is developed to highlight tumor-related features within multi-modal features, thereby refining tumor segmentation. By embedding these modules across different layers, H2ASeg can hierarchically model cross-modal correlations, enabling a nuanced understanding of both semantic and structural tumor features. Extensive experiments demonstrate the superiority of H2ASeg, outperforming state-of-the-art methods on AutoPet-II and Hecktor2022 benchmarks. The code is released at https://github.com/**PLu/H2ASeg.
△ Less
Submitted 28 March, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification
Authors:
Shuai Li,
Xiaoguang Ma,
Shancheng Jiang,
Lu Meng
Abstract:
Remarkable successes were made in Medical Image Classification (MIC) recently, mainly due to wide applications of convolutional neural networks (CNNs). However, adversarial examples (AEs) exhibited imperceptible similarity with raw data, raising serious concerns on network robustness. Although adversarial training (AT), in responding to malevolent AEs, was recognized as an effective approach to im…
▽ More
Remarkable successes were made in Medical Image Classification (MIC) recently, mainly due to wide applications of convolutional neural networks (CNNs). However, adversarial examples (AEs) exhibited imperceptible similarity with raw data, raising serious concerns on network robustness. Although adversarial training (AT), in responding to malevolent AEs, was recognized as an effective approach to improve robustness, it was challenging to overcome generalization decline of networks caused by the AT. In this paper, in order to reserve high generalization while improving robustness, we proposed a dynamic perturbation-adaptive adversarial training (DPAAT) method, which placed AT in a dynamic learning environment to generate adaptive data-level perturbations and provided a dynamically updated criterion by loss information collections to handle the disadvantage of fixed perturbation sizes in conventional AT methods and the dependence on external transference. Comprehensive testing on dermatology HAM10000 dataset showed that the DPAAT not only achieved better robustness improvement and generalization preservation but also significantly enhanced mean average precision and interpretability on various CNNs, indicating its great potential as a generic adversarial training method on the MIC.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Digital Twin Aided Massive MIMO: CSI Compression and Feedback
Authors:
Shuaifeng Jiang,
Ahmed Alkhateeb
Abstract:
Deep learning (DL) approaches have demonstrated high performance in compressing and reconstructing the channel state information (CSI) and reducing the CSI feedback overhead in massive MIMO systems. One key challenge, however, with the DL approaches is the demand for extensive training data. Collecting this real-world CSI data incurs significant overhead that hinders the DL approaches from scaling…
▽ More
Deep learning (DL) approaches have demonstrated high performance in compressing and reconstructing the channel state information (CSI) and reducing the CSI feedback overhead in massive MIMO systems. One key challenge, however, with the DL approaches is the demand for extensive training data. Collecting this real-world CSI data incurs significant overhead that hinders the DL approaches from scaling to a large number of communication sites. To address this challenge, we propose a novel direction that utilizes site-specific \textit{digital twins} to aid the training of DL models. The proposed digital twin approach generates site-specific synthetic CSI data from the EM 3D model and ray tracing, which can then be used to train the DL model without real-world data collection. To further improve the performance, we adopt online data selection to refine the DL model training with a small real-world CSI dataset. Results show that a DL model trained solely on the digital twin data can achieve high performance when tested in a real-world deployment. Further, leveraging domain adaptation techniques, the proposed approach requires orders of magnitude less real-world data to approach the same performance of the model trained completely on a real-world CSI dataset.
△ Less
Submitted 29 February, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Automatic laminectomy cutting plane planning based on artificial intelligence in robot assisted laminectomy surgery
Authors:
Zhuofu Li,
Yonghong Zhang,
Chengxia Wang,
Shanshan Liu,
Xiongkang Song,
Xuquan Ji,
Shuai Jiang,
Woquan Zhong,
Lei Hu,
Weishi Li
Abstract:
Objective: This study aims to use artificial intelligence to realize the automatic planning of laminectomy, and verify the method. Methods: We propose a two-stage approach for automatic laminectomy cutting plane planning. The first stage was the identification of key points. 7 key points were manually marked on each CT image. The Spatial Pyramid Upsampling Network (SPU-Net) algorithm developed by…
▽ More
Objective: This study aims to use artificial intelligence to realize the automatic planning of laminectomy, and verify the method. Methods: We propose a two-stage approach for automatic laminectomy cutting plane planning. The first stage was the identification of key points. 7 key points were manually marked on each CT image. The Spatial Pyramid Upsampling Network (SPU-Net) algorithm developed by us was used to accurately locate the 7 key points. In the second stage, based on the identification of key points, a personalized coordinate system was generated for each vertebra. Finally, the transverse and longitudinal cutting planes of laminectomy were generated under the coordinate system. The overall effect of planning was evaluated. Results: In the first stage, the average localization error of the SPU-Net algorithm for the seven key points was 0.65mm. In the second stage, a total of 320 transverse cutting planes and 640 longitudinal cutting planes were planned by the algorithm. Among them, the number of horizontal plane planning effects of grade A, B, and C were 318(99.38%), 1(0.31%), and 1(0.31%), respectively. The longitudinal planning effects of grade A, B, and C were 622(97.18%), 1(0.16%), and 17(2.66%), respectively. Conclusions: In this study, we propose a method for automatic surgical path planning of laminectomy based on the localization of key points in CT images. The results showed that the method achieved satisfactory results. More studies are needed to confirm the reliability of this approach in the future.
△ Less
Submitted 25 December, 2023;
originally announced December 2023.
-
EM Based p-norm-like Constraint RLS Algorithm for Sparse System Identification
Authors:
Shuyang Jiang,
Kung Yao
Abstract:
In this paper, the recursive least squares (RLS) algorithm is considered in the sparse system identification setting. The cost function of RLS algorithm is regularized by a $p$-norm-like ($0 \leq p \leq 1$) constraint of the estimated system parameters. In order to minimize the regularized cost function, we transform it into a penalized maximum likelihood (ML) problem, which is solved by the expec…
▽ More
In this paper, the recursive least squares (RLS) algorithm is considered in the sparse system identification setting. The cost function of RLS algorithm is regularized by a $p$-norm-like ($0 \leq p \leq 1$) constraint of the estimated system parameters. In order to minimize the regularized cost function, we transform it into a penalized maximum likelihood (ML) problem, which is solved by the expectation-maximization (EM) algorithm. With the introduction of a thresholding operator, the update equation of the tap-weight vector is derived. We also exploit the underlying sparsity to implement the proposed algorithm in a low computational complexity fashion. Numerical simulations demonstrate the superiority of the new algorithm over conventional sparse RLS algorithms, as well as regular RLS algorithm.
△ Less
Submitted 10 December, 2023;
originally announced December 2023.
-
Snake Robot with Tactile Perception Navigates on Large-scale Challenging Terrain
Authors:
Shuo Jiang,
Adarsh Salagame,
Alireza Ramezani,
Lawson Wong
Abstract:
Along with the advancement of robot skin technology, there has been notable progress in the development of snake robots featuring body-surface tactile perception. In this study, we proposed a locomotion control framework for snake robots that integrates tactile perception to augment their adaptability to various terrains. Our approach embraces a hierarchical reinforcement learning (HRL) architectu…
▽ More
Along with the advancement of robot skin technology, there has been notable progress in the development of snake robots featuring body-surface tactile perception. In this study, we proposed a locomotion control framework for snake robots that integrates tactile perception to augment their adaptability to various terrains. Our approach embraces a hierarchical reinforcement learning (HRL) architecture, wherein the high-level orchestrates global navigation strategies while the low-level uses curriculum learning for local navigation maneuvers. Due to the significant computational demands of collision detection in whole-body tactile sensing, the efficiency of the simulator is severely compromised. Thus a distributed training pattern to mitigate the efficiency reduction was adopted. We evaluated the navigation performance of the snake robot in complex large-scale cave exploration with challenging terrains to exhibit improvements in motion efficiency, evidencing the efficacy of tactile perception in terrain-adaptive locomotion of snake robots.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Hierarchical RL-Guided Large-scale Navigation of a Snake Robot
Authors:
Shuo Jiang,
Adarsh Salagame,
Alireza Ramezani,
Lawson Wong
Abstract:
Classical snake robot control leverages mimicking snake-like gaits tuned for specific environments. However, to operate adaptively in unstructured environments, gait generation must be dynamically scheduled. In this work, we present a four-layer hierarchical control scheme to enable the snake robot to navigate freely in large-scale environments. The proposed model decomposes navigation into global…
▽ More
Classical snake robot control leverages mimicking snake-like gaits tuned for specific environments. However, to operate adaptively in unstructured environments, gait generation must be dynamically scheduled. In this work, we present a four-layer hierarchical control scheme to enable the snake robot to navigate freely in large-scale environments. The proposed model decomposes navigation into global planning, local planning, gait generation, and gait tracking. Using reinforcement learning (RL) and a central pattern generator (CPG), our method learns to navigate in complex mazes within hours and can be directly deployed to arbitrary new environments in a zero-shot fashion. We use the high-fidelity model of Northeastern's slithering robot COBRA to test the effectiveness of the proposed hierarchical control approach.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy
Authors:
Hongyang Jiang,
Mengdi Gao,
Zirong Liu,
Chen Tang,
Xiaoqing Zhang,
Shuai Jiang,
Wu Yuan,
Jiang Liu
Abstract:
Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microangioma lesions, resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis…
▽ More
Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microangioma lesions, resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis framework called GlanceSeg, based on SAM. GlanceSeg enables real-time segmentation of microangioma lesions as ophthalmologists review fundus images. Our human-in-the-loop framework integrates the ophthalmologist's gaze map, allowing for rough localization of minute lesions in fundus images. Subsequently, a saliency map is generated based on the located region of interest, which provides prompt points to assist the foundation model in efficiently segmenting microangioma lesions. Finally, a domain knowledge filter refines the segmentation of minute lesions. We conducted experiments on two newly-built public datasets, i.e., IDRiD and Retinal-Lesions, and validated the feasibility and superiority of GlanceSeg through visualized illustrations and quantitative measures. Additionally, we demonstrated that GlanceSeg improves annotation efficiency for clinicians and enhances segmentation performance through fine-tuning using annotations. This study highlights the potential of GlanceSeg-based annotations for self-model optimization, leading to enduring performance advancements through continual learning.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Dual Pipeline Style Transfer with Input Distribution Differentiation
Authors:
ShiQi Jiang,
JunJie Kang,
YuJian Li
Abstract:
The color and texture dual pipeline architecture (CTDP) suppresses texture representation and artifacts through masked total variation loss (Mtv), and further experiments have shown that smooth input can almost completely eliminate texture representation. We have demonstrated through experiments that smooth input is not the key reason for removing texture representations, but rather the distributi…
▽ More
The color and texture dual pipeline architecture (CTDP) suppresses texture representation and artifacts through masked total variation loss (Mtv), and further experiments have shown that smooth input can almost completely eliminate texture representation. We have demonstrated through experiments that smooth input is not the key reason for removing texture representations, but rather the distribution differentiation of the training dataset. Based on this, we propose an input distribution differentiation training strategy (IDD), which forces the generation of textures to be completely dependent on the noise distribution, while the smooth distribution will not produce textures at all. Overall, our proposed distribution differentiation training strategy allows for two pre-defined input distributions to be responsible for two generation tasks, with noise distribution responsible for texture generation and smooth distribution responsible for color smooth transfer. Finally, we choose a smooth distribution as the input for the forward inference stage to completely eliminate texture representations and artifacts in color transfer tasks.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation
Authors:
Xiaohua Jiang,
Yihao Guo,
Jian Huang,
Yuting Wu,
Meiyi Luo,
Zhaoyang Xu,
Qianni Zhang,
Xingru Huang,
Hong He,
Shaowei Jiang,
**g Ye,
Mang Xiao
Abstract:
The precise spatial and quantitative delineation of indistinct-boundary medical objects is paramount for the accuracy of diagnostic protocols, efficacy of surgical interventions, and reliability of postoperative assessments. Despite their significance, the effective segmentation and instantaneous three-dimensional reconstruction are significantly impeded by the paucity of representative samples in…
▽ More
The precise spatial and quantitative delineation of indistinct-boundary medical objects is paramount for the accuracy of diagnostic protocols, efficacy of surgical interventions, and reliability of postoperative assessments. Despite their significance, the effective segmentation and instantaneous three-dimensional reconstruction are significantly impeded by the paucity of representative samples in available datasets and noise artifacts. To surmount these challenges, we introduced Stochastic Defect Injection (SDi) to augment the representational diversity of challenging indistinct-boundary objects within training corpora. Consequently, we propose the Dual-Encoder Fourier Group Harmonics Network (DEFN) to tailor noise filtration, amplify detailed feature recognition, and bolster representation across diverse medical imaging scenarios. By incorporating Dynamic Weight Composing (DWC) loss dynamically adjusts model's focus based on training progression, DEFN achieves SOTA performance on the OIMHS public dataset, showcasing effectiveness in indistinct boundary contexts. Source code for DEFN is available at: https://github.com/IMOP-lab/DEFN-pytorch.
△ Less
Submitted 19 June, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Assessing and Enhancing Robustness of Deep Learning Models with Corruption Emulation in Digital Pathology
Authors:
Peixiang Huang,
Songtao Zhang,
Yulu Gan,
Rui Xu,
Rongqi Zhu,
Wenkang Qin,
Limei Guo,
Shan Jiang,
Lin Luo
Abstract:
Deep learning in digital pathology brings intelligence and automation as substantial enhancements to pathological analysis, the gold standard of clinical diagnosis. However, multiple steps from tissue preparation to slide imaging introduce various image corruptions, making it difficult for deep neural network (DNN) models to achieve stable diagnostic results for clinical use. In order to assess an…
▽ More
Deep learning in digital pathology brings intelligence and automation as substantial enhancements to pathological analysis, the gold standard of clinical diagnosis. However, multiple steps from tissue preparation to slide imaging introduce various image corruptions, making it difficult for deep neural network (DNN) models to achieve stable diagnostic results for clinical use. In order to assess and further enhance the robustness of the models, we analyze the physical causes of the full-stack corruptions throughout the pathological life-cycle and propose an Omni-Corruption Emulation (OmniCE) method to reproduce 21 types of corruptions quantified with 5-level severity. We then construct three OmniCE-corrupted benchmark datasets at both patch level and slide level and assess the robustness of popular DNNs in classification and segmentation tasks. Further, we explore to use the OmniCE-corrupted datasets as augmentation data for training and experiments to verify that the generalization ability of the models has been significantly enhanced.
△ Less
Submitted 31 October, 2023;
originally announced October 2023.
-
Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination
Authors:
Siyuan Jiang,
Yan Ding,
Yuling Wang,
Lei Xu,
Wenli Dai,
Wanru Chang,
Jianfeng Zhang,
Jie Yu,
Jianqiao Zhou,
Chunquan Zhang,
** Liang,
Dexing Kong
Abstract:
Ultrasound is a vital diagnostic technique in health screening, with the advantages of non-invasive, cost-effective, and radiation free, and therefore is widely applied in the diagnosis of nodules. However, it relies heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule might present heterogeneous appearances in different cross-sectional views w…
▽ More
Ultrasound is a vital diagnostic technique in health screening, with the advantages of non-invasive, cost-effective, and radiation free, and therefore is widely applied in the diagnosis of nodules. However, it relies heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule might present heterogeneous appearances in different cross-sectional views which makes it hard to perform per-nodule examination. Sonographers usually discriminate different nodules by examining the nodule features and the surrounding structures like gland and duct, which is cumbersome and time-consuming. To address this problem, we collected hundreds of breast ultrasound videos and built a nodule reidentification system that consists of two parts: an extractor based on the deep learning model that can extract feature vectors from the input video clips and a real-time clustering algorithm that automatically groups feature vectors by nodules. The system obtains satisfactory results and exhibits the capability to differentiate ultrasound videos. As far as we know, it's the first attempt to apply re-identification technique in the ultrasonic field.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
Authors:
Jianing Qiu,
Jian Wu,
Hao Wei,
Peilun Shi,
Minqing Zhang,
Yunyun Sun,
Lin Li,
Hanruo Liu,
Hongyi Liu,
Simeng Hou,
Yuyang Zhao,
Xuehui Shi,
Junfang Xian,
Xiaoxia Qu,
Sirui Zhu,
Lijie Pan,
Xiaoniao Chen,
Xiaojia Zhang,
Shuai Jiang,
Kebing Wang,
Chenlong Yang,
Mingqiang Chen,
Sujie Fan,
Jianhua Hu,
Aiguo Lv
, et al. (17 additional authors not shown)
Abstract:
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…
▽ More
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.
△ Less
Submitted 7 October, 2023;
originally announced October 2023.
-
Lightweight texture transfer based on texture feature preset
Authors:
ShiQi Jiang
Abstract:
In the task of texture transfer, reference texture images typically exhibit highly repetitive texture features, and the texture transfer results from different content images under the same style also share remarkably similar texture patterns. Encoding such highly similar texture features often requires deep layers and a large number of channels, making it is also the main source of the entire mod…
▽ More
In the task of texture transfer, reference texture images typically exhibit highly repetitive texture features, and the texture transfer results from different content images under the same style also share remarkably similar texture patterns. Encoding such highly similar texture features often requires deep layers and a large number of channels, making it is also the main source of the entire model's parameter count and computational load, and inference time. We propose a lightweight texture transfer based on texture feature preset (TFP). TFP takes full advantage of the high repetitiveness of texture features by providing preset universal texture feature maps for a given style. These preset feature maps can be fused and decoded directly with shallow color transfer feature maps of any content to generate texture transfer results, thereby avoiding redundant texture information from being encoded repeatedly. The texture feature map we preset is encoded through noise input images with consistent distribution (standard normal distribution). This consistent input distribution can completely avoid the problem of texture transfer differentiation, and by randomly sampling different noise inputs, we can obtain different texture features and texture transfer results under the same reference style. Compared to state-of-the-art techniques, our TFP not only produces visually superior results but also reduces the model size by 3.2-3538 times and speeds up the process by 1.8-5.6 times.
△ Less
Submitted 1 January, 2024; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Vision Guided MIMO Radar Beamforming for Enhanced Vital Signs Detection in Crowds
Authors:
Shuaifeng Jiang,
Ahmed Alkhateeb,
Daniel W. Bliss,
Yu Rong
Abstract:
Radar as a remote sensing technology has been used to analyze human activity for decades. Despite all the great features such as motion sensitivity, privacy preservation, penetrability, and more, radar has limited spatial degrees of freedom compared to optical sensors and thus makes it challenging to sense crowded environments without prior information. In this paper, we develop a novel dual-sensi…
▽ More
Radar as a remote sensing technology has been used to analyze human activity for decades. Despite all the great features such as motion sensitivity, privacy preservation, penetrability, and more, radar has limited spatial degrees of freedom compared to optical sensors and thus makes it challenging to sense crowded environments without prior information. In this paper, we develop a novel dual-sensing system, in which a vision sensor is leveraged to guide digital beamforming in a multiple-input multiple-output (MIMO) radar. Also, we develop a calibration algorithm to align the two types of sensors and show that the calibrated dual system achieves about two centimeters precision in three-dimensional space within a field of view of $75^\circ$ by $65^\circ$ and for a range of two meters. Finally, we show that the proposed approach is capable of detecting the vital signs simultaneously for a group of closely spaced subjects, sitting and standing, in a cluttered environment, which highlights a promising direction for vital signs detection in realistic environments.
△ Less
Submitted 18 June, 2023;
originally announced June 2023.
-
Zero-shot Medical Image Translation via Frequency-Guided Diffusion Models
Authors:
Yunxiang Li,
Hua-Chieh Shao,
Xiao Liang,
Liyuan Chen,
Ruiqi Li,
Steve Jiang,
**g Wang,
You Zhang
Abstract:
Recently, the diffusion model has emerged as a superior generative model that can produce high quality and realistic images. However, for medical image translation, the existing diffusion models are deficient in accurately retaining structural information since the structure details of source domain images are lost during the forward diffusion process and cannot be fully recovered through learned…
▽ More
Recently, the diffusion model has emerged as a superior generative model that can produce high quality and realistic images. However, for medical image translation, the existing diffusion models are deficient in accurately retaining structural information since the structure details of source domain images are lost during the forward diffusion process and cannot be fully recovered through learned reverse diffusion, while the integrity of anatomical structures is extremely important in medical images. For instance, errors in image translation may distort, shift, or even remove structures and tumors, leading to incorrect diagnosis and inadequate treatments. Training and conditioning diffusion models using paired source and target images with matching anatomy can help. However, such paired data are very difficult and costly to obtain, and may also reduce the robustness of the developed model to out-of-distribution testing data. We propose a frequency-guided diffusion model (FGDM) that employs frequency-domain filters to guide the diffusion model for structure-preserving image translation. Based on its design, FGDM allows zero-shot learning, as it can be trained solely on the data from the target domain, and used directly for source-to-target domain translation without any exposure to the source-domain data during training. We evaluated it on three cone-beam CT (CBCT)-to-CT translation tasks for different anatomical sites, and a cross-institutional MR imaging translation task. FGDM outperformed the state-of-the-art methods (GAN-based, VAE-based, and diffusion-based) in metrics of Frechet Inception Distance (FID), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM), showing its significant advantages in zero-shot medical image translation.
△ Less
Submitted 27 October, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
-
Digital staining in optical microscopy using deep learning -- a review
Authors:
Lucas Kreiss,
Shaowei Jiang,
Xiang Li,
Shiqi Xu,
Kevin C. Zhou,
Alexander Mühlberg,
Kyung Chul Lee,
Kanghyun Kim,
Amey Chaware,
Michael Ando,
Laura Barisoni,
Seung Ah Lee,
Guoan Zheng,
Kyle Lafata,
Oliver Friedrich,
Roarke Horstmeyer
Abstract:
Until recently, conventional biochemical staining had the undisputed status as well-established benchmark for most biomedical problems related to clinical diagnostics, fundamental research and biotechnology. Despite this role as gold-standard, staining protocols face several challenges, such as a need for extensive, manual processing of samples, substantial time delays, altered tissue homeostasis,…
▽ More
Until recently, conventional biochemical staining had the undisputed status as well-established benchmark for most biomedical problems related to clinical diagnostics, fundamental research and biotechnology. Despite this role as gold-standard, staining protocols face several challenges, such as a need for extensive, manual processing of samples, substantial time delays, altered tissue homeostasis, limited choice of contrast agents for a given sample, 2D imaging instead of 3D tomography and many more. Label-free optical technologies, on the other hand, do not rely on exogenous and artificial markers, by exploiting intrinsic optical contrast mechanisms, where the specificity is typically less obvious to the human observer. Over the past few years, digital staining has emerged as a promising concept to use modern deep learning for the translation from optical contrast to established biochemical contrast of actual stainings. In this review article, we provide an in-depth analysis of the current state-of-the-art in this field, suggest methods of good practice, identify pitfalls and challenges and postulate promising advances towards potential future implementations and applications.
△ Less
Submitted 14 March, 2023;
originally announced March 2023.
-
Online Streaming Video Super-Resolution with Convolutional Look-Up Table
Authors:
Guanghao Yin,
Zefan Qu,
Xinyang Jiang,
Shan Jiang,
Zhenhua Han,
Ningxin Zheng,
Xiaohong Liu,
Huan Yang,
Yuqing Yang,
Dongsheng Li,
Lili Qiu
Abstract:
Online video streaming has fundamental limitations on the transmission bandwidth and computational capacity and super-resolution is a promising potential solution. However, applying existing video super-resolution methods to online streaming is non-trivial. Existing video codecs and streaming protocols (\eg, WebRTC) dynamically change the video quality both spatially and temporally, which leads to…
▽ More
Online video streaming has fundamental limitations on the transmission bandwidth and computational capacity and super-resolution is a promising potential solution. However, applying existing video super-resolution methods to online streaming is non-trivial. Existing video codecs and streaming protocols (\eg, WebRTC) dynamically change the video quality both spatially and temporally, which leads to diverse and dynamic degradations. Furthermore, online streaming has a strict requirement for latency that most existing methods are less applicable. As a result, this paper focuses on the rarely exploited problem setting of online streaming video super resolution. To facilitate the research on this problem, a new benchmark dataset named LDV-WebRTC is constructed based on a real-world online streaming system. Leveraging the new benchmark dataset, we proposed a novel method specifically for online video streaming, which contains a convolution and Look-Up Table (LUT) hybrid model to achieve better performance-latency trade-off. To tackle the changing degradations, we propose a mixture-of-expert-LUT module, where a set of LUT specialized in different degradations are built and adaptively combined to handle different degradations. Experiments show our method achieves 720P video SR around 100 FPS, while significantly outperforms existing LUT-based methods and offers competitive performance compared to efficient CNN-based methods.
△ Less
Submitted 25 July, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Deep Learning (DL)-based Automatic Segmentation of the Internal Pudendal Artery (IPA) for Reduction of Erectile Dysfunction in Definitive Radiotherapy of Localized Prostate Cancer
Authors:
Anjali Balagopal,
Michael Dohopolski,
Young Suk Kwon,
Steven Montalvo,
Howard Morgan,
Ti Bai,
Dan Nguyen,
Xiao Liang,
Xinran Zhong,
Mu-Han Lin,
Neil Desai,
Steve Jiang
Abstract:
Background and purpose: Radiation-induced erectile dysfunction (RiED) is commonly seen in prostate cancer patients. Clinical trials have been developed in multiple institutions to investigate whether dose-sparing to the internal-pudendal-arteries (IPA) will improve retention of sexual potency. The IPA is usually not considered a conventional organ-at-risk (OAR) due to segmentation difficulty. In t…
▽ More
Background and purpose: Radiation-induced erectile dysfunction (RiED) is commonly seen in prostate cancer patients. Clinical trials have been developed in multiple institutions to investigate whether dose-sparing to the internal-pudendal-arteries (IPA) will improve retention of sexual potency. The IPA is usually not considered a conventional organ-at-risk (OAR) due to segmentation difficulty. In this work, we propose a deep learning (DL)-based auto-segmentation model for the IPA that utilizes CT and MRI or CT alone as the input image modality to accommodate variation in clinical practice. Materials and methods: 86 patients with CT and MRI images and noisy IPA labels were recruited in this study. We split the data into 42/14/30 for model training, testing, and a clinical observer study, respectively. There were three major innovations in this model: 1) we designed an architecture with squeeze-and-excite blocks and modality attention for effective feature extraction and production of accurate segmentation, 2) a novel loss function was used for training the model effectively with noisy labels, and 3) modality dropout strategy was used for making the model capable of segmentation in the absence of MRI. Results: The DSC, ASD, and HD95 values for the test dataset were 62.2%, 2.54mm, and 7mm, respectively. AI segmented contours were dosimetrically equivalent to the expert physician's contours. The observer study showed that expert physicians' scored AI contours (mean=3.7) higher than inexperienced physicians' contours (mean=3.1). When inexperienced physicians started with AI contours, the score improved to 3.7. Conclusion: The proposed model achieved good quality IPA contours to improve uniformity of segmentation and to facilitate introduction of standardized IPA segmentation into clinical trials and practice.
△ Less
Submitted 2 February, 2023;
originally announced February 2023.
-
Real-Time Digital Twins: Vision and Research Directions for 6G and Beyond
Authors:
Ahmed Alkhateeb,
Shuaifeng Jiang,
Gouranga Charan
Abstract:
This article presents a vision where \textit{real-time} digital twins of the physical wireless environments are continuously updated using multi-modal sensing data from the distributed infrastructure and user devices, and are used to make communication and sensing decisions. This vision is mainly enabled by the advances in precise 3D maps, multi-modal sensing, ray-tracing computations, and machine…
▽ More
This article presents a vision where \textit{real-time} digital twins of the physical wireless environments are continuously updated using multi-modal sensing data from the distributed infrastructure and user devices, and are used to make communication and sensing decisions. This vision is mainly enabled by the advances in precise 3D maps, multi-modal sensing, ray-tracing computations, and machine/deep learning. This article details this vision, explains the different approaches for constructing and utilizing these real-time digital twins, discusses the applications and open problems, and presents a research platform that can be used to investigate various digital twin research directions.
△ Less
Submitted 26 January, 2023;
originally announced January 2023.
-
Digital Twin Based Beam Prediction: Can we Train in the Digital World and Deploy in Reality?
Authors:
Shuaifeng Jiang,
Ahmed Alkhateeb
Abstract:
Realizing the potential gains of large-scale MIMO systems requires the accurate estimation of their channels or the fine adjustment of their narrow beams. This, however, is typically associated with high channel acquisition/beam swee** overhead that scales with the number of antennas. Machine and deep learning represent promising approaches to overcome these challenges thanks to their powerful a…
▽ More
Realizing the potential gains of large-scale MIMO systems requires the accurate estimation of their channels or the fine adjustment of their narrow beams. This, however, is typically associated with high channel acquisition/beam swee** overhead that scales with the number of antennas. Machine and deep learning represent promising approaches to overcome these challenges thanks to their powerful ability to learn from prior observations and side information. Training machine and deep learning models, however, requires large-scale datasets that are expensive to collect in deployed systems. To address this challenge, we propose a novel direction that utilizes digital replicas of the physical world to reduce or even eliminate the MIMO channel acquisition overhead. In the proposed digital twin aided communication, 3D models that approximate the real-world communication environment are constructed and accurate ray-tracing is utilized to simulate the site-specific channels. These channels can then be used to aid various communication tasks. Further, we propose to use machine learning to approximate the digital replicas and reduce the ray tracing computational cost. To evaluate the proposed digital twin based approach, we conduct a case study focusing on the position-aided beam prediction task. The results show that a learning model trained solely with the data generated by the digital replica can achieve relatively good performance on the real-world data. Moreover, a small number of real-world data points can quickly achieve near-optimal performance, overcoming the modeling mismatches between the physical and digital worlds and significantly reducing the data acquisition overhead.
△ Less
Submitted 18 January, 2023;
originally announced January 2023.
-
Toward Robust Diagnosis: A Contour Attention Preserving Adversarial Defense for COVID-19 Detection
Authors:
Kun Xiang,
Xing Zhang,
**wen She,
**peng Liu,
Haohan Wang,
Shiqi Deng,
Shancheng Jiang
Abstract:
As the COVID-19 pandemic puts pressure on healthcare systems worldwide, the computed tomography image based AI diagnostic system has become a sustainable solution for early diagnosis. However, the model-wise vulnerability under adversarial perturbation hinders its deployment in practical situation. The existing adversarial training strategies are difficult to generalized into medical imaging field…
▽ More
As the COVID-19 pandemic puts pressure on healthcare systems worldwide, the computed tomography image based AI diagnostic system has become a sustainable solution for early diagnosis. However, the model-wise vulnerability under adversarial perturbation hinders its deployment in practical situation. The existing adversarial training strategies are difficult to generalized into medical imaging field challenged by complex medical texture features. To overcome this challenge, we propose a Contour Attention Preserving (CAP) method based on lung cavity edge extraction. The contour prior features are injected to attention layer via a parameter regularization and we optimize the robust empirical risk with hybrid distance metric. We then introduce a new cross-nation CT scan dataset to evaluate the generalization capability of the adversarial robustness under distribution shift. Experimental results indicate that the proposed method achieves state-of-the-art performance in multiple adversarial defense and generalization tasks. The code and dataset are available at https://github.com/Quinn777/CAP.
△ Less
Submitted 30 November, 2022;
originally announced November 2022.
-
Sensing Aided Reconfigurable Intelligent Surfaces for 3GPP 5G Transparent Operation
Authors:
Shuaifeng Jiang,
Ahmed Hindy,
Ahmed Alkhateeb
Abstract:
Can reconfigurable intelligent surfaces (RISs) operate in a standalone mode that is completely transparent to the 3GPP 5G initial access process? Realizing that may greatly simplify the deployment and operation of these surfaces and reduce the infrastructure control overhead. This paper investigates the feasibility of building standalone/transparent RIS systems and shows that one key challenge lie…
▽ More
Can reconfigurable intelligent surfaces (RISs) operate in a standalone mode that is completely transparent to the 3GPP 5G initial access process? Realizing that may greatly simplify the deployment and operation of these surfaces and reduce the infrastructure control overhead. This paper investigates the feasibility of building standalone/transparent RIS systems and shows that one key challenge lies in determining the user equipment (UE)-side RIS beam reflection direction. To address this challenge, we propose to equip the RISs with multi-modal sensing capabilities (e.g., using wireless and visual sensors) that enable them to develop some perception of the surrounding environment and the mobile users. Based on that, we develop a machine learning framework that leverages the wireless and visual sensors at the RIS to select the optimal beams between the base station (BS) and users and enable 5G standalone/transparent RIS operation. Using a high-fidelity synthetic dataset with co-existing wireless and visual data, we extensively evaluate the performance of the proposed framework. Experimental results demonstrate that the proposed approach can accurately predict the BS and UE-side candidate beams, and that the standalone RIS beam selection solution is capable of realizing near-optimal achievable rates with significantly reduced beam training overhead.
△ Less
Submitted 24 November, 2022;
originally announced November 2022.
-
Camera Aided Reconfigurable Intelligent Surfaces: Computer Vision Based Fast Beam Selection
Authors:
Shuaifeng Jiang,
Ahmed Hindy,
Ahmed Alkhateeb
Abstract:
Reconfigurable intelligent surfaces (RISs) have attracted increasing interest due to their ability to improve the coverage, reliability, and energy efficiency of millimeter wave (mmWave) communication systems. However, designing the RIS beamforming typically requires large channel estimation or beam training overhead, which degrades the efficiency of these systems. In this paper, we propose to equ…
▽ More
Reconfigurable intelligent surfaces (RISs) have attracted increasing interest due to their ability to improve the coverage, reliability, and energy efficiency of millimeter wave (mmWave) communication systems. However, designing the RIS beamforming typically requires large channel estimation or beam training overhead, which degrades the efficiency of these systems. In this paper, we propose to equip the RIS surfaces with visual sensors (cameras) that obtain sensing information about the surroundings and user/basestation locations, guide the RIS beam selection, and reduce the beam training overhead. We develop a machine learning (ML) framework that leverages this visual sensing information to efficiently select the optimal RIS reflection beams that reflect the signals between the basestation and mobile users. To evaluate the developed approach, we build a high-fidelity synthetic dataset that comprises co-existing wireless and visual data. Based on this dataset, the results show that the proposed vision-aided machine learning solution can accurately predict the RIS beams and achieve near-optimal achievable rate while significantly reducing the beam training overhead.
△ Less
Submitted 14 November, 2022;
originally announced November 2022.
-
Performance Deterioration of Deep Learning Models after Clinical Deployment: A Case Study with Auto-segmentation for Definitive Prostate Cancer Radiotherapy
Authors:
Biling Wang,
Michael Dohopolski,
Ti Bai,
Junjie Wu,
Raquibul Hannan,
Neil Desai,
Aurelie Garant,
Daniel Yang,
Dan Nguyen,
Mu-Han Lin,
Robert Timmerman,
Xinlei Wang,
Steve Jiang
Abstract:
We evaluated the temporal performance of a deep learning (DL) based artificial intelligence (AI) model for auto segmentation in prostate radiotherapy, seeking to correlate its efficacy with changes in clinical landscapes. Our study involved 1328 prostate cancer patients who underwent definitive radiotherapy from January 2006 to August 2022 at the University of Texas Southwestern Medical Center. We…
▽ More
We evaluated the temporal performance of a deep learning (DL) based artificial intelligence (AI) model for auto segmentation in prostate radiotherapy, seeking to correlate its efficacy with changes in clinical landscapes. Our study involved 1328 prostate cancer patients who underwent definitive radiotherapy from January 2006 to August 2022 at the University of Texas Southwestern Medical Center. We trained a UNet based segmentation model on data from 2006 to 2011 and tested it on data from 2012 to 2022 to simulate real world clinical deployment. We measured the model performance using the Dice similarity coefficient (DSC), visualized the trends in contour quality using exponentially weighted moving average (EMA) curves. Additionally, we performed Wilcoxon Rank Sum Test to analyze the differences in DSC distributions across distinct periods, and multiple linear regression to investigate the impact of various clinical factors. The model exhibited peak performance in the initial phase (from 2012 to 2014) for segmenting the prostate, rectum, and bladder. However, we observed a notable decline in performance for the prostate and rectum after 2015, while bladder contour quality remained stable. Key factors that impacted the prostate contour quality included physician contouring styles, the use of various hydrogel spacer, CT scan slice thickness, MRI-guided contouring, and using intravenous (IV) contrast. Rectum contour quality was influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The bladder contour quality was primarily affected by using IV contrast. This study highlights the challenges in maintaining AI model performance consistency in a dynamic clinical setting. It underscores the need for continuous monitoring and updating of AI models to ensure their ongoing effectiveness and relevance in patient care.
△ Less
Submitted 16 November, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Sensing Aided OTFS Channel Estimation for Massive MIMO Systems
Authors:
Shuaifeng Jiang,
Ahmed Alkhateeb
Abstract:
Orthogonal time frequency space (OTFS) modulation has the potential to enable robust communications in highly-mobile scenarios. Estimating the channels for OTFS systems, however, is associated with high pilot signaling overhead that scales with the maximum delay and Doppler spreads. This becomes particularly challenging for massive MIMO systems where the overhead also scales with the number of ant…
▽ More
Orthogonal time frequency space (OTFS) modulation has the potential to enable robust communications in highly-mobile scenarios. Estimating the channels for OTFS systems, however, is associated with high pilot signaling overhead that scales with the maximum delay and Doppler spreads. This becomes particularly challenging for massive MIMO systems where the overhead also scales with the number of antennas. An important observation however is that the delay, Doppler, and angle of departure/arrival information are directly related to the distance, velocity, and direction information of the mobile user and the various scatterers in the environment. With this motivation, we propose to leverage radar sensing to obtain this information about the mobile users and scatterers in the environment and leverage it to aid the OTFS channel estimation in massive MIMO systems.
As one approach to realize our vision, this paper formulates the OTFS channel estimation problem in massive MIMO systems as a sparse recovery problem and utilizes the radar sensing information to determine the support (locations of the non-zero delay-Doppler taps). The proposed radar sensing aided sparse recovery algorithm is evaluated based on an accurate 3D ray-tracing framework with co-existing radar and communication data. The results show that the developed sensing-aided solution consistently outperforms the standard sparse recovery algorithms (that do not leverage radar sensing data) and leads to a significant reduction in the pilot overhead, which highlights a promising direction for OTFS based massive MIMO systems.
△ Less
Submitted 22 September, 2022;
originally announced September 2022.
-
Low-complexity Joint Phase Adjustment and Receive Beamforming for Directional Modulation Networks via IRS
Authors:
Rongen Dong,
Shaohua Jiang,
Xinhai Hua,
Yin Teng,
Feng Shu,
Jiangzhou Wang
Abstract:
Intelligent reflecting surface (IRS) is a revolutionary and low-cost technology for boosting the spectrum and energy efficiencies in future wireless communication network. In order to create controllable multipath transmission in the conventional line-of-sight (LOS) wireless communication environment, an IRS-aided directional modulation (DM) network is considered. In this paper, to improve the tra…
▽ More
Intelligent reflecting surface (IRS) is a revolutionary and low-cost technology for boosting the spectrum and energy efficiencies in future wireless communication network. In order to create controllable multipath transmission in the conventional line-of-sight (LOS) wireless communication environment, an IRS-aided directional modulation (DM) network is considered. In this paper, to improve the transmission security of the system and maximize the receive power sum (Max-RPS), two alternately optimizing schemes of jointly designing receive beamforming (RBF) vectors and IRS phase shift matrix (PSM) are proposed: Max-RPS using general alternating optimization (Max-RPS-GAO) algorithm and Max-RPS using zero-forcing (Max-RPS-ZF) algorithm. Simulation results show that, compared with the no-IRS-assisted scheme and the no-PSM optimization scheme, the proposed IRS-assisted Max-RPS-GAO method and Max-RPS-ZF method can significantly improve the secrecy rate (SR) performance of the DM system. Moreover, compared with the Max-RPS-GAO method, the proposed Max-RPS-ZF method has a faster convergence speed and a certain lower computational complexity.
△ Less
Submitted 11 July, 2022;
originally announced July 2022.
-
Turbo: Opportunistic Enhancement for Edge Video Analytics
Authors:
Yan Lu,
Shiqi Jiang,
Ting Cao,
Yuanchao Shu
Abstract:
Edge computing is being widely used for video analytics. To alleviate the inherent tension between accuracy and cost, various video analytics pipelines have been proposed to optimize the usage of GPU on edge nodes. Nonetheless, we find that GPU compute resources provisioned for edge nodes are commonly under-utilized due to video content variations, subsampling and filtering at different places of…
▽ More
Edge computing is being widely used for video analytics. To alleviate the inherent tension between accuracy and cost, various video analytics pipelines have been proposed to optimize the usage of GPU on edge nodes. Nonetheless, we find that GPU compute resources provisioned for edge nodes are commonly under-utilized due to video content variations, subsampling and filtering at different places of a pipeline. As opposed to model and pipeline optimization, in this work, we study the problem of opportunistic data enhancement using the non-deterministic and fragmented idle GPU resources. In specific, we propose a task-specific discrimination and enhancement module and a model-aware adversarial training mechanism, providing a way to identify and transform low-quality images that are specific to a video pipeline in an accurate and efficient manner. A multi-exit model structure and a resource-aware scheduler is further developed to make online enhancement decisions and fine-grained inference execution under latency and GPU resource constraints. Experiments across multiple video analytics pipelines and datasets reveal that by judiciously allocating a small amount of idle resources on frames that tend to yield greater marginal benefits from enhancement, our system boosts DNN object detection accuracy by $7.3-11.3\%$ without incurring any latency costs.
△ Less
Submitted 29 June, 2022;
originally announced July 2022.
-
Multicast Scheduling over Multiple Channels: A Distribution-Embedding Deep Reinforcement Learning Method
Authors:
Ran Li,
Chuan Huang,
Xiaoqi Qin,
Shengpei Jiang
Abstract:
Multicasting is an efficient technique for simultaneously transmitting common messages from the base station (BS) to multiple mobile users (MUs). Multicast scheduling over multiple channels, which aims to jointly minimize the energy consumption of the BS and the latency of serving asynchronized requests from the MUs, is formulated as an infinite-horizon Markov decision process (MDP) problem with a…
▽ More
Multicasting is an efficient technique for simultaneously transmitting common messages from the base station (BS) to multiple mobile users (MUs). Multicast scheduling over multiple channels, which aims to jointly minimize the energy consumption of the BS and the latency of serving asynchronized requests from the MUs, is formulated as an infinite-horizon Markov decision process (MDP) problem with a large discrete action space, multiple time-varying constraints, and multiple time-invariant constraints. To address these challenges, this paper proposes a novel distribution-embedding multi-agent proximal policy optimization (DE-MAPPO) algorithm, which consists of one modified MAPPO and one distribution-embedding module: The former one handles the large discrete action space and time-varying constraints by modifying the structure of the actor networks and the training kernel of the conventional MAPPO; and the latter one iteratively adjusts the action distribution to satisfy the time-invariant constraints. Moreover, a performance upper bound of the considered MDP is derived by solving a two-step optimization problem. Finally, numerical results demonstrate that our proposed algorithm outperforms the existing ones and achieves comparable performance to the derived benchmark.
△ Less
Submitted 21 August, 2023; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Coexistence between Task- and Data-Oriented Communications: A Whittle's Index Guided Multi-Agent Reinforcement Learning Approach
Authors:
Ran Li,
Chuan Huang,
Xiaoqi Qin,
Shengpei Jiang,
Nan Ma,
Shuguang Cui
Abstract:
We investigate the coexistence of task-oriented and data-oriented communications in a IoT system that shares a group of channels, and study the scheduling problem to jointly optimize the weighted age of incorrect information (AoII) and throughput, which are the performance metrics of the two types of communications, respectively. This problem is formulated as a Markov decision problem, which is di…
▽ More
We investigate the coexistence of task-oriented and data-oriented communications in a IoT system that shares a group of channels, and study the scheduling problem to jointly optimize the weighted age of incorrect information (AoII) and throughput, which are the performance metrics of the two types of communications, respectively. This problem is formulated as a Markov decision problem, which is difficult to solve due to the large discrete action space and the time-varying action constraints induced by the stochastic availability of channels. By exploiting the intrinsic properties of this problem and reformulating the reward function based on channel statistics, we first simplify the solution space, state space, and optimality criteria, and convert it to an equivalent Markov game, for which the large discrete action space issue is greatly relieved. Then, we propose a Whittle's index guided multi-agent proximal policy optimization (WI-MAPPO) algorithm to solve the considered game, where the embedded Whittle's index module further shrinks the action space, and the proposed offline training algorithm extends the training kernel of conventional MAPPO to address the issue of time-varying constraints. Finally, numerical results validate that the proposed algorithm significantly outperforms state-of-the-art age of information (AoI) based algorithms under scenarios with insufficient channel resources.
△ Less
Submitted 19 May, 2022;
originally announced May 2022.
-
Lensless coherent diffraction imaging based on spatial light modulator with unknown modulation curve
Authors:
Hao Sha,
Chao He,
Shaowei Jiang,
Pengming Song,
Shuai Liu,
Wenzhen Zou,
Peiwu Qin,
Haoqian Wang,
Yongbing Zhang
Abstract:
Lensless imaging is a popular research field for the advantages of small size, wide field-of-view and low aberration in recent years. However, some traditional lensless imaging methods suffer from slow convergence, mechanical errors and conjugate solution interference, which limit its further application and development. In this work, we proposed a lensless imaging method based on spatial light mo…
▽ More
Lensless imaging is a popular research field for the advantages of small size, wide field-of-view and low aberration in recent years. However, some traditional lensless imaging methods suffer from slow convergence, mechanical errors and conjugate solution interference, which limit its further application and development. In this work, we proposed a lensless imaging method based on spatial light modulator (SLM) with unknown modulation curve. In our imaging system, we use SLM to modulate the wavefront of object, and introduce the ptychographic scanning algorithm that is able to recover the complex amplitude information even the SLM modulation curve is inaccurate or unknown. In addition, we also design a split-beam interference experiment to calibrate the modulation curve of SLM, and using the calibrated modulation function as the initial value of the expended ptychography iterative engine (ePIE) algorithm can improve the convergence speed. We further analyze the effect of modulation function, algorithm parameters and the characteristics of the coherent light source on the quality of reconstructed image. The simulated and real experiments show that the proposed method is superior to traditional mechanical scanning methods in terms of recovering speed and accuracy, with the recovering resolution up to 14 um.
△ Less
Submitted 8 April, 2022;
originally announced April 2022.
-
Hybrid Pixel-Unshuffled Network for Lightweight Image Super-Resolution
Authors:
Bin Sun,
Yulun Zhang,
Songyao Jiang,
Yun Fu
Abstract:
Convolutional neural network (CNN) has achieved great success on image super-resolution (SR). However, most deep CNN-based SR models take massive computations to obtain high performance. Downsampling features for multi-resolution fusion is an efficient and effective way to improve the performance of visual recognition. Still, it is counter-intuitive in the SR task, which needs to project a low-res…
▽ More
Convolutional neural network (CNN) has achieved great success on image super-resolution (SR). However, most deep CNN-based SR models take massive computations to obtain high performance. Downsampling features for multi-resolution fusion is an efficient and effective way to improve the performance of visual recognition. Still, it is counter-intuitive in the SR task, which needs to project a low-resolution input to high-resolution. In this paper, we propose a novel Hybrid Pixel-Unshuffled Network (HPUN) by introducing an efficient and effective downsampling module into the SR task. The network contains pixel-unshuffled downsampling and Self-Residual Depthwise Separable Convolutions. Specifically, we utilize pixel-unshuffle operation to downsample the input features and use grouped convolution to reduce the channels. Besides, we enhance the depthwise convolution's performance by adding the input feature to its output. Experiments on benchmark datasets show that our HPUN achieves and surpasses the state-of-the-art reconstruction performance with fewer parameters and computation costs.
△ Less
Submitted 29 November, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
LiDAR Aided Future Beam Prediction in Real-World Millimeter Wave V2I Communications
Authors:
Shuaifeng Jiang,
Gouranga Charan,
Ahmed Alkhateeb
Abstract:
This paper presents the first large-scale real-world evaluation for using LiDAR data to guide the mmWave beam prediction task. A machine learning (ML) model that leverages the LiDAR sensory data to predict the current and future beams was developed. Based on the large-scale real-world dataset, DeepSense 6G, this model was evaluated in a vehicle-to-infrastructure communication scenario with highly-…
▽ More
This paper presents the first large-scale real-world evaluation for using LiDAR data to guide the mmWave beam prediction task. A machine learning (ML) model that leverages the LiDAR sensory data to predict the current and future beams was developed. Based on the large-scale real-world dataset, DeepSense 6G, this model was evaluated in a vehicle-to-infrastructure communication scenario with highly-mobile vehicles. The experimental results show that the developed LiDAR-aided beam prediction and tracking model can predict the optimal beam in $95\%$ of the cases and with more than $90\%$ reduction in the beam training overhead. The LiDAR-aided beam tracking achieves comparable accuracy performance to a baseline solution that has perfect knowledge of the previous optimal beams, without requiring any knowledge about the previous optimal beam information and without any need for beam calibration. This highlights a promising solution for the critical beam alignment challenges in mmWave and terahertz communication systems.
△ Less
Submitted 10 March, 2022;
originally announced March 2022.
-
Region Specific Optimization (RSO)-based Deep Interactive Registration
Authors:
Ti Bai,
Muhan Lin,
Xiao Liang,
Biling Wang,
Michael Dohopolski,
Bin Cai,
Dan Nguyen,
Steve Jiang
Abstract:
Medical image registration is a fundamental and vital task which will affect the efficacy of many downstream clinical tasks. Deep learning (DL)-based deformable image registration (DIR) methods have been investigated, showing state-of-the-art performance. A test time optimization (TTO) technique was proposed to further improve the DL models' performance. Despite the substantial accuracy improvemen…
▽ More
Medical image registration is a fundamental and vital task which will affect the efficacy of many downstream clinical tasks. Deep learning (DL)-based deformable image registration (DIR) methods have been investigated, showing state-of-the-art performance. A test time optimization (TTO) technique was proposed to further improve the DL models' performance. Despite the substantial accuracy improvement with this TTO technique, there still remained some regions that exhibited large registration errors even after many TTO iterations. To mitigate this challenge, we firstly identified the reason why the TTO technique was slow, or even failed, to improve those regions' registration results. We then proposed a two-levels TTO technique, i.e., image-specific optimization (ISO) and region-specific optimization (RSO), where the region can be interactively indicated by the clinician during the registration result reviewing process. For both efficiency and accuracy, we further envisioned a three-step DL-based image registration workflow. Experimental results showed that our proposed method outperformed the conventional method qualitatively and quantitatively.
△ Less
Submitted 7 March, 2022;
originally announced March 2022.
-
Optimal Hyperparameters and Structure Setting of Multi-Objective Robust CNN Systems via Generalized Taguchi Method and Objective Vector Norm
Authors:
Sheng-Guo Wang,
Shanshan Jiang
Abstract:
Recently, Machine Learning (ML), Artificial Intelligence (AI), and Convolutional Neural Network (CNN) have made huge progress with broad applications, where their systems have deep learning structures and a large number of hyperparameters that determine the quality and performance of the CNNs and AI systems. These systems may have multi-objective ML and AI performance needs. There is a key require…
▽ More
Recently, Machine Learning (ML), Artificial Intelligence (AI), and Convolutional Neural Network (CNN) have made huge progress with broad applications, where their systems have deep learning structures and a large number of hyperparameters that determine the quality and performance of the CNNs and AI systems. These systems may have multi-objective ML and AI performance needs. There is a key requirement to find the optimal hyperparameters and structures for multi-objective robust optimal CNN systems. This paper proposes a generalized Taguchi approach to effectively determine the optimal hyperparameters and structure for the multi-objective robust optimal CNN systems via their objective performance vector norm. The proposed approach and methods are applied to a CNN classification system with the original ResNet for CIFAR-10 dataset as a demonstration and validation, which shows the proposed methods are highly effective to achieve an optimal accuracy rate of the original ResNet on CIFAR-10.
△ Less
Submitted 10 February, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer
Authors:
Pengfei Guo,
Yiqun Mei,
**yuan Zhou,
Shanshan Jiang,
Vishal M. Patel
Abstract:
Accelerating magnetic resonance image (MRI) reconstruction process is a challenging ill-posed inverse problem due to the excessive under-sampling operation in k-space. In this paper, we propose a recurrent transformer model, namely ReconFormer, for MRI reconstruction which can iteratively reconstruct high fertility magnetic resonance images from highly under-sampled k-space data. In particular, th…
▽ More
Accelerating magnetic resonance image (MRI) reconstruction process is a challenging ill-posed inverse problem due to the excessive under-sampling operation in k-space. In this paper, we propose a recurrent transformer model, namely ReconFormer, for MRI reconstruction which can iteratively reconstruct high fertility magnetic resonance images from highly under-sampled k-space data. In particular, the proposed architecture is built upon Recurrent Pyramid Transformer Layers (RPTL), which jointly exploits intrinsic multi-scale information at every architecture unit as well as the dependencies of the deep feature correlation through recurrent states. Moreover, the proposed ReconFormer is lightweight since it employs the recurrent structure for its parameter efficiency. We validate the effectiveness of ReconFormer on multiple datasets with different magnetic resonance sequences and show that it achieves significant improvements over the state-of-the-art methods with better parameter efficiency. Implementation code will be available in https://github.com/guopengf/ReconFormer.
△ Less
Submitted 27 January, 2022; v1 submitted 23 January, 2022;
originally announced January 2022.
-
Ptychographic sensor for large-scale lensless microbial monitoring with high spatiotemporal resolution
Authors:
Shaowei Jiang,
Chengfei Guo,
Zichao Bian,
Ruihai Wang,
Jiakai Zhu,
Pengming Song,
Patrick Hu,
Derek Hu,
Zibang Zhang,
Kazunori Hoshino,
Bin Feng,
Guoan Zheng
Abstract:
Traditional microbial detection methods often rely on the overall property of microbial cultures and cannot resolve individual growth event at high spatiotemporal resolution. As a result, they require bacteria to grow to confluence and then interpret the results. Here, we demonstrate the application of an integrated ptychographic sensor for lensless cytometric analysis of microbial cultures over a…
▽ More
Traditional microbial detection methods often rely on the overall property of microbial cultures and cannot resolve individual growth event at high spatiotemporal resolution. As a result, they require bacteria to grow to confluence and then interpret the results. Here, we demonstrate the application of an integrated ptychographic sensor for lensless cytometric analysis of microbial cultures over a large scale and with high spatiotemporal resolution. The reported device can be placed within a regular incubator or used as a standalone incubating unit for long-term microbial monitoring. For longitudinal study where massive data are acquired at sequential time points, we report a new temporal-similarity constraint to increase the temporal resolution of ptychographic reconstruction by 7-fold. With this strategy, the reported device achieves a centimeter-scale field of view, a half-pitch spatial resolution of 488 nm, and a temporal resolution of 15-second intervals. For the first time, we report the direct observation of bacterial growth in a 15-second interval by tracking the phase wraps of the recovered images, with high phase sensitivity like that in interferometric measurements. We also characterize cell growth via longitudinal dry mass measurement and perform rapid bacterial detection at low concentrations. For drug-screening application, we demonstrate proof-of-concept antibiotic susceptibility testing and perform single-cell analysis of antibiotic-induced filamentation. The combination of high phase sensitivity, high spatiotemporal resolution, and large field of view is unique among existing microscopy techniques. As a quantitative and miniaturized platform, it can improve studies with microorganisms and other biospecimens at resource-limited settings.
△ Less
Submitted 15 December, 2021;
originally announced December 2021.
-
A Survey: Deep Learning for Hyperspectral Image Classification with Few Labeled Samples
Authors:
Sen Jia,
Shuguo Jiang,
Zhijie Lin,
Nanying Li,
Meng Xu,
Shiqi Yu
Abstract:
With the rapid development of deep learning technology and improvement in computing capability, deep learning has been widely used in the field of hyperspectral image (HSI) classification. In general, deep learning models often contain many trainable parameters and require a massive number of labeled samples to achieve optimal performance. However, in regard to HSI classification, a large number o…
▽ More
With the rapid development of deep learning technology and improvement in computing capability, deep learning has been widely used in the field of hyperspectral image (HSI) classification. In general, deep learning models often contain many trainable parameters and require a massive number of labeled samples to achieve optimal performance. However, in regard to HSI classification, a large number of labeled samples is generally difficult to acquire due to the difficulty and time-consuming nature of manual labeling. Therefore, many research works focus on building a deep learning model for HSI classification with few labeled samples. In this article, we concentrate on this topic and provide a systematic review of the relevant literature. Specifically, the contributions of this paper are twofold. First, the research progress of related methods is categorized according to the learning paradigm, including transfer learning, active learning and few-shot learning. Second, a number of experiments with various state-of-the-art approaches has been carried out, and the results are summarized to reveal the potential research directions. More importantly, it is notable that although there is a vast gap between deep learning models (that usually need sufficient labeled samples) and the HSI scenario with few labeled samples, the issues of small-sample sets can be well characterized by fusion of deep learning methods and related techniques, such as transfer learning and a lightweight model. For reproducibility, the source codes of the methods assessed in the paper can be found at https://github.com/ShuGuoJ/HSI-Classification.git.
△ Less
Submitted 3 December, 2021;
originally announced December 2021.
-
Computer Vision Aided Beam Tracking in A Real-World Millimeter Wave Deployment
Authors:
Shuaifeng Jiang,
Ahmed Alkhateeb
Abstract:
Millimeter-wave (mmWave) and terahertz (THz) communications require beamforming to acquire adequate receive signal-to-noise ratio (SNR). To find the optimal beam, current beam management solutions perform beam training over a large number of beams in pre-defined codebooks. The beam training overhead increases the access latency and can become infeasible for high-mobility applications. To reduce or…
▽ More
Millimeter-wave (mmWave) and terahertz (THz) communications require beamforming to acquire adequate receive signal-to-noise ratio (SNR). To find the optimal beam, current beam management solutions perform beam training over a large number of beams in pre-defined codebooks. The beam training overhead increases the access latency and can become infeasible for high-mobility applications. To reduce or even eliminate this beam training overhead, we propose to utilize the visual data, captured for example by cameras at the base stations, to guide the beam tracking/refining process. We propose a machine learning (ML) framework, based on an encoder-decoder architecture, that can predict the future beams using the previously obtained visual sensing information. Our proposed approach is evaluated on a large-scale real-world dataset, where it achieves an accuracy of $64.47\%$ (and a normalized receive power of $97.66\%$) in predicting the future beam. This is achieved while requiring less than $1\%$ of the beam training overhead of a corresponding baseline solution that uses a sequence of previous beams to predict the future one. This high performance and low overhead obtained on the real-world dataset demonstrate the potential of the proposed vision-aided beam tracking approach in real-world applications.
△ Less
Submitted 29 November, 2021;
originally announced November 2021.
-
MHAttnSurv: Multi-Head Attention for Survival Prediction Using Whole-Slide Pathology Images
Authors:
Shuai Jiang,
Arief A. Suriawinata,
Saeed Hassanpour
Abstract:
In pathology, whole-slide images (WSI) based survival prediction has attracted increasing interest. However, given the large size of WSIs and the lack of pathologist annotations, extracting the prognostic information from WSIs remains a challenging task. Previous studies have used multiple instance learning approaches to combine the information from multiple randomly sampled patches, but different…
▽ More
In pathology, whole-slide images (WSI) based survival prediction has attracted increasing interest. However, given the large size of WSIs and the lack of pathologist annotations, extracting the prognostic information from WSIs remains a challenging task. Previous studies have used multiple instance learning approaches to combine the information from multiple randomly sampled patches, but different visual patterns may contribute differently to prognosis prediction. In this study, we developed a multi-head attention approach to focus on various parts of a tumor slide, for more comprehensive information extraction from WSIs. We evaluated our approach on four cancer types from The Cancer Genome Atlas database. Our model achieved an average c-index of 0.640, outperforming two existing state-of-the-art approaches for WSI-based survival prediction, which have an average c-index of 0.603 and 0.619 on these datasets. Visualization of our attention maps reveals each attention head focuses synergistically on different morphological patterns.
△ Less
Submitted 21 October, 2021;
originally announced October 2021.
-
High-throughput lensless whole slide imaging via continuous height-varying modulation of tilted sensor
Authors:
Shaowei Jiang,
Chengfei Guo,
Patrick Hu,
Derek Hu,
Pengming Song,
Tianbo Wang,
Zichao Bian,
Zibang Zhang,
Guoan Zheng
Abstract:
We report a new lensless microscopy configuration by integrating the concepts of transverse translational ptychography and defocus multi-height phase retrieval. In this approach, we place a tilted image sensor under the specimen for linearly-increasing phase modulation along one lateral direction. Similar to the operation of ptychography, we laterally translate the specimen and acquire the diffrac…
▽ More
We report a new lensless microscopy configuration by integrating the concepts of transverse translational ptychography and defocus multi-height phase retrieval. In this approach, we place a tilted image sensor under the specimen for linearly-increasing phase modulation along one lateral direction. Similar to the operation of ptychography, we laterally translate the specimen and acquire the diffraction images for reconstruction. Since the axial distance between the specimen and the sensor varies at different lateral positions, laterally translating the specimen effectively introduces defocus multi-height measurements while eliminating axial scanning. Lateral translation further introduces sub-pixel shift for pixel super-resolution imaging and naturally expands the field of view for rapid whole slide imaging. We show that the equivalent height variation can be precisely estimated from the lateral shift of the specimen, thereby addressing the challenge of precise axial positioning in conventional multi-height phase retrieval. Using a sensor with a 1.67-micron pixel size, our low-cost and field-portable prototype can resolve 690-nm linewidth on the resolution target. We show that a whole slide image of a blood smear with a 120-mm^2 field of view can be acquired in 18 seconds. We also demonstrate accurate automatic white blood cell counting from the recovered image. The reported approach may provide a turnkey solution for addressing point-of-care- and telemedicine-related challenges.
△ Less
Submitted 28 September, 2021;
originally announced October 2021.
-
Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech
Authors:
Bi-Cheng Yan,
Shao-Wei Fan Jiang,
Fu-An Chao,
Berlin Chen
Abstract:
End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data. However, there is a discrepancy between the objectives of model training and the MDD evaluation,…
▽ More
End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data. However, there is a discrepancy between the objectives of model training and the MDD evaluation, since the performance of an MDD model is commonly evaluated in terms of F1-score instead of phone or word error rate (PER/WER). In view of this, we in this paper explore the use of a discriminative objective function for training E2E MDD models, which aims to maximize the expected F1-score directly. A series of experiments conducted on the L2-ARCTIC dataset show that our proposed method can yield considerable performance improvements in relation to some state-of-the-art E2E MDD approaches and the celebrated GOP method.
△ Less
Submitted 9 July, 2022; v1 submitted 31 August, 2021;
originally announced August 2021.
-
Towards Robust Mispronunciation Detection and Diagnosis for L2 English Learners with Accent-Modulating Methods
Authors:
Shao-Wei Fan Jiang,
Bi-Cheng Yan,
Tien-Hong Lo,
Fu-An Chao,
Berlin Chen
Abstract:
With the acceleration of globalization, more and more people are willing or required to learn second languages (L2). One of the major remaining challenges facing current mispronunciation and diagnosis (MDD) models for use in computer-assisted pronunciation training (CAPT) is to handle speech from L2 learners with a diverse set of accents. In this paper, we set out to mitigate the adverse effects o…
▽ More
With the acceleration of globalization, more and more people are willing or required to learn second languages (L2). One of the major remaining challenges facing current mispronunciation and diagnosis (MDD) models for use in computer-assisted pronunciation training (CAPT) is to handle speech from L2 learners with a diverse set of accents. In this paper, we set out to mitigate the adverse effects of accent variety in building an L2 English MDD system with end-to-end (E2E) neural models. To this end, we first propose an effective modeling framework that infuses accent features into an E2E MDD model, thereby making the model more accent-aware. Going a step further, we design and present disparate accent-aware modules to perform accent-aware modulation of acoustic features in a finer-grained manner, so as to enhance the discriminating capability of the resulting MDD model. Extensive sets of experiments conducted on the L2-ARCTIC benchmark dataset show the merits of our MDD model, in comparison to some existing E2E-based strong baselines and the celebrated pronunciation scoring based method.
△ Less
Submitted 3 October, 2021; v1 submitted 26 August, 2021;
originally announced August 2021.
-
Dynamic Modelling of Combined Cycle Power Plant for Load Frequency Control With Large Penetration of Renewable Energy
Authors:
Songyao Jiang,
Takeyoshi Kato
Abstract:
As the concern about climate change and energy shortage grow stronger, the incorporation of renewable energy in the power system in the future is foreseeable. In a hybrid power system with a large penetration of PV generation, PV panel is regarded as a negative load in the power system. With the accurate prediction of PV output power, load frequency control could be done by controlling the thermal…
▽ More
As the concern about climate change and energy shortage grow stronger, the incorporation of renewable energy in the power system in the future is foreseeable. In a hybrid power system with a large penetration of PV generation, PV panel is regarded as a negative load in the power system. With the accurate prediction of PV output power, load frequency control could be done by controlling the thermal and hydro power plant in the system. Combined Cycle Power Plant is widely used because of its great advantages of fast response and high efficiency. This article is focusing on the mathematical modelling and analyzing of Combined Cycle Power Plant for the frequency control purpose in a model of hybrid system with large renewable energy generation.
△ Less
Submitted 8 August, 2021;
originally announced August 2021.
-
A Proof-of-Concept Study of Artificial Intelligence Assisted Contour Revision
Authors:
Ti Bai,
Anjali Balagopal,
Michael Dohopolski,
Howard E. Morgan,
Rafe McBeth,
Jun Tan,
Mu-Han Lin,
David J. Sher,
Dan Nguyen,
Steve Jiang
Abstract:
Automatic segmentation of anatomical structures is critical for many medical applications. However, the results are not always clinically acceptable and require tedious manual revision. Here, we present a novel concept called artificial intelligence assisted contour revision (AIACR) and demonstrate its feasibility. The proposed clinical workflow of AIACR is as follows given an initial contour that…
▽ More
Automatic segmentation of anatomical structures is critical for many medical applications. However, the results are not always clinically acceptable and require tedious manual revision. Here, we present a novel concept called artificial intelligence assisted contour revision (AIACR) and demonstrate its feasibility. The proposed clinical workflow of AIACR is as follows given an initial contour that requires a clinicians revision, the clinician indicates where a large revision is needed, and a trained deep learning (DL) model takes this input to update the contour. This process repeats until a clinically acceptable contour is achieved. The DL model is designed to minimize the clinicians input at each iteration and to minimize the number of iterations needed to reach acceptance. In this proof-of-concept study, we demonstrated the concept on 2D axial images of three head-and-neck cancer datasets, with the clinicians input at each iteration being one mouse click on the desired location of the contour segment. The performance of the model is quantified with Dice Similarity Coefficient (DSC) and 95th percentile of Hausdorff Distance (HD95). The average DSC/HD95 (mm) of the auto-generated initial contours were 0.82/4.3, 0.73/5.6 and 0.67/11.4 for three datasets, which were improved to 0.91/2.1, 0.86/2.4 and 0.86/4.7 with three mouse clicks, respectively. Each DL-based contour update requires around 20 ms. We proposed a novel AIACR concept that uses DL models to assist clinicians in revising contours in an efficient and effective way, and we demonstrated its feasibility by using 2D axial CT images from three head-and-neck cancer datasets.
△ Less
Submitted 28 July, 2021;
originally announced July 2021.
-
TENET: A Time-reversal Enhancement Network for Noise-robust ASR
Authors:
Fu-An Chao,
Shao-Wei Fan Jiang,
Bi-Cheng Yan,
Jeih-weih Hung,
Berlin Chen
Abstract:
Due to the unprecedented breakthroughs brought about by deep learning, speech enhancement (SE) techniques have been developed rapidly and play an important role prior to acoustic modeling to mitigate noise effects on speech. To increase the perceptual quality of speech, current state-of-the-art in the SE field adopts adversarial training by connecting an objective metric to the discriminator. Howe…
▽ More
Due to the unprecedented breakthroughs brought about by deep learning, speech enhancement (SE) techniques have been developed rapidly and play an important role prior to acoustic modeling to mitigate noise effects on speech. To increase the perceptual quality of speech, current state-of-the-art in the SE field adopts adversarial training by connecting an objective metric to the discriminator. However, there is no guarantee that optimizing the perceptual quality of speech will necessarily lead to improved automatic speech recognition (ASR) performance. In this study, we present TENET, a novel Time-reversal Enhancement NETwork, which leverages the transformation of an input noisy signal itself, i.e., the time-reversed version, in conjunction with the siamese network and complex dual-path transformer to promote SE performance for noise-robust ASR. Extensive experiments conducted on the Voicebank-DEMAND dataset show that TENET can achieve state-of-the-art results compared to a few top-of-the-line methods in terms of both SE and ASR evaluation metrics. To demonstrate the model generalization ability, we further evaluate TENET on the test set of scenarios contaminated with unseen noise, and the results also confirm the superiority of this promising method.
△ Less
Submitted 14 September, 2021; v1 submitted 3 July, 2021;
originally announced July 2021.
-
Over-and-Under Complete Convolutional RNN for MRI Reconstruction
Authors:
Pengfei Guo,
Jeya Maria Jose Valanarasu,
Puyang Wang,
**yuan Zhou,
Shanshan Jiang,
Vishal M. Patel
Abstract:
Reconstructing magnetic resonance (MR) images from undersampled data is a challenging problem due to various artifacts introduced by the under-sampling operation. Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture which captures low-level features at the initial layers and high-level features at the deeper layers. Such networks focus…
▽ More
Reconstructing magnetic resonance (MR) images from undersampled data is a challenging problem due to various artifacts introduced by the under-sampling operation. Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture which captures low-level features at the initial layers and high-level features at the deeper layers. Such networks focus much on global features which may not be optimal to reconstruct the fully-sampled image. In this paper, we propose an Over-and-Under Complete Convolutional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network(CRNN). The overcomplete branch gives special attention in learning local structures by restraining the receptive field of the network. Combining it with the undercomplete branch leads to a network which focuses more on low-level features without losing out on the global structures. Extensive experiments on two datasets demonstrate that the proposed method achieves significant improvements over the compressed sensing and popular deep learning-based methods with less number of trainable parameters.
△ Less
Submitted 24 June, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.