-
Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator
Authors:
Xin Li,
Wenyang Gan,
Pang Wen,
Daqi Zhu
Abstract:
To deal with the task assignment problem of multi-AUV systems under kinematic constraints, that is, steering capability constraints for underactuated AUVs or similar vehicles, an improved task assignment algorithm is proposed that combines the Dubins path algorithm with an improved SOM neural network algorithm. First, the target tasks are assigned to the AUVs by the improved SOM neural network method based on workload balance and a neighborhood function. When kinematic constraints or obstacles cause trajectory planning to fail, tasks are re-assigned by changing the weights of the SOM neurons until the AUVs have paths that reach all the targets. Then, Dubins paths are generated for several limited cases. Because the AUV's yaw angle is limited, new assignments of the targets result. A computation flow is designed so that the algorithm, implemented in MATLAB and Python, can realize path planning to multiple targets. Finally, simulation results show that the proposed algorithm can effectively accomplish task assignment for the multi-AUV system.
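The Dubins path step can be illustrated with the length of a single Dubins word. The sketch below assumes the standard normalized-coordinate formulation and computes only the Left-Straight-Left (LSL) word; a complete generator evaluates all six words (LSL, RSR, LSR, RSL, RLR, LRL) and keeps the shortest feasible one. The function names are illustrative and are not the authors' MATLAB/Python code.

```python
import math


def mod2pi(angle: float) -> float:
    """Wrap an angle into [0, 2*pi)."""
    return angle % (2.0 * math.pi)


def dubins_lsl_length(start, goal, radius):
    """Length of the Left-Straight-Left (LSL) Dubins word between two poses.

    start, goal: (x, y, heading in radians); radius: minimum turning radius.
    Returns None when the LSL word is infeasible for this pair of poses.
    """
    dx, dy = goal[0] - start[0], goal[1] - start[1]
    d = math.hypot(dx, dy) / radius      # chord length in units of the turning radius
    theta = math.atan2(dy, dx)           # direction of the chord
    a = mod2pi(start[2] - theta)         # start heading relative to the chord
    b = mod2pi(goal[2] - theta)          # goal heading relative to the chord

    p_sq = 2 + d * d - 2 * math.cos(a - b) + 2 * d * (math.sin(a) - math.sin(b))
    if p_sq < 0:
        return None
    tmp = math.atan2(math.cos(b) - math.cos(a), d + math.sin(a) - math.sin(b))
    t = mod2pi(tmp - a)                  # first left arc (radians)
    p = math.sqrt(p_sq)                  # straight segment (normalized length)
    q = mod2pi(b - tmp)                  # final left arc (radians)
    return (t + p + q) * radius


# Hypothetical example: heading east at the origin, must arrive at (5, 5) heading north.
print(round(dubins_lsl_length((0.0, 0.0, 0.0), (5.0, 5.0, math.pi / 2), 1.0), 3))  # → 7.228
```

The example path is a quarter-pi left arc, a straight run of sqrt(32), and another quarter-pi left arc; a limited yaw rate (larger radius) lengthens the arcs, which is what can change which AUV is best placed to reach a target.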
Submitted 24 June, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
YNetr: Dual-Encoder architecture on Plain Scan Liver Tumors (PSLT)
Authors:
Wen Sheng,
Zhong Zheng,
Jiajun Liu,
Han Lu,
Hanyuan Zhang,
Zhengyong Jiang,
Zhihong Zhang,
Dao** Zhu
Abstract:
Background: Liver tumors are abnormal growths in the liver that can be either benign or malignant, and liver cancer is a significant health concern worldwide. However, there is no dataset for plain scan segmentation of liver tumors, nor any related algorithms. To fill this gap, we propose Plain Scan Liver Tumors (PSLT) and YNetr. Methods: A collection of 40 liver tumor plain scan segmentation datasets was assembled and annotated. We used the Dice coefficient as the metric for assessing the segmentation outcomes produced by YNetr, which has the advantage of capturing information at different frequencies. Results: The YNetr model achieved a Dice coefficient of 62.63% on the PSLT dataset, surpassing the other publicly available models by a margin of 1.22%. Comparative evaluations were conducted against a range of models including UNet 3+, XNet, UNetr, Swin UNetr, Trans-BTS, COTr, nnUNetv2 (2D), nnUNetv2 (3D fullres), MedNext (2D) and MedNext (3D fullres). Conclusions: We not only proposed a dataset named PSLT (Plain Scan Liver Tumors), but also explored a structure called YNetr that uses the wavelet transform to extract information at different frequencies, and which achieved state-of-the-art performance on PSLT in our experiments.
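The Dice coefficient used as the evaluation metric measures volumetric overlap between a predicted and a reference mask; a score of 62.63% corresponds to 0.6263 on the 0 to 1 scale. A minimal sketch on flattened binary masks follows; the small smoothing term eps is an assumption (a common convention for empty masks), not something the abstract specifies.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks of 0/1 values.

    eps is a smoothing term (an assumption here) so that two empty masks score 1.0.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


# Toy masks: one overlapping voxel, two foreground voxels in each mask.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # ≈ 0.5
```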
Submitted 30 March, 2024;
originally announced April 2024.
-
Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA
Authors:
Shaojie Tang,
Penpen Miao,
Xingyu Gao,
Yu Zhong,
Dantong Zhu,
Haixing Wen,
Zhihui Xu,
Qiuyue Wei,
Hong** Yao,
Xin Huang,
Rui Gao,
Chen Zhao,
Weihua Zhou
Abstract:
A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented using different U-Net neural networks, trained to generate the point clouds of the LV epicardial contours (LVECs). Secondly, according to the characteristics of cardiac anatomy, the special points of the anterior and posterior interventricular grooves (APIGs) were manually marked in both SPECT and CTA image volumes. Thirdly, we developed an in-house program for coarsely registering the special points of APIGs to ensure correct cardiac orientation alignment between SPECT and CTA images. Fourthly, we employed the ICP, SICP, or CPD algorithm to achieve fine registration of the point clouds (together with the special points of APIGs) of the LV epicardial surfaces (LVERs) in SPECT and CTA images. Finally, image fusion between SPECT and CTA was realized after the fine registration. The experimental results showed that the cardiac orientation was well aligned and that the mean distance error of the optimal registration method (CPD with affine transform) was consistently less than 3 mm. The proposed method could effectively fuse the structures from cardiac CTA with the SPECT functional images, and demonstrated potential for assisting in the accurate diagnosis of cardiac diseases by combining the complementary advantages of the two imaging modalities.
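The fine-registration step with ICP alternates two operations: match each source point to its nearest destination point, then solve for the rigid transform that best aligns the matched pairs. The sketch below is a simplified 2D stand-in, assuming brute-force nearest neighbours and made-up contour points; the paper works on 3D LV epicardial point clouds (where the rigid step is usually solved with an SVD, the Kabsch method) and also evaluates SICP and CPD, which are not shown here.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired points src -> dst."""
    n = len(src)
    csx, csy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cdx, cdy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - csx, py - csy          # centered source point
        bx, by = qx - cdx, qy - cdy          # centered destination point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    angle = math.atan2(s_cross, s_dot)       # closed-form optimal rotation in 2D
    c, s = math.cos(angle), math.sin(angle)
    t = (cdx - (c * csx - s * csy), cdy - (s * csx + c * csy))
    return angle, t


def transform_points(points, angle, t):
    """Rotate points by angle about the origin, then translate by t."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


def icp_2d(src, dst, iterations=20):
    """Point-to-point ICP: alternate nearest-neighbour matching and rigid alignment."""
    current = list(src)
    for _ in range(iterations):
        matched = [min(dst, key=lambda q: math.dist(p, q)) for p in current]
        angle, t = best_rigid_2d(current, matched)
        current = transform_points(current, angle, t)
    return current


# Hypothetical contour points and a slightly rotated and shifted copy of them.
src = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0), (2.0, 7.0)]
dst = transform_points(src, 0.05, (0.1, 0.05))
aligned = icp_2d(src, dst)
print(max(math.dist(a, b) for a, b in zip(aligned, dst)))
```

ICP only converges to the right answer when the initial pose is close enough for nearest-neighbour matches to be mostly correct, which is the role the coarse APIG-based alignment plays before the fine step.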
Submitted 9 February, 2024;
originally announced February 2024.
-
Current Effect-eliminated Optimal Target Assignment and Motion Planning for a Multi-UUV System
Authors:
Danjie Zhu,
Simon X. Yang
Abstract:
The paper presents an innovative approach (CBNNTAP) that addresses the complexities and challenges introduced by ocean currents when optimizing target assignment and motion planning for a multi-unmanned underwater vehicle (UUV) system. The core of the proposed algorithm involves the integration of several key components. Firstly, it incorporates a bio-inspired neural network-based (BINN) approach which predicts the most efficient paths for individual UUVs while simultaneously ensuring collision avoidance among the vehicles. Secondly, an efficient target assignment component is integrated by considering the path distances determined by the BINN algorithm. In addition, a critical innovation within the CBNNTAP algorithm is its capacity to address the disruptive effects of ocean currents, where an adjustment component is seamlessly integrated to counteract the deviations caused by these currents, which enhances the accuracy of both motion planning and target assignment for the UUVs. The effectiveness of the CBNNTAP algorithm is demonstrated through comprehensive simulation results and the outcomes underscore the superiority of the developed algorithm in nullifying the effects of static and dynamic ocean currents in 2D and 3D scenarios.
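Once a path planner has produced a distance from every vehicle to every target, target assignment reduces to choosing the pairing with the smallest total distance. The sketch below is a generic stand-in, not the CBNNTAP assignment component: it assumes a square, hypothetical cost matrix and solves it by exhaustive search, which is only practical for small teams (the Hungarian algorithm solves the same problem in polynomial time).

```python
from itertools import permutations


def assign_targets(cost):
    """Exhaustive minimum-total-cost assignment.

    cost[i][j] is the path length from vehicle i to target j (e.g. produced by a
    path planner); assumes as many targets as vehicles. Returns (assignment, total)
    where assignment[i] is the target index given to vehicle i.
    """
    n = len(cost)
    best, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best, best_total = perm, total
    return list(best), best_total


# Hypothetical path lengths for 3 UUVs (rows) and 3 targets (columns).
cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
print(assign_targets(cost))  # → ([1, 0, 2], 5)
```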
Submitted 10 January, 2024;
originally announced January 2024.
-
Joint Trading and Scheduling among Coupled Carbon-Electricity-Heat-Gas Industrial Clusters
Authors:
Dafeng Zhu,
Bo Yang,
Yu Wu,
Haoran Deng,
Zhaoyang Dong,
Kai Ma,
** Guan
Abstract:
This paper presents a carbon-energy coupling management framework for an industrial park, in which a carbon flow model accompanying the multi-energy flows is adopted to track and suppress carbon emissions on the user side. To deal with the quadratic constraint of gas flows, a bound tightening algorithm for constraint relaxation is adopted. The synergies among carbon capture, energy storage, and power-to-gas further increase renewable energy consumption and reduce carbon emissions. Aiming at carbon emission disparities and supply-demand imbalances, this paper proposes a ladder-type carbon trading reward and punishment mechanism and an energy trading and scheduling method based on Lyapunov optimization and a matching game to maximize the long-term benefits of each industrial cluster without prior knowledge of the random variables. Case studies show that the proposed trading method can reduce overall costs and carbon emissions while relieving energy pressure, which is important for Environmental, Social and Governance (ESG) goals.
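Lyapunov optimization lets a scheduler act without prior knowledge of the random variables by making a greedy per-slot decision that balances an immediate cost against the growth of a virtual queue. The sketch below shows the generic drift-plus-penalty rule for a single queue; the queue's meaning (an accumulated energy deficit), the action tuples, and the numbers are hypothetical, and the paper's method additionally involves a matching game between clusters.

```python
def drift_plus_penalty_step(queue, actions, V):
    """One slot of drift-plus-penalty control for a single virtual queue.

    queue:   current backlog Q(t), e.g. an accumulated energy deficit (hypothetical).
    actions: list of (name, cost, arrival, service) options available this slot.
    V:       weight trading off cost (penalty) against queue stability (drift).
    Returns the chosen action name and the updated backlog Q(t+1).
    """
    name, _, arrival, service = min(
        actions, key=lambda a: V * a[1] + queue * (a[2] - a[3])
    )
    return name, max(queue + arrival - service, 0.0)


# Hypothetical options: "defer" is cheap but adds 2 units of deficit;
# "cover" is expensive but clears 2 units of deficit.
options = [("defer", 1.0, 2.0, 0.0), ("cover", 3.0, 0.0, 2.0)]
q = 0.0
for slot in range(4):
    choice, q = drift_plus_penalty_step(q, options, V=1.0)
    print(slot, choice, q)
```

A larger V weights cost more heavily and tolerates a longer backlog; a smaller V keeps the queue short at higher cost.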
Submitted 20 December, 2023;
originally announced December 2023.
-
Holistic Evaluation of GPT-4V for Biomedical Imaging
Authors:
Zhengliang Liu,
Hanqi Jiang,
Tianyang Zhong,
Zihao Wu,
Chong Ma,
Yiwei Li,
Xiaowei Yu,
Yutong Zhang,
Yi Pan,
Peng Shu,
Yanjun Lyu,
Lu Zhang,
Junjie Yao,
Peixin Dong,
Chao Cao,
Zhenxiang Xiao,
Jiaqi Wang,
Huan Zhao,
Shaochen Xu,
Yaonai Wei,
**gyuan Chen,
Haixing Dai,
Peilong Wang,
Hao He,
Zewei Wang
, et al. (25 additional authors not shown)
Abstract:
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.
Submitted 10 November, 2023;
originally announced December 2023.
-
Empirical Evaluation of the Segment Anything Model (SAM) for Brain Tumor Segmentation
Authors:
Mohammad Peivandi,
Jason Zhang,
Michael Lu,
Dongxiao Zhu,
Zhifeng Kou
Abstract:
Brain tumor segmentation presents a formidable challenge in the field of medical image segmentation. While deep-learning models have been useful, human expert segmentation remains the most accurate method. The recently released Segment Anything Model (SAM) has opened up the opportunity to apply foundation models to this difficult task. However, SAM was primarily trained on diverse natural images. This makes applying SAM to biomedical segmentation, such as brain tumors with less defined boundaries, challenging. In this paper, we enhanced SAM's mask decoder using transfer learning with the Decathlon brain tumor dataset. We developed three methods to encapsulate the four-dimensional data into three dimensions for SAM. An on-the-fly data augmentation approach combining rotations and elastic deformations was used to increase the size of the training dataset. Two key metrics, the Dice Similarity Coefficient (DSC) and the 95th-percentile Hausdorff Distance (HD95), were applied to assess the performance of our segmentation models. These metrics provided valuable insights into the quality of the segmentation results. In our evaluation, we compared this improved model to two benchmarks: the pretrained SAM and the widely used nnUNetv2. We find that the improved SAM shows considerable improvement over the pretrained SAM, while nnUNetv2 outperformed the improved SAM in terms of overall segmentation accuracy. Nevertheless, the improved SAM demonstrated slightly more consistent results than nnUNetv2, especially on challenging cases that can lead to larger Hausdorff distances. In the future, more advanced techniques can be applied to further improve the performance of SAM on brain tumor segmentation.
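HD95 replaces the maximum in the Hausdorff distance with the 95th percentile, so a few stray boundary points do not dominate the score. A minimal sketch on two point sets (standing in for boundary voxel coordinates), assuming brute-force nearest-point search and linear-interpolation percentiles; implementations differ, and some take the percentile of the pooled distances from both directions instead of the maximum of the two directed percentiles used here.

```python
import math


def _percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def _directed_distances(a, b):
    """Distance from every point in a to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def hd95(a, b):
    """95th-percentile Hausdorff distance between two non-empty point sets."""
    return max(_percentile(_directed_distances(a, b), 95),
               _percentile(_directed_distances(b, a), 95))


# Hypothetical boundaries: b has one outlier point at (5, 0).
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(hd95(a, b))
```

For these sets the plain Hausdorff distance is 4.0, while HD95 interpolates to about 3.6; with realistically sized boundaries the outlier's influence shrinks much further.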
Submitted 9 October, 2023;
originally announced October 2023.
-
Improving CTC-AED model with integrated-CTC and auxiliary loss regularization
Authors:
Daobin Zhu,
Xiangdong Su,
Hongbin Zhang
Abstract:
Joint training of connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models has been widely applied in automatic speech recognition (ASR). Unlike most hybrid models, which calculate the CTC and AED losses separately, our proposed integrated-CTC uses the attention mechanism of AED to guide the output of CTC. In this paper, we employ two fusion methods, namely direct addition of logits (DAL) and preserving the maximum probability (PMP). We achieve dimensional consistency by adaptively applying an affine transformation to the attention results to match the dimensions of CTC. To accelerate model convergence and improve accuracy, we introduce auxiliary loss regularization. Experimental results demonstrate that the DAL method performs better in attention rescoring, while the PMP method excels in CTC prefix beam search and greedy search.
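The abstract names the two fusion methods but does not give their formulas. Under one plausible reading, which is an assumption and not the authors' definition, the attention output for a frame is first affinely projected to the CTC vocabulary size; DAL then adds the two logit vectors before the softmax, while PMP keeps, per token, the larger of the two probabilities and renormalizes. All weights and logits below are made up.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def affine(vec, weight, bias):
    """Map a d_in vector to d_out with weight (d_out x d_in) and bias (d_out)."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def fuse_dal(ctc_logits, att_logits):
    """Hypothetical DAL: add the two logit vectors, then normalize with softmax."""
    return softmax([c + a for c, a in zip(ctc_logits, att_logits)])


def fuse_pmp(ctc_logits, att_logits):
    """Hypothetical PMP: keep the larger of the two probabilities per token, renormalize."""
    kept = [max(c, a) for c, a in zip(softmax(ctc_logits), softmax(att_logits))]
    total = sum(kept)
    return [k / total for k in kept]


# Toy frame: a 2-d attention output projected to a 3-token CTC vocabulary.
att = affine([0.4, -1.0], weight=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], bias=[0.0, 0.1, 0.0])
ctc = [2.0, 0.5, -1.0]
print(fuse_dal(ctc, att))
print(fuse_pmp(ctc, att))
```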
Submitted 14 August, 2023;
originally announced August 2023.
-
Can We Transfer Noise Patterns? A Multi-environment Spectrum Analysis Model Using Generated Cases
Authors:
Haiwen Du,
Zheng Ju,
Yu An,
Honghui Du,
Dongjie Zhu,
Zhaoshuo Tian,
Aonghus Lawlor,
Ruihai Dong
Abstract:
Spectrum analysis systems in online water quality testing are designed to detect types and concentrations of pollutants and enable regulatory agencies to respond promptly to pollution incidents. However, spectral data-based testing devices suffer from complex noise patterns when deployed in non-laboratory environments. To make the analysis model applicable to more environments, we propose a noise pattern transferring model, which takes the spectra of standard water samples in different environments as cases and learns the differences in their noise patterns, thus enabling noise patterns to be transferred to unknown samples. Unfortunately, the inevitable sample-level baseline noise prevents the model from obtaining paired data that differ only in dataset-level environmental noise. To address this problem, we generate a sample-to-sample case base to exclude the interference of sample-level noise with dataset-level noise learning, enhancing the system's learning performance. Experiments on spectral data with different background noises demonstrate the good noise-transferring ability of the proposed method against baseline systems including wavelet denoising, deep neural networks, and generative models. From this research, we posit that our method can enhance the performance of DL models by generating high-quality cases. The source code is publicly available at https://github.com/Magnomic/CNST.
Submitted 14 August, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
MUVF-YOLOX: A Multi-modal Ultrasound Video Fusion Network for Renal Tumor Diagnosis
Authors:
Junyu Li,
Han Huang,
Dong Ni,
Wufeng Xue,
Dongmei Zhu,
Jun Cheng
Abstract:
Early diagnosis of renal cancer can greatly improve the survival rate of patients. Contrast-enhanced ultrasound (CEUS) is a cost-effective and non-invasive imaging technique that is increasingly used for renal tumor diagnosis. However, the classification of benign and malignant renal tumors can still be very challenging due to the highly heterogeneous appearance of cancer and imaging artifacts. Our aim is to detect and classify renal tumors by integrating B-mode and CEUS-mode ultrasound videos. To this end, we propose a novel multi-modal ultrasound video fusion network that can effectively perform multi-modal feature fusion and video classification for renal tumor diagnosis. The attention-based multi-modal fusion module uses cross-attention and self-attention to extract modality-invariant features and modality-specific features in parallel. In addition, we design an object-level temporal aggregation (OTA) module that can automatically filter low-quality features and efficiently integrate temporal information from multiple frames to improve the accuracy of tumor diagnosis. Experimental results on a multicenter dataset show that the proposed framework outperforms the single-modal models and the competing methods. Furthermore, our OTA module achieves higher classification accuracy than the frame-level predictions. Our code is available at https://github.com/JeunyuLi/MUAF.
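The mechanics of running cross-attention and self-attention in parallel can be sketched with plain scaled dot-product attention. This sketch omits the learned query/key/value projections a real fusion module uses, and the token features are made up; it only shows that B-mode tokens can attend to CEUS tokens (cross) and to themselves (self), with the two results concatenated per token.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def fuse_modalities(bmode, ceus):
    """Cross-attention (B-mode queries attend to CEUS) alongside self-attention
    (B-mode attends to itself); outputs are concatenated per token."""
    cross = attention(bmode, ceus, ceus)
    self_att = attention(bmode, bmode, bmode)
    return [c + s for c, s in zip(cross, self_att)]   # list + list = concatenation


# Made-up 3-d feature tokens for two B-mode frames and four CEUS frames.
bmode = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.5]]
ceus = [[0.1, 0.0, 0.2], [0.4, 0.4, 0.1], [0.0, 0.2, 0.3], [0.3, 0.1, 0.0]]
fused = fuse_modalities(bmode, ceus)
print(len(fused), len(fused[0]))   # → 2 6
```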
Submitted 15 July, 2023;
originally announced July 2023.
-
Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data
Authors:
Hongmin Cai,
Xiaoke Huang,
Zhengliang Liu,
Wenxiong Liao,
Haixing Dai,
Zihao Wu,
Dajiang Zhu,
Hui Ren,
Quanzheng Li,
Tianming Liu,
Xiang Li
Abstract:
Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's ability to understand and express language, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcript data from the DementiaBank Pitt database. The proposed approach involves pre-trained language models and a Graph Neural Network (GNN) that constructs a graph from the speech transcript and extracts features from it for AD detection. Data augmentation techniques, such as synonym replacement and a GPT-based augmenter, were used to address the small dataset size. Audio data was also introduced, and the WavLM model was used to extract audio features. These features were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and using it for contrastive learning with the original audio. We conducted extensive experiments and analysis on the above methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.
Submitted 5 July, 2023;
originally announced July 2023.
-
Segment Anything Model (SAM) for Radiation Oncology
Authors:
Lian Zhang,
Zhengliang Liu,
Lu Zhang,
Zihao Wu,
Xiaowei Yu,
Jason Holmes,
Hongying Feng,
Haixing Dai,
Xiang Li,
Quanzheng Li,
Dajiang Zhu,
Tianming Liu,
Wei Liu
Abstract:
In this study, we evaluate the performance of the Segment Anything Model (SAM) in clinical radiotherapy. Our results indicate that SAM's 'segment anything' mode can achieve clinically acceptable segmentation results in most organs-at-risk (OARs) with Dice scores higher than 0.7. SAM's 'box prompt' mode further improves the Dice scores by 0.1 to 0.5. Considering the size of the organ and the clarity of its boundary, SAM displays better performance for large organs with clear boundaries but performs worse for smaller organs with unclear boundaries. Given that SAM, a model pre-trained purely on natural images, can handle the delineation of OARs from medical images with clinically acceptable accuracy, these results highlight SAM's robust generalization capabilities with consistent accuracy in automatic segmentation for radiotherapy. In other words, SAM can achieve delineation of different OARs at different sites using a generic automatic segmentation model. SAM's generalization capabilities across different disease sites suggest that it is technically feasible to develop a generic model for automatic segmentation in radiotherapy.
Submitted 4 July, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
A GOA-Based Fault-Tolerant Trajectory Tracking Control for an Underwater Vehicle of Multi-Thruster System without Actuator Saturation
Authors:
Danjie Zhu,
Lei Wang,
Hua Zhang,
Simon X. Yang
Abstract:
This paper proposes an intelligent fault-tolerant control (FTC) strategy to tackle the trajectory tracking problem of an underwater vehicle (UV) under thruster damage (power loss), while also resolving the actuator saturation caused by the vehicle's physical constraints. In the proposed control strategy, the trajectory tracking component is formed by a refined backstepping algorithm that controls the velocity variation and a sliding mode controller that derives the torque/force outputs; the fault-tolerant component is built on the Grasshopper Optimization Algorithm (GOA), which provides fast convergence as well as satisfactory accuracy in deriving an optimized reallocation of the thruster forces to compensate for the power loss in different fault cases. Simulations with and without environmental perturbations under different fault cases, together with comparisons to other traditional FTCs, are presented, verifying the effectiveness and robustness of the proposed GOA-based fault-tolerant trajectory tracking design.
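GOA-based thrust reallocation can be sketched as a box-constrained search for thruster commands u whose effective output, after faulty thrusters lose part of their power, reproduces the desired generalized force tau. The sketch below assumes a hypothetical 4-thruster, 3-DOF configuration matrix B, a hypothetical fault-efficiency vector k, and a simplified GOA update (Euclidean distances in the social force, no distance remapping); none of these values or simplifications come from the paper.

```python
import math
import random


def s_func(r, f=0.5, l=1.5):
    """GOA social force: attraction/repulsion as a function of distance r."""
    return f * math.exp(-r / l) - math.exp(-r)


def goa_minimize(cost, lb, ub, n_agents=20, iters=100, seed=0, c_max=1.0, c_min=1e-4):
    """Simplified Grasshopper Optimization Algorithm over the box [lb, ub]."""
    rng = random.Random(seed)
    dim = len(lb)
    pos = [[rng.uniform(lb[d], ub[d]) for d in range(dim)] for _ in range(n_agents)]
    best = list(min(pos, key=cost))
    best_cost = cost(best)
    for it in range(iters):
        c = c_max - it * (c_max - c_min) / iters   # shrink the comfort zone over time
        new_pos = []
        for i in range(n_agents):
            social = [0.0] * dim
            for j in range(n_agents):
                if j == i:
                    continue
                dist = math.dist(pos[i], pos[j])
                force = s_func(dist)
                for d in range(dim):
                    unit = (pos[j][d] - pos[i][d]) / (dist + 1e-12)
                    social[d] += c * (ub[d] - lb[d]) / 2.0 * force * unit
            # Move relative to the best solution found so far, clamped to the bounds.
            new_pos.append([min(max(c * social[d] + best[d], lb[d]), ub[d]) for d in range(dim)])
        pos = new_pos
        for p in pos:
            p_cost = cost(p)
            if p_cost < best_cost:
                best, best_cost = list(p), p_cost
    return best, best_cost


B = [[1.0, 1.0, 0.0, 0.0],      # surge contribution of each thruster (hypothetical)
     [0.0, 0.0, 1.0, 1.0],      # sway
     [0.5, -0.5, 0.3, -0.3]]    # yaw moment arms
k = [1.0, 0.5, 1.0, 1.0]        # thruster 2 has lost half its power (hypothetical fault)
tau = [0.8, 0.2, 0.1]           # desired generalized force (hypothetical)


def allocation_cost(u):
    """Squared tracking error of the faulty thrusters' output plus a small effort penalty."""
    produced = [sum(B[r][t] * k[t] * u[t] for t in range(4)) for r in range(3)]
    err = sum((p - d) ** 2 for p, d in zip(produced, tau))
    return err + 1e-3 * sum(x * x for x in u)


u, best_cost = goa_minimize(allocation_cost, [-1.0] * 4, [1.0] * 4)
print([round(x, 3) for x in u], round(best_cost, 5))
```

The bounds [-1, 1] play the role of normalized thrust limits, so every candidate the search evaluates already respects saturation.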
Submitted 4 January, 2023;
originally announced January 2023.
-
End-to-end Recording Device Identification Based on Deep Representation Learning
Authors:
Chunyan Zeng,
Dongliang Zhu,
Zhifeng Wang,
Minghu Wu,
Wei Xiong,
Nan Zhao
Abstract:
Deep learning techniques have made progress in recording device source identification. The recording device source features include spatial information and certain temporal information. However, most deep learning-based recording device source identification methods use only spatial representation learning from the recording device source features, which cannot make full use of the recording device source information. Therefore, in this paper, to fully explore the spatial and temporal information of the recording device source, we propose a new method for recording device source identification based on the fusion of spatial and temporal feature information in an end-to-end framework. From a feature perspective, we design two kinds of networks to extract the spatial and temporal information of the recording device source. Afterward, we use an attention mechanism to adaptively assign weights to the spatial and temporal information to obtain fusion features. From a model perspective, our model uses an end-to-end framework to learn deep representations from the spatial and temporal features, and is trained with deep and shallow losses to jointly optimize the network. This method is compared with our previous work and a baseline system. The results show that the proposed method outperforms our previous work and the baseline system under general conditions.
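Adaptive weighting of spatial and temporal information can be pictured as scoring each feature vector, turning the two scores into softmax weights, and taking the weighted sum. A minimal sketch, assuming equal-length feature vectors and hypothetical scoring vectors w_s and w_t that a real network would learn; the paper's exact attention design is not given in the abstract.

```python
import math


def attention_fuse(spatial, temporal, w_s, w_t):
    """Fuse two equal-length feature vectors with softmax attention weights.

    w_s, w_t are scoring vectors (learned in a real model; hypothetical here).
    Returns the fused vector and the (spatial, temporal) weights.
    """
    score_s = sum(a * b for a, b in zip(w_s, spatial))
    score_t = sum(a * b for a, b in zip(w_t, temporal))
    m = max(score_s, score_t)                  # subtract the max for numerical stability
    e_s, e_t = math.exp(score_s - m), math.exp(score_t - m)
    alpha_s, alpha_t = e_s / (e_s + e_t), e_t / (e_s + e_t)
    fused = [alpha_s * s + alpha_t * t for s, t in zip(spatial, temporal)]
    return fused, (alpha_s, alpha_t)


# Made-up features where the spatial branch scores higher.
fused, weights = attention_fuse([0.9, 0.1, 0.4], [0.2, 0.8, 0.3], [1.0, 0.0, 1.0], [0.5, 0.5, 0.0])
print(fused, weights)
```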
Submitted 5 December, 2022;
originally announced December 2022.
-
Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Maurizio Denna,
Abdel Younes,
Ganzorig Gankhuyag,
**gang Huh,
Myeong Kyun Kim,
Kihwan Yoon,
Hyeon-Cheol Moon,
Seungho Lee,
Yoonsik Choe,
**woo Jeong,
Sungjei Kim,
Maciej Smyl,
Tomasz Latkowski,
Pawel Kubik,
Michal Sokolski,
Yujie Ma,
Jiahao Chao,
Zhou Zhou,
Hongfan Gao,
Zhengfeng Yang,
Zhenbing Zeng,
Zhengyang Zhuge,
Chenghua Li
, et al. (71 additional authors not shown)
Abstract:
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can achieve real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, achieving up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
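INT8 models store tensors as 8-bit integer codes plus a scale and zero point. As a minimal sketch, assuming a common asymmetric per-tensor scheme (the abstract does not say which quantization scheme or toolchain the participants used), the round trip below shows how a float range is mapped onto [-128, 127] and why the reconstruction error is bounded by half the scale.

```python
def int8_affine_params(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point for asymmetric (affine) per-tensor INT8 quantization."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)   # the range must include 0.0
    scale = (x_max - x_min) / (qmax - qmin) or 1.0     # avoid a zero scale for all-zero tensors
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Float -> int8 code: round(x / scale) + zero_point, clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """int8 code -> approximate float value."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical activation range [-1, 2] and a few sample values.
scale, zp = int8_affine_params(-1.0, 2.0)
x = [-1.0, -0.3, 0.0, 0.7, 2.0]
q = quantize(x, scale, zp)
print(q, dequantize(q, scale, zp))
```

Because the zero point is an integer, 0.0 is represented exactly, which matters for zero padding in convolutional super-resolution networks.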
Submitted 7 November, 2022;
originally announced November 2022.
-
Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Yu-Syuan Xu,
Man-Yu Lee,
Allen Lu,
Chia-Ming Cheng,
Chih-Cheng Chen,
Jia-Ying Yong,
Hong-Han Shuai,
Wen-Huang Cheng,
Zhuang Jia,
Tianyu Xu,
Yijian Zhang,
Long Bao,
Heng Sun,
Diankai Zhang,
Si Gao,
Shaoli Liu,
Biao Wu,
Xiaofeng Zhang,
Chengjian Zheng,
Kaidi Lu,
Ning Wang
, et al. (29 additional authors not shown)
Abstract:
Video super-resolution is one of the most popular tasks on mobile devices, widely used for the automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, with low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, achieving up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
Submitted 7 November, 2022;
originally announced November 2022.
-
Massive MIMO Evolution Towards 3GPP Release 18
Authors:
Huang** **,
Kunpeng Liu,
Gilwon Lee,
Emad J. Farag,
Min Zhang,
Dalin Zhu,
Leiming Zhang,
Eko Onggosanusi,
Mansoor Shafi,
Harsh Tataria
Abstract:
Since the introduction of fifth-generation new radio (5G-NR) in Third Generation Partnership Project (3GPP) Release 15, swift progress has been made to evolve 5G, with 3GPP Release 18 now emerging. A critical aspect is the design of massive multiple-input multiple-output (MIMO) technology. In this line, this paper makes several important contributions. We provide a comprehensive overview of the evolution of standardized massive MIMO features from 3GPP Release 15 to 17 for both time- and frequency-division duplex operation across bands FR-1 and FR-2. We analyze the progress on channel state information (CSI) and beam management frameworks, and present enhancements for uplink CSI. We shed light on emerging 3GPP Release 18 problems requiring imminent attention. These include advanced codebook design and sounding reference signal design for coherent joint transmission (CJT) with multiple transmission/reception points (multi-TRPs). We discuss advancements in uplink demodulation reference signal design, enhancements for mobility to provide accurate CSI estimates, and a unified transmission configuration indicator framework tailored for FR-2 bands. For each concept, we provide system-level simulation results to highlight its performance benefits. Via field trials in an outdoor environment at Shanghai Jiaotong University, we demonstrate the gains of multi-TRP CJT relative to single-TRP transmission at 3.7 GHz.
Submitted 15 October, 2022;
originally announced October 2022.
-
FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images
Authors:
Chengyin Li,
Yao Qiang,
Rafi Ibn Sultan,
Hassan Bagher-Ebadian,
Prashant Khanduri,
Indrin J. Chetty,
Dongxiao Zhu
Abstract:
Precise Computed Tomography (CT)-based prostate segmentation for treatment planning is challenging due to (1) the unclear boundary of the prostate caused by CT's poor soft-tissue contrast and (2) the limited ability of convolutional neural network-based models to capture long-range global context. Here we propose a novel focal transformer-based image segmentation architecture to effectively and efficiently extract local visual features and global context from CT images. Additionally, we design an auxiliary boundary-induced label regression task coupled with the main prostate segmentation task to address the unclear boundary issue in CT images. We demonstrate that this design significantly improves the quality of CT-based prostate segmentation over other competing methods, resulting in substantially improved performance, i.e., a higher Dice Similarity Coefficient and lower Hausdorff Distance and Average Symmetric Surface Distance, on both private and public CT image datasets. Our code is available at https://github.com/ChengyinLee/FocalUNETR.git.
Submitted 18 July, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.
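The three evaluation metrics named in this abstract (Dice Similarity Coefficient, Hausdorff Distance, Average Symmetric Surface Distance) can be illustrated on small 2-D binary masks. This is a minimal pure-Python sketch, assuming masks are given as sets of (row, col) foreground pixels, that "surfaces" are the 4-connected boundary pixels, and that distances are in pixel units; it is not taken from the FocalUNETR repository, and real evaluations usually work in 3-D with physical voxel spacing.

```python
import math

def boundary(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    return {
        (r, c) for (r, c) in mask
        if any(n not in mask for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
    }

def dice(a, b):
    """Dice Similarity Coefficient between two foreground pixel sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def _nearest(p, points):
    """Distance from point p to the closest point of a non-empty set."""
    return min(math.dist(p, q) for q in points)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the boundaries of two non-empty masks."""
    ba, bb = boundary(a), boundary(b)
    return max(max(_nearest(p, bb) for p in ba), max(_nearest(q, ba) for q in bb))

def assd(a, b):
    """Average Symmetric Surface Distance between the boundaries of two non-empty masks."""
    ba, bb = boundary(a), boundary(b)
    total = sum(_nearest(p, bb) for p in ba) + sum(_nearest(q, ba) for q in bb)
    return total / (len(ba) + len(bb))

# Usage: two 3x3 squares offset by one column (hypothetical prediction and ground truth).
pred = {(r, c) for r in range(3) for c in range(3)}
truth = {(r, c) for r in range(3) for c in range(1, 4)}
print(round(dice(pred, truth), 4), hausdorff(pred, truth), assd(pred, truth))  # 0.6667 1.0 0.5
```

Dice rewards volumetric overlap, whereas the two surface distances are sensitive to exactly the boundary errors that the auxiliary boundary task targets.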
-
A Fuzzy Logic-based Cascade Control without Actuator Saturation for the Unmanned Underwater Vehicle Trajectory Tracking
Authors:
Danjie Zhu,
Simon X. Yang,
Mohammad Biglarbegian
Abstract:
An intelligent control strategy is proposed to eliminate the actuator saturation problem that exists in the trajectory tracking process of unmanned underwater vehicles (UUVs). The control strategy consists of two parts. For the kinematic modeling part, a fuzzy logic-refined backstepping control is developed to keep the control velocities within acceptable ranges with small error fluctuations. Based on the velocities derived by the improved kinematic control, sliding mode control (SMC) is introduced in the dynamic modeling to obtain the corresponding torques and forces that should be applied to the vehicle body. With the control velocities computed by the kinematic model and the applied forces derived by the dynamic model, robust and accurate UUV trajectory tracking without actuator saturation can be achieved.
Submitted 4 October, 2022;
originally announced October 2022.
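As background for the kinematic part of this strategy, the sketch below shows a conventional backstepping velocity law for a planar vehicle model (surge velocity v, yaw rate w) tracking a reference pose. It is the kind of baseline law the paper refines with fuzzy logic; the fuzzy refinement, the full UUV dynamics, and the SMC stage are not reproduced. The planar model, the gains kx, ky, kth, and the straight-line reference are assumptions chosen for illustration.

```python
import math

def backstepping_velocities(pose, ref_pose, v_r, w_r, kx=1.0, ky=1.0, kth=2.0):
    """Conventional kinematic backstepping law for a planar (x, y, heading) model."""
    x, y, th = pose
    xr, yr, thr = ref_pose
    # Tracking error expressed in the vehicle's body frame
    xe = math.cos(th) * (xr - x) + math.sin(th) * (yr - y)
    ye = -math.sin(th) * (xr - x) + math.cos(th) * (yr - y)
    the = math.atan2(math.sin(thr - th), math.cos(thr - th))  # wrapped heading error
    v = v_r * math.cos(the) + kx * xe
    w = w_r + v_r * (ky * ye + kth * math.sin(the))
    return v, w, (xe, ye, the)

def simulate(t_end=30.0, dt=0.01):
    """Euler-integrate the planar kinematics while tracking a straight-line reference."""
    pose = [0.0, 1.0, 0.0]   # start 1 m off the reference line (hypothetical)
    v_r, w_r = 1.0, 0.0      # reference surge speed and yaw rate
    errors = (0.0, 0.0, 0.0)
    for i in range(int(round(t_end / dt))):
        ref = (v_r * i * dt, 0.0, 0.0)  # reference moves along the x-axis
        v, w, errors = backstepping_velocities(pose, ref, v_r, w_r)
        pose[0] += dt * v * math.cos(pose[2])
        pose[1] += dt * v * math.sin(pose[2])
        pose[2] += dt * w
    return errors

print(simulate())
```

Because the commanded surge velocity contains the term kx * xe, a large initial error translates directly into a large velocity command; keeping such commands inside acceptable ranges is the motivation for the paper's fuzzy refinement.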
-
Deep learning based sferics recognition for AMT data processing in the dead band
Authors:
Enhua Jiang,
Rujun Chen,
Xinming Wu,
Jianxin Liu,
Debin Zhu,
Weiqiang Liu
Abstract:
In audio magnetotelluric (AMT) sounding data processing, the absence of sferic signals in some time ranges typically results in a lack of energy in the AMT dead band, which may cause unreliable resistivity estimates. We propose a deep convolutional neural network (CNN) to automatically recognize sferic signals from redundantly recorded data over a long time range and use them to compensate for the resistivity estimation. We train the CNN using field time series data with different signal-to-noise ratios that were acquired from different regions in mainland China. To address potential overfitting due to the limited number of sferic labels, we propose a training strategy that randomly generates training samples (with random data augmentations) while optimizing the CNN model parameters. We stop the training process and data generation once the training loss converges. In addition, we use a weighted binary cross-entropy loss function to address the sample imbalance problem and better optimize the network, use multiple reasonable metrics to evaluate network performance, and carry out ablation experiments to optimally choose the model hyperparameters. Extensive field data applications show that our trained CNN can robustly recognize sferic signals from noisy time series for subsequent impedance estimation. The subsequent processing results show that our method can significantly improve the S/N and effectively solve the problem of lack of energy in the dead band. Compared to the traditional processing method without sferic compensation, our method can generate smoother and more reasonable apparent resistivity-phase curves and depolarized phase tensors, correct the estimation error of the sudden drop in high-frequency apparent resistivity and the abnormal behavior of phase reversal, and finally better restore the real shallow subsurface resistivity structure.
Submitted 21 September, 2022;
originally announced September 2022.
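The weighted binary cross-entropy mentioned above counteracts the imbalance between rare sferic windows (positives) and abundant background windows (negatives). The abstract does not state how the weight is chosen; this minimal sketch assumes the common heuristic pos_weight = negatives / positives, and the labels and probabilities are hypothetical.

```python
import math

def positive_weight(labels):
    """Common heuristic (assumed): weight positives by the negative-to-positive ratio."""
    n_pos = sum(labels)
    return (len(labels) - n_pos) / n_pos

def weighted_bce(probs, labels, pos_weight, eps=1e-7):
    """Mean binary cross-entropy with the positive (sferic) class up-weighted."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

labels = [1, 0, 0, 0]          # one sferic window among four (hypothetical)
probs = [0.6, 0.2, 0.1, 0.3]   # hypothetical network outputs
w = positive_weight(labels)    # 3.0
print(w, weighted_bce(probs, labels, w))
```

With pos_weight = 3, a missed sferic costs three times as much as a false alarm of the same confidence, so the network cannot minimize the loss by always predicting "no sferic".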
-
Saliency Guided Adversarial Training for Learning Generalizable Features with Applications to Medical Imaging Classification System
Authors:
Xin Li,
Yao Qiang,
Chengyin Li,
Sijia Liu,
Dongxiao Zhu
Abstract:
This work tackles a central machine learning problem: performance degradation on out-of-distribution (OOD) test sets. The problem is particularly salient in medical imaging-based diagnosis systems that appear to be accurate but fail when tested in new hospitals/datasets. Recent studies indicate that such systems might learn shortcut and non-relevant features instead of generalizable features, so-called good features. We hypothesize that adversarial training can eliminate shortcut features whereas saliency guided training can filter out non-relevant features; both are nuisance features accounting for the performance degradation on OOD test sets. With that, we formulate a novel model training scheme for deep neural networks to learn good features for classification and/or detection tasks, ensuring consistent generalization performance on OOD test sets. The experimental results qualitatively and quantitatively demonstrate the superior performance of our method on classification tasks using benchmark CXR image datasets.
Submitted 9 September, 2022;
originally announced September 2022.
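The two ingredients hypothesized above, adversarial perturbation and saliency-based filtering, can be illustrated on a toy logistic-regression model, where the input gradient has a closed form. This is a minimal sketch under stated assumptions: FGSM is used as the adversarial attack and "masking" means zeroing the k least salient features. The abstract does not specify the paper's attack or masking scheme, and the paper trains deep networks on CXR images.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def _prob(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def logistic_loss(w, b, x, y):
    p = _prob(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    p = _prob(w, b, x)
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps=0.1):
    """FGSM (assumed attack): one signed-gradient step that increases the loss."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * math.copysign(1.0, gi) if gi != 0 else xi for xi, gi in zip(x, g)]

def saliency_mask(w, b, x, y, k):
    """Zero out the k features with the smallest gradient magnitude (saliency)."""
    g = input_gradient(w, b, x, y)
    lowest = set(sorted(range(len(x)), key=lambda i: abs(g[i]))[:k])
    return [0.0 if i in lowest else xi for i, xi in enumerate(x)]

w, b = [0.5, -2.0, 0.1, 1.0], 0.0   # hypothetical trained weights
x, y = [1.0, 1.0, 1.0, 1.0], 1
print(logistic_loss(w, b, x, y), logistic_loss(w, b, fgsm(w, b, x, y), y))
print(saliency_mask(w, b, x, y, k=2))
```

During training, both the perturbed and the masked inputs would be fed back to the model alongside the clean input, so that its prediction depends neither on tiny shortcut perturbations nor on low-saliency regions.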
-
Optimization of rule-based energy management strategies for hybrid vehicles using dynamic programming
Authors:
Di Zhu,
Ewan Pritchard,
Sumanth Reddy Dadam,
Vivek Kumar,
Yang Xu
Abstract:
Reducing energy consumption is a key focus of hybrid electric vehicle (HEV) development. The popular vehicle dynamic model used in many energy management optimization studies does not capture the vehicle dynamics that the in-vehicle measurement system does. However, feedback from the measurement system is what the vehicle controller actually uses to manage energy consumption. Therefore, optimization that relies solely on the model does not represent what the vehicle controller sees in the vehicle. This paper reports the utility factor-weighted energy consumption using a rule-based strategy under a real-world representative drive cycle. In addition, the vehicle test data were used to perform the optimization. By comparing results from both the rule-based and optimization-based strategies, areas for further improving the rule-based strategy are discussed. Furthermore, recent developments in on-board diagnostics (OBD) raise a concern about increased energy consumption. This paper investigates the increase in energy consumption with extensive OBD usage.
Submitted 8 July, 2022;
originally announced July 2022.
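Dynamic programming serves here as the optimization-based benchmark against which the rule-based strategy is compared. The toy sketch below shows the backward-recursion idea over a discretized state of charge (SOC) with a charge-sustaining terminal constraint. The fuel model, demand profile, SOC grid, and unit battery actions are all hypothetical; the paper optimizes with vehicle test data rather than this model.

```python
import math

def fuel(engine_power):
    """Hypothetical fuel-use model: zero when the engine is off, convex when it runs."""
    if engine_power == 0:
        return 0.0
    return 1.0 + 0.5 * engine_power ** 2

def dp_energy_management(demand, soc_levels, soc_init, actions=(-1, 0, 1)):
    """Backward DP over discretized SOC.

    demand[k]: power request at step k; action b: battery power (+ discharge, - charge);
    the engine supplies demand[k] - b. Final SOC must equal soc_init (charge-sustaining).
    """
    n = len(demand)
    cost_to_go = [[math.inf] * soc_levels for _ in range(n + 1)]
    policy = [[None] * soc_levels for _ in range(n)]
    cost_to_go[n][soc_init] = 0.0
    for k in range(n - 1, -1, -1):
        for s in range(soc_levels):
            for b in actions:
                engine, s_next = demand[k] - b, s - b
                if engine < 0 or not 0 <= s_next < soc_levels:
                    continue  # infeasible: battery output exceeds demand, or SOC out of range
                cost = fuel(engine) + cost_to_go[k + 1][s_next]
                if cost < cost_to_go[k][s]:
                    cost_to_go[k][s], policy[k][s] = cost, b
    plan, s = [], soc_init
    for k in range(n):  # forward pass to recover the optimal battery schedule
        plan.append(policy[k][s])
        s -= policy[k][s]
    return cost_to_go[0][soc_init], plan

print(dp_energy_management([2, 0, 3], soc_levels=4, soc_init=1))  # (7.5, [0, -1, 1])
```

The optimal plan charges the battery during the idle step and discharges it at the peak, a load-shifting pattern that a simple threshold rule may miss; comparing such plans with the rule-based decisions is one way to locate where the rules can be improved.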
-
Motion Planning and Tracking Control of Unmanned Underwater Vehicles: Technologies, Challenges and Prospects
Authors:
Danjie Zhu,
Tao Yan,
Simon X. Yang
Abstract:
The motion planning and tracking control techniques of unmanned underwater vehicles (UUVs) are fundamentally significant for efficient and robust UUV navigation, which is crucial for underwater rescue, facility maintenance, marine resource exploration, aquatic recreation, etc. Studies on UUV motion planning and tracking control have been growing rapidly worldwide and are usually sorted into the following topics: task assignment of multi-UUV systems, UUV path planning, and UUV trajectory tracking. This paper provides a comprehensive review of conventional and intelligent technologies for the motion planning and tracking control of UUVs. An analysis of the benefits and drawbacks of these methodologies in the literature is presented. In addition, the challenges and prospects of UUV motion planning and tracking control are provided as possible directions for future research.
Submitted 9 July, 2022;
originally announced July 2022.
-
SCAI: A Spectral data Classification framework with Adaptive Inference for the IoT platform
Authors:
Yundong Sun,
Dongjie Zhu,
Haiwen Du,
Yansong Wang,
Zhaoshuo Tian
Abstract:
Realizing accurate, efficient, and real-time identification of massive spectral data with the help of deep learning and IoT technology is currently a hot research topic. Deep neural networks play a key role in spectral analysis. However, the inference of deeper models is performed in a static manner and cannot be adjusted according to the device. Not all samples need the full computation to reach a confident prediction, so allocating the same computation to every sample hinders maximizing the overall performance. To address these issues, we propose a Spectral data Classification framework with Adaptive Inference. Specifically, to allocate different amounts of computation to different samples while better exploiting the collaboration among different devices, we leverage an Early-exit architecture: intermediate classifiers are placed at different depths of the architecture, and the model outputs the result when the prediction confidence reaches a preset threshold. We propose a self-distillation training paradigm in which the deepest classifier performs soft supervision on the shallow ones to maximize their performance and training speed. At the same time, to mitigate the sensitivity of performance to the location and number of intermediate classifiers in the Early-exit paradigm, we propose a Position-Adaptive residual network. It can adjust the number of layers in each block at different curve positions, so it can focus on important positions of the curve (e.g., Raman peaks) and accurately allocate the appropriate computational budget based on task performance and computing resources. To the best of our knowledge, this paper is the first attempt to conduct optimization by adaptive inference for spectral detection on an IoT platform. Extensive experimental results show that our proposed method can achieve higher performance with a smaller computational budget than existing methods.
Submitted 24 June, 2022;
originally announced June 2022.
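The early-exit rule and the self-distillation signal described above can be sketched independently of any particular network. In this minimal version, each "exit head" is a hypothetical callable returning logits, ordered from shallow to deep, and the distillation term is the standard temperature-softened KL divergence; the threshold and temperature values are assumptions, and the Position-Adaptive residual network is not modeled.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(sample, exit_heads, threshold=0.9):
    """Return (predicted class, exit index) from the first sufficiently confident head.

    Heads are ordered shallow to deep; in a real network they share one backbone,
    so stopping early skips the remaining layers.
    """
    for depth, head in enumerate(exit_heads):
        probs = softmax(head(sample))
        confidence = max(probs)
        if confidence >= threshold or depth == len(exit_heads) - 1:
            return probs.index(confidence), depth

def self_distillation_loss(shallow_logits, deep_logits, temperature=2.0):
    """KL(deep || shallow) on temperature-softened outputs, scaled by T^2."""
    p_deep = softmax(deep_logits, temperature)
    p_shallow = softmax(shallow_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q)) for p, q in zip(p_deep, p_shallow) if p > 0)
    return temperature ** 2 * kl

heads = [lambda s: [0.1, 0.2], lambda s: [0.0, 5.0], lambda s: [9.0, 0.0]]
print(early_exit_predict(None, heads))  # (1, 1): the second head is already confident
```

Raising the threshold trades computation for accuracy: more samples travel to deeper heads before a prediction is released.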
-
Bio-inspired Neural Network-based Optimal Path Planning for UUVs under the Effect of Ocean Currents
Authors:
Danjie Zhu,
Simon X. Yang
Abstract:
To eliminate the effect of ocean currents when finding the optimal path in the underwater environment, an intelligent algorithm designed for unmanned underwater vehicles (UUVs) is proposed in this paper. The algorithm consists of two parts: a neural network-based algorithm that derives the shortest path and avoids all possible collisions, and an adjusting component that offsets the deviation caused by ocean currents. The optimization results of the proposed algorithm are presented in detail and compared with a path planning algorithm that does not consider the effect of currents. The comparison results demonstrate the effectiveness of the path planning method when encountering currents of different directions and velocities.
Submitted 20 June, 2022;
originally announced June 2022.
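The neural network part of such planners is often a grid of shunting-model neurons whose activity landscape guides the vehicle. The abstract gives no equations, so this sketch assumes the widely used form dx_i/dt = -A x_i + (B - x_i)([I_i]^+ + sum_j w_ij [x_j]^+) - (D + x_i)[I_i]^-, with w_ij = mu / d_ij over the 8 neighbours, the target excited by I = E and obstacles inhibited by I = -E. The grid, wall, and parameter values are hypothetical, and the ocean-current adjustment is not reproduced.

```python
import math

def neighbours(cell, rows, cols):
    """8-connected neighbours with their Euclidean distance."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield (r + dr, c + dc), math.hypot(dr, dc)

def activity_landscape(rows, cols, target, obstacles, A=10.0, B=1.0, D=1.0,
                       E=100.0, mu=1.0, dt=0.005, steps=2000):
    """Euler-integrate the shunting equation on every grid cell (one neuron per cell)."""
    external = {(r, c): 0.0 for r in range(rows) for c in range(cols)}
    external[target] = E          # the target excites its neuron
    for cell in obstacles:
        external[cell] = -E       # obstacles inhibit theirs
    x = dict.fromkeys(external, 0.0)
    for _ in range(steps):
        new_x = {}
        for cell, inp in external.items():
            lateral = sum(mu / d * max(x[n], 0.0) for n, d in neighbours(cell, rows, cols))
            excit = max(inp, 0.0) + lateral
            inhib = max(-inp, 0.0)
            dx = -A * x[cell] + (B - x[cell]) * excit - (D + x[cell]) * inhib
            new_x[cell] = x[cell] + dt * dx
        x = new_x
    return x

def greedy_path(x, start, target, rows, cols, max_len=100):
    """Move to the neighbour with the highest activity until the target is reached."""
    path = [start]
    while path[-1] != target and len(path) < max_len:
        best = max((n for n, _ in neighbours(path[-1], rows, cols)), key=lambda n: x[n])
        if x[best] <= x[path[-1]]:
            break  # no uphill neighbour
        path.append(best)
    return path

obstacles = {(2, 0), (2, 1), (2, 2), (2, 3)}  # hypothetical wall with a gap on the right
x = activity_landscape(6, 6, target=(5, 5), obstacles=obstacles)
print(greedy_path(x, (0, 0), (5, 5), 6, 6))
```

Only positive activity spreads to neighbours, so the target's excitation propagates through free space while obstacles stay negative and are never chosen; no learning procedure or explicit search is involved.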
-
Bio-inspired Intelligence with Applications to Robotics: A Survey
Authors:
Junfei Li,
Zhe Xu,
Danjie Zhu,
Kevin Dong,
Tao Yan,
Zhu Zeng,
Simon X. Yang
Abstract:
In the past decades, considerable attention has been paid to bio-inspired intelligence and its applications to robotics. This paper provides a comprehensive survey of bio-inspired intelligence, with a focus on neurodynamics approaches, for various robotic applications, particularly path planning and control of autonomous robotic systems. First, the bio-inspired shunting model and its variants (the additive model and the gated dipole model) are introduced, and their main characteristics are given in detail. Then, two main neurodynamics applications to real-time path planning and control of various robotic systems are reviewed. A bio-inspired neural network framework, in which neurons are characterized by the neurodynamics models, is discussed for mobile robots, cleaning robots, and underwater robots. The bio-inspired neural network has been widely used for real-time collision-free navigation and cooperation without any learning procedures, global cost functions, or prior knowledge of the dynamic environment. In addition, bio-inspired backstepping controllers for various robotic systems, which are able to eliminate the speed jump when a large initial tracking error occurs, are further discussed. Finally, the current challenges and future research directions are discussed.
Submitted 17 June, 2022;
originally announced June 2022.
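The speed-jump claim above follows from a bound built into the shunting model. In the form commonly used in this line of work (assumed here), a neuron driven by a tracking error e obeys dv/dt = -A v + (B - v)[e]^+ - (D + v)[e]^-, so its output stays within [-D, B] however large e is. A bio-inspired backstepping controller uses such outputs in place of raw error terms. The sketch integrates one neuron from rest; A, B, D and the error value are hypothetical.

```python
def shunting_response(error, A=1.0, B=1.0, D=1.0, dt=1e-3, steps=1000):
    """Euler-integrate dv/dt = -A v + (B - v)[e]^+ - (D + v)[e]^- from rest."""
    v = 0.0
    excitatory = max(error, 0.0)
    inhibitory = max(-error, 0.0)
    trajectory = []
    for _ in range(steps):
        v += dt * (-A * v + (B - v) * excitatory - (D + v) * inhibitory)
        trajectory.append(v)
    return trajectory

large_error, k = 100.0, 1.0
print("linear term k*e:", k * large_error)                     # 100.0, an abrupt jump
print("shunting output:", shunting_response(large_error)[-1])  # bounded below B = 1.0
```

For a constant positive error the output rises smoothly to B e / (A + e), which approaches but never exceeds B.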
-
Formation Tracking for a Multi-AUV System Based on an Adaptive Sliding Mode Method in the Water Flow Environment
Authors:
Xin Li,
Daqi Zhu,
Bing Sun,
Qi Chen,
Wenyang Gan,
Zhigang Li
Abstract:
In this paper, formation tracking for a multi-AUV system (MAS) using an improved adaptive sliding mode control method is studied in a three-dimensional (3-D) underwater environment. First, the kinematic and dynamic models of the AUVs are given, considering six degrees of freedom (6-DOF). Then, a control law based on the mathematical model of the AUVs is proposed using the improved sliding mode method. A second-order sliding mode control method is adopted to eliminate the chattering phenomenon of the controller. Third, considering the water flow in the underwater working environment of the AUVs, an adaptive module is added to the controller. With the adaptive approach, the finite disturbances caused by water flow can be handled by the controller. The proposed method achieves stability by substituting an adaptive continuous term for the switching term in the controller. Finally, a robust sliding mode controller with a continuous model predictive control strategy for the multi-AUV system is developed to achieve leader-follower formation tracking in the presence of bounded flow disturbances, and simulations are implemented to confirm the effectiveness of the proposed method.
Submitted 17 January, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
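The abstract does not say which second-order sliding mode scheme is used. The sketch below assumes the well-known super-twisting algorithm on a 1-DOF error model x'' = u + d, where d is a bounded flow disturbance. The control signal is continuous because the discontinuous sign term acts only on the derivative of the integral state z, which is how second-order sliding modes suppress chattering. The gains satisfy a commonly cited sufficient condition for |d'| <= 0.5 (k2 > 0.5 and k1^2 >= 4 * 0.5 * (k2 + 0.5) / (k2 - 0.5)). The disturbance, gains, and sliding-surface slope lam are hypothetical, and the adaptive module, 6-DOF dynamics, and formation logic are not reproduced.

```python
import math

def sign(s):
    return math.copysign(1.0, s)

def simulate_super_twisting(x0=1.0, v0=0.0, lam=1.0, k1=2.5, k2=1.5,
                            dt=1e-3, t_end=20.0):
    """Regulate x'' = u + d to zero using s = v + lam * x and a super-twisting law."""
    x, v, z = x0, v0, 0.0  # position error, velocity error, integral state
    for i in range(int(round(t_end / dt))):
        t = i * dt
        d = 0.5 * math.sin(t)   # hypothetical bounded flow disturbance
        s = v + lam * x         # sliding variable; s' = (u + lam * v) + d
        u = -lam * v - k1 * math.sqrt(abs(s)) * sign(s) + z
        z -= dt * k2 * sign(s)  # discontinuity only in z', so u stays continuous
        x, v = x + dt * v, v + dt * (u + d)
    return x, v

print(simulate_super_twisting())
```

Once s reaches zero, the integral state z cancels the disturbance and the error decays along x' = -lam * x.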
-
NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results
Authors:
Eduardo Pérez-Pellitero,
Sibi Catley-Chandar,
Richard Shaw,
Aleš Leonardis,
Radu Timofte,
Zexin Zhang,
Cen Liu,
Yunbo Peng,
Yue Lin,
Gaocheng Yu,
** Zhang,
Zhe Ma,
Hongbin Wang,
Xiangyu Chen,
Xintao Wang,
Haiwei Wu,
Lin Liu,
Chao Dong,
Jiantao Zhou,
Qingsen Yan,
Song Zhang,
Weiye Chen,
Yuhang Liu,
Zhen Zhang,
Yanning Zhang
, et al. (68 additional authors not shown)
Abstract:
This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods, and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints. In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e., solutions cannot exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e., solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).
Submitted 25 May, 2022;
originally announced May 2022.
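The two fidelity scores (PSNR computed directly and after a canonical tonemapping) can be sketched as follows. This assumes HDR values normalized to [0, 1] and mu-law tonemapping with mu = 5000, a choice common in multi-exposure HDR benchmarks; the challenge's exact normalization may differ, and the function names psnr_l and psnr_mu are illustrative labels.

```python
import math

def mu_tonemap(h, mu=5000.0):
    """mu-law compression of a linear HDR value h in [0, 1] (assumed tonemap)."""
    return math.log(1.0 + mu * h) / math.log(1.0 + mu)

def psnr(pred, ref, peak=1.0):
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def psnr_l(pred, ref):
    """PSNR computed directly on linear HDR values."""
    return psnr(pred, ref)

def psnr_mu(pred, ref, mu=5000.0):
    """PSNR computed after mu-law tonemapping both images."""
    return psnr([mu_tonemap(p, mu) for p in pred], [mu_tonemap(r, mu) for r in ref])

ref = [0.001, 0.01, 0.5, 1.0]   # hypothetical normalized HDR pixels
pred = [0.002, 0.011, 0.49, 1.0]
print(psnr_l(pred, ref), psnr_mu(pred, ref))
```

The logarithmic tonemap expands dark values, so small absolute errors in shadows (the first two pixels here) weigh much more in the tonemapped score than in the linear one.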
-
Brain Cortical Functional Gradients Predict Cortical Folding Patterns via Attention Mesh Convolution
Authors:
Li Yang,
Zhibin He,
Changhe Li,
Junwei Han,
Dajiang Zhu,
Tianming Liu,
Tuo Zhang
Abstract:
Since gyri and sulci, two basic anatomical building blocks of cortical folding patterns, have been suggested to bear different functional roles, a precise mapping from brain function to gyro-sulcal patterns can provide profound insights into both biological and artificial neural networks. However, a generic theory and an effective computational model are still lacking, due to the highly nonlinear relation between them, huge inter-individual variabilities, and a sophisticated description of brain function regions/networks distributed as mosaics, such that their spatial patterning has not been considered. We adopted brain functional gradients derived from resting-state fMRI to embed the "gradual" change of functional connectivity patterns, and developed a novel attention mesh convolution model to predict cortical gyro-sulcal segmentation maps on individual brains. The convolution on the mesh considers the spatial organization of functional gradients and folding patterns on a cortical sheet, and the newly designed channel attention block enhances the interpretability of the contribution of different functional gradients to cortical folding prediction. Experiments show that our model outperforms other state-of-the-art models in prediction performance. In addition, we found that the dominant functional gradients contribute less to folding prediction. On the activation maps of the last layer, some well-studied cortical landmarks are found on the borders of, rather than within, the highly activated regions. These results and findings suggest that a specifically designed artificial neural network can improve the precision of the mapping between brain functions and cortical folding patterns, and can provide valuable insight into the brain anatomy-function relation for neuroscience.
Submitted 21 May, 2022;
originally announced May 2022.
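The abstract credits the channel attention block with making the contribution of each functional gradient inspectable. Its exact design is not given; this sketch assumes a squeeze-and-excitation-style block applied to per-vertex features, where each channel holds one functional gradient. The weights w1 and w2 are hypothetical (learned in practice), and the mesh convolution itself is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation-style channel attention over mesh vertices (assumed design).

    features: N vertices x C channels; w1: H x C reduction weights; w2: C x H expansion weights.
    """
    n, c = len(features), len(features[0])
    squeeze = [sum(v[k] for v in features) / n for k in range(c)]  # mean over all vertices
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in w1]  # ReLU
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]     # one per channel
    scaled = [[v[k] * gates[k] for k in range(c)] for v in features]
    return scaled, gates

feats = [[1.0, 2.0], [3.0, 4.0]]  # two vertices, two gradient channels (hypothetical)
scaled, gates = channel_attention(feats, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
print(gates)  # one gate per functional gradient
```

Because each gate is a single scalar per channel, reading the learned gates gives a direct ranking of how strongly each functional gradient is passed on to the folding prediction.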
-
Discovering Dynamic Functional Brain Networks via Spatial and Channel-wise Attention
Authors:
Yiheng Liu,
Enjie Ge,
Mengshen He,
Zhengliang Liu,
Shijie Zhao,
Xintao Hu,
Dajiang Zhu,
Tianming Liu,
Bao Ge
Abstract:
Using deep learning models to recognize functional brain networks (FBNs) in functional magnetic resonance imaging (fMRI) has been attracting increasing interest recently. However, most existing work focuses on detecting static FBNs from entire fMRI signals, such as correlation-based functional connectivity. The sliding window is a widely used strategy to capture the dynamics of FBNs, but it is still limited in representing intrinsic functional interactive dynamics at each time step, and the number of FBNs usually needs to be set manually. Moreover, due to the complexity of dynamic interactions in the brain, traditional linear and shallow models are insufficient for identifying complex and spatially overlapping FBNs at each time step. In this paper, we propose a novel Spatial and Channel-wise Attention Autoencoder (SCAAE) for discovering FBNs dynamically. The core idea of SCAAE is to apply an attention mechanism to FBN construction. Specifically, we designed two attention modules: 1) a spatial-wise attention (SA) module to discover FBNs in the spatial domain and 2) a channel-wise attention (CA) module to weigh the channels for selecting the FBNs automatically. We evaluated our approach on the ADHD200 dataset, and our results indicate that the proposed SCAAE method can effectively recover the dynamic changes of the FBNs at each fMRI time step without using sliding windows. More importantly, our proposed hybrid attention modules (SA and CA) do not enforce assumptions of linearity and independence as previous methods do, and thus provide a novel approach to better understanding dynamic functional brain networks.
Submitted 31 May, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
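For contrast with SCAAE, the sliding-window baseline mentioned in the abstract can be sketched in a few lines: the correlation between two regional time series is recomputed inside a window that slides along time. The window length must be fixed by hand, and each value summarizes a whole window rather than a single time step, which is the limitation SCAAE aims to remove. The two short time series are hypothetical.

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return math.nan  # correlation undefined for a constant window
    return cov / math.sqrt(va * vb)

def sliding_window_fc(ts_a, ts_b, window, step=1):
    """Correlation between two regional fMRI time series in each sliding window."""
    return [pearson(ts_a[i:i + window], ts_b[i:i + window])
            for i in range(0, len(ts_a) - window + 1, step)]

roi_a = [1, 2, 3, 4, 5, 6]  # hypothetical BOLD signals of two regions
roi_b = [1, 2, 3, 3, 2, 1]
print([round(r, 3) for r in sliding_window_fc(roi_a, roi_b, window=3)])  # [1.0, 0.866, -0.866, -1.0]
```

The connectivity flips from positive to negative across the series, but the window smooths the moment at which the change occurs.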
-
Stochastic Gradient-based Fast Distributed Multi-Energy Management for an Industrial Park with Temporally-Coupled Constraints
Authors:
Dafeng Zhu,
Bo Yang,
Chengbin Ma,
Zhaojian Wang,
Shanying Zhu,
Kai Ma,
** Guan
Abstract:
Contemporary industrial parks are challenged by growing concerns about the high cost and low efficiency of energy supply. Moreover, under uncertain supply/demand, how to mobilize delay-tolerant elastic loads and compensate real-time inelastic loads to match multi-energy generation/storage and minimize energy cost is a key issue. Since energy management can hardly be implemented offline without knowing the statistical information of random variables, this paper presents a systematic online energy cost minimization framework to fulfill the complementary utilization of multi-energy with time-varying generation, demand, and price. Specifically, to satisfy the charging/discharging constraints due to storage and short-term energy balancing, a fast distributed algorithm based on stochastic gradients with a two-timescale implementation is proposed to ensure online implementation. To reduce peak loads, an incentive mechanism is implemented by estimating users' willingness to shift. Analytical results on parameter setting are also given to guarantee the feasibility and optimality of the proposed design. Numerical results show that when the bid-ask spread of electricity is small enough, the proposed algorithm can asymptotically achieve close-to-optimal cost.
Submitted 8 April, 2022;
originally announced April 2022.
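The general idea of a distributed, stochastic-gradient balancing step can be illustrated with a price-based scheme: a coordinator updates a price from noisy demand samples, and each unit decides its output locally from that price alone. This is a generic dual stochastic-gradient sketch, not the paper's two-timescale algorithm; the quadratic unit costs, the demand distribution, and the diminishing step size are assumptions, and storage constraints and the incentive mechanism are omitted. With costs a = 1 and a = 2, total supply is 0.75 * price, so the price should settle near 4 for an average demand of 3.

```python
import random

def local_dispatch(price, cost_coeffs, p_max):
    """Each unit independently maximizes price*p - a*p^2 over [0, p_max]."""
    return [min(max(price / (2.0 * a), 0.0), p_max) for a in cost_coeffs]

def stochastic_price_update(cost_coeffs, p_max, sample_demand, iters=5000, seed=0):
    """Coordinator adjusts a price using noisy demand samples (stochastic dual gradient)."""
    rng = random.Random(seed)
    price = 0.0
    for t in range(iters):
        supply = sum(local_dispatch(price, cost_coeffs, p_max))
        demand = sample_demand(rng)          # one random realization of demand
        step = 1.0 / (t + 1)                 # diminishing step size
        price += step * (demand - supply)    # raise price on shortage, lower on surplus
    return price, local_dispatch(price, cost_coeffs, p_max)

price, powers = stochastic_price_update([1.0, 2.0], 10.0, lambda rng: rng.uniform(2.0, 4.0))
print(round(price, 2), [round(p, 2) for p in powers])
```

Only the scalar price is exchanged, so no unit needs to reveal its cost function, and the coordinator never needs the demand distribution, only samples of it.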
-
Design and experimental investigation of a vibro-impact self-propelled capsule robot with orientation control
Authors:
Jiajia Zhang,
Jiyuan Tian,
Dibin Zhu,
Yang Liu,
Shyam Prasad
Abstract:
This paper presents a novel design and experimental investigation of a self-propelled capsule robot that can be used for painless colonoscopy during a retrograde progression from the patient's rectum. The steerable robot is driven forward and backward via its internal vibration and impact, with orientation control provided by an electromagnetic actuator. The actuator contains four sets of coils and a shaft made of a permanent magnet. The shaft can be excited linearly at a controllable, tilted angle, thereby guiding the progression orientation of the robot. Two control strategies are studied in this work and compared via simulation and experiment. Extensive results are presented to demonstrate the progression efficiency of the robot and its potential for robotic colonoscopy.
Submitted 1 March, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Energy Management Based on Multi-Agent Deep Reinforcement Learning for A Multi-Energy Industrial Park
Authors:
Dafeng Zhu,
Bo Yang,
Yuxiang Liu,
Zhaojian Wang,
Kai Ma,
** Guan
Abstract:
Owing to large industrial energy consumption, industrial production has placed a huge burden on the grid in terms of renewable energy access and power supply. Due to the coupling of multiple energy sources and the uncertainty of renewable energy and demand, centralized methods require large computation and coordination overhead. Thus, this paper proposes a multi-energy management framework for an industrial park based on centralized training and decentralized execution. The energy management problem is formulated as a partially observable Markov decision process, which is intractable by dynamic programming due to the lack of prior knowledge of the underlying stochastic process. The objective is to minimize long-term energy costs while meeting users' demand. To solve this problem and improve computation speed, a novel multi-agent deep reinforcement learning algorithm is proposed with the following key components: a counterfactual baseline that helps contributing agents learn better policies, and soft actor-critic for improving robustness and exploring optimal solutions. A novel reward is designed using the Lagrange multiplier method to ensure the capacity constraints of energy storage. In addition, since increasing the number of agents degrades performance due to large observation spaces, an attention mechanism is introduced to enhance policy stability and enable agents to focus on important energy-related information, which improves the exploration efficiency of soft actor-critic. Numerical results based on actual data verify the performance and high scalability of the proposed algorithm, indicating that the industrial park can minimize energy costs under different demands.
Submitted 11 February, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
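The two learning ingredients named in the abstract above can be illustrated in isolation. The sketch below is not the authors' implementation: it shows a COMA-style counterfactual advantage for one agent with a discrete action set, and a generic Lagrangian treatment of a storage capacity bound with a projected multiplier update. The function names, the discrete action space, and the state-of-charge bounds are assumptions for illustration; the paper's actual reward and its soft actor-critic machinery are not reproduced.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """COMA-style counterfactual advantage for a single agent (illustrative).

    q_values[a]: centralized critic estimate with the other agents' actions
                 held fixed and this agent's action replaced by a.
    policy[a]:   this agent's current probability of choosing action a.
    """
    # Baseline: expected Q when only this agent's own action is marginalized out.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken_action] - baseline


def constraint_value(soc, soc_min, soc_max):
    """Signed constraint g(soc) <= 0: positive when the state of charge leaves
    [soc_min, soc_max], negative (slack) when it stays inside."""
    return max(soc - soc_max, soc_min - soc)


def lagrangian_reward(energy_cost, g, multiplier):
    """Hypothetical shaped reward: the negative of cost + multiplier * g."""
    return -(energy_cost + multiplier * g)


def update_multiplier(multiplier, g, step=0.1):
    """Projected dual ascent: raise the multiplier when violated, relax it when slack."""
    return max(0.0, multiplier + step * g)
```

For example, with critic values `[1.0, 3.0, 2.0]` and policy `[0.2, 0.5, 0.3]`, the baseline is 2.3, so taking action 1 yields an advantage of about 0.7: the agent is credited only for how much better its choice was than its own average behaviour.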
-
Data Driven based Dynamic Correction Prediction Model for NOx Emission of Coal Fired Boiler
Authors:
Zhenhao Tang,
Deyu Zhu,
Yang Li
Abstract:
The real-time prediction of NOx emissions is of great significance for pollutant emission control and the operation of coal-fired power plant units. To deal with the large time delay and strong nonlinearity of the combustion process, a dynamic correction prediction model that accounts for time delay is proposed. First, the maximal information coefficient (MIC) is used to calculate the delay time between related parameters and NOx emissions, and the modeling dataset is reconstructed accordingly; then, an adaptive feature selection algorithm based on Lasso and ReliefF is constructed to select the parameters most highly correlated with NOx emissions; finally, an extreme learning machine (ELM) model combined with error correction is established to dynamically predict the concentration of nitrogen oxides. Experimental results based on actual data show that the same variable has different delay times under rising, falling, and steady load conditions; that the model's characteristic variables differ across load conditions; and that the dynamic error correction strategy effectively improves modeling accuracy. The prediction error of the proposed algorithm is less than 2% under different working conditions, so it can accurately predict the NOx concentration at the combustion outlet and provide guidance for NOx emission monitoring and combustion process optimization.
Submitted 29 October, 2021;
originally announced October 2021.
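Of the pipeline above, the extreme learning machine is the easiest part to show in a few lines. The sketch assumes a single-output regression ELM with random, fixed tanh hidden units and output weights fitted by ridge-regularized least squares (a common ELM variant); the MIC delay analysis, the Lasso/ReliefF feature selection, and the authors' error-correction scheme are not reproduced, and the class and parameter names are hypothetical. Only the standard library is used, so a small Gaussian-elimination solver is included.

```python
import math
import random


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class ELM:
    """Single-hidden-layer extreme learning machine for scalar regression.

    Hidden weights and biases are drawn at random and never trained;
    only the output weights are fitted, here by ridge-regularized least squares.
    """

    def __init__(self, n_hidden=20, ridge=1e-6, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.seed = seed

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        rng = random.Random(self.seed)
        d = len(xs[0])
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]
        k = self.n_hidden
        # Normal equations (H^T H + ridge * I) beta = H^T y.
        hth = [[sum(r[i] * r[j] for r in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(r[i] * y for r, y in zip(h, ys)) for i in range(k)]
        self.beta = _solve(hth, hty)
        return self

    def predict(self, x):
        return sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))
```

Because only the linear output layer is solved, training is a single linear solve, which is the usual argument for ELMs in fast, frequently retrained predictors.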
-
Fast Distributed Stochastic Scheduling for A Multi-Energy Industrial Park
Authors:
Dafeng Zhu,
Bo Yang,
Zhaojian Wang,
Chengbin Ma,
Kai Ma,
Shanying Zhu
Abstract:
The multi-energy management framework of industrial parks advocates energy conversion and scheduling, which takes full advantage of the complementarity and temporal availability of multiple energy sources. However, how to exploit elastic loads and compensate for inelastic loads to match multiple generators and storage units under uncertain demand and supply remains a key problem. To address this issue, the energy management problem is formulated as a stochastic optimization problem whose objectives are to minimize the time-averaged energy cost and improve energy efficiency while respecting the energy constraints. To achieve distributed real-time implementation without any prior knowledge of the underlying stochastic process, a distributed stochastic gradient algorithm based on dual decomposition and a fast scheme are proposed. Numerical results based on real data show that, by adopting the proposed algorithm, the industrial park can asymptotically achieve social welfare maximization.
Submitted 24 May, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
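Dual decomposition, as named in the abstract above, can be illustrated with a toy supply-demand balancing problem. The sketch is an illustration under stated assumptions rather than the authors' algorithm: each generator has a hypothetical quadratic cost `a_i * p**2` and a capacity cap, responds locally to a broadcast price, and a coordinator adjusts the price with a projected stochastic (sub)gradient step on the dual, using one sampled demand per iteration.

```python
import random


def local_dispatch(price, cost_coefs, caps):
    """Each generator i maximizes price * p - a_i * p**2 subject to 0 <= p <= cap_i."""
    return [min(max(price / (2.0 * a), 0.0), cap) for a, cap in zip(cost_coefs, caps)]


def dual_price_iteration(cost_coefs, caps, mean_demand, noise=0.0,
                         step=0.5, iters=200, seed=0):
    """Projected stochastic gradient ascent on the dual variable (the price)."""
    rng = random.Random(seed)
    price = 0.0
    for _ in range(iters):
        # One stochastic demand observation per iteration (no distribution model needed).
        demand = mean_demand + rng.gauss(0.0, noise)
        supply = sum(local_dispatch(price, cost_coefs, caps))
        # Dual gradient = demand - supply: raise the price under a shortage.
        price = max(0.0, price + step * (demand - supply))
    return price, local_dispatch(price, cost_coefs, caps)
```

With cost coefficients `[1.0, 2.0]` and a demand of 3, total supply is `0.75 * price`, so the price settles at 4 and the dispatch at `[2.0, 1.0]`. Only the price is exchanged, which is what makes the scheme distributable; passing `noise > 0` makes the price fluctuate around the clearing value, as expected for a constant-step stochastic scheme.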
-
Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data
Authors:
Yi Yang,
Daoye Zhu,
Tengteng Qu,
Qiangyu Wang,
Fuhu Ren,
Chengqi Cheng
Abstract:
In this paper, we propose an efficient and generalizable framework based on deep convolutional neural networks (CNNs) for the joint classification of multi-source remote sensing data. While recent methods are mostly based on multi-stream architectures, we use group convolution to construct equivalent network architectures efficiently within a single-stream network. We further adopt and improve dynamic grouping convolution (DGConv) to make the group convolution hyperparameters, and thus the overall network architecture, learnable during network training. The proposed method can therefore, in theory, adapt any modern CNN model to any multi-source remote sensing dataset, and can potentially avoid sub-optimal solutions caused by manually chosen architecture hyperparameters. In the experiments, the proposed method is applied to ResNet and UNet, and the adjusted networks are verified on three very diverse benchmark datasets (i.e., Houston2018 data, Berlin data, and MUUFL data). Experimental results demonstrate the effectiveness of the proposed single-stream CNNs; in particular, ResNet18-DGConv improves the state-of-the-art classification overall accuracy (OA) on the HS-SAR Berlin dataset from $62.23\%$ to $68.21\%$. The experiments also yield two interesting findings. First, using DGConv generally reduces test OA variance. Second, multi-stream is harmful to model performance if imposed on the first few layers, but becomes beneficial if applied to deeper layers. Altogether, the findings imply that multi-stream architecture, instead of being a strictly necessary component in deep learning models for multi-source remote sensing data, essentially plays the role of a model regularizer. Our code is publicly available at https://github.com/yyyyangyi/Multi-source-RS-DGConv. We hope our work can inspire novel research in the future.
Submitted 6 February, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
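The learnable grouping behind DGConv can be pictured as a binary channel-relation matrix built as a Kronecker product of 2x2 factors, each switched by a binary gate (all-ones when the gate is 1, identity when it is 0). The sketch below follows that construction as we understand it from the original DGConv formulation; it covers only the forward mask, assumes 2^K channels for simplicity, and omits how the gates are learned. Function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def relation_matrix(gates):
    """Binary 2^K x 2^K channel-relation matrix from K binary gates.

    Gate 1 contributes the all-ones factor (channels connected);
    gate 0 contributes the identity factor (channels separated).
    """
    u = [[1]]
    for g in gates:
        factor = [[1, 1], [1, 1]] if g else [[1, 0], [0, 1]]
        u = kron(u, factor)
    return u


def num_groups(u):
    """Number of channel groups = number of distinct rows of the relation matrix."""
    return len({tuple(row) for row in u})


def masked_pointwise_weights(weights, u):
    """Zero out 1x1-convolution weights that connect channels in different groups."""
    return [[w * m for w, m in zip(wr, mr)] for wr, mr in zip(weights, u)]
```

All gates at 1 give an ordinary dense convolution (one group), all gates at 0 give a depthwise one (one group per channel), and mixed gates give intermediate groupings, which is how a single-stream network can emulate a multi-stream one.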
-
A Large-Scale Dataset for Benchmarking Elevator Button Segmentation and Character Recognition
Authors:
Jianbang Liu,
Yuqi Fang,
Delong Zhu,
Nachuan Ma,
** Pan,
Max Q. -H. Meng
Abstract:
Recently, human activities have been heavily restricted by COVID-19. Robots capable of inter-floor navigation have attracted much public attention, since they can substitute for human workers in service work. However, current robots depend on either human assistance or elevator retrofitting, and fully autonomous inter-floor navigation is still not available. As the very first step of inter-floor navigation, elevator button segmentation and recognition play an important role. We therefore release the first large-scale publicly available elevator panel dataset, containing 3,718 panel images with 35,100 button labels, to facilitate more powerful algorithms for autonomous elevator operation. Together with the dataset, a number of deep learning based implementations for button segmentation and recognition are also released to benchmark future methods in the community. The dataset will be available at \url{https://github.com/zhudelong/elevator_button_recognition}.
Submitted 22 March, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Sensing population distribution from satellite imagery via deep learning: model selection, neighboring effect, and systematic biases
Authors:
Xiao Huang,
Di Zhu,
Fan Zhang,
Tao Liu,
Xiao Li,
Lei Zou
Abstract:
The rapid development of remote sensing techniques provides rich, large-coverage, and high-temporal-resolution information about the ground, which can be coupled with emerging deep learning approaches that enable latent features and hidden geographical patterns to be extracted. This study marks the first attempt to cross-compare the performance of popular state-of-the-art deep learning models in estimating population distribution from remote sensing images, investigate the contribution of the neighboring effect, and explore potential systematic biases in population estimation. We conduct end-to-end training of four popular deep learning architectures, i.e., VGG, ResNet, Xception, and DenseNet, by establishing a mapping between Sentinel-2 image patches and their corresponding population counts from the LandScan population grid. The results reveal that DenseNet outperforms the other three models, while VGG performs worst in all evaluation metrics under all selected neighboring scenarios. As for the neighboring effect, contradicting existing studies, our results suggest that increasing the neighboring size reduces population estimation performance, a finding that holds universally for all four selected models in all evaluation metrics. In addition, there exists a notable, universal bias: all selected deep learning models tend to overestimate sparsely populated image patches and underestimate densely populated image patches, regardless of neighboring size. The methodological, experimental, and contextual knowledge this study provides is expected to benefit a wide range of future studies that estimate population distribution via remote sensing imagery.
Submitted 2 March, 2021;
originally announced March 2021.
-
Security Assessment and Impact Analysis of Cyberattacks in Integrated T&D Power Systems
Authors:
Ioannis Zografopoulos,
Charalambos Konstantinou,
Nektarios Georgios Tsoutsos,
Dan Zhu,
Robert Broadwater
Abstract:
In this paper, we examine the impact of cyberattacks on an integrated transmission and distribution (T&D) power grid model with distributed energy resource (DER) integration. We adopt the OCTAVE Allegro methodology to identify critical system assets, enumerate potential threats, and analyze and prioritize risks for threat scenarios. Based on the analysis, attack strategies and exploitation scenarios that could lead to system compromise are identified. Specifically, we investigate the impact of data integrity attacks on inverter-based solar PV controllers, control signal blocking attacks on protective switches and breakers, and coordinated monitoring and switching time-delay attacks.
Submitted 11 April, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
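To make the risk-prioritization step concrete, the sketch below computes a relative risk score in the style of the OCTAVE Allegro worksheets as we understand them: each impact area receives a priority rank, each threat scenario receives a low/medium/high impact value per area (1/2/3), and the score is the rank-weighted sum. The area names, ranks, and scenario labels are hypothetical placeholders, not values from the paper.

```python
# Impact values per area, following the low/medium/high scale (assumed 1/2/3).
IMPACT_VALUE = {"low": 1, "medium": 2, "high": 3}


def relative_risk_score(area_ranks, impacts):
    """Sum over impact areas of (area priority rank) x (impact value)."""
    return sum(area_ranks[area] * IMPACT_VALUE[level] for area, level in impacts.items())


def prioritize(area_ranks, scenarios):
    """Return (scenario, score) pairs sorted from highest to lowest relative risk."""
    scored = [(name, relative_risk_score(area_ranks, impacts))
              for name, impacts in scenarios.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With hypothetical ranks `{"safety": 3, "financial": 2, "reputation": 1}`, a scenario rated high/low/low scores 3*3 + 2*1 + 1*1 = 12 and outranks one rated low/high/medium (3 + 6 + 2 = 11), because the highest-priority area dominates the weighting.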
-
Fast Non-line-of-sight Imaging with Two-step Deep Remapping
Authors:
Dayu Zhu,
Wenshan Cai
Abstract:
Conventional imaging only records photons sent directly from the object to the detector, while non-line-of-sight (NLOS) imaging takes indirect light into account. Most NLOS solutions employ a transient scanning process, followed by a physics-based algorithm to reconstruct the NLOS scene. However, transient detection requires sophisticated apparatus, with long scanning time and low robustness to the ambient environment, and the reconstruction algorithms are typically time-consuming and computationally expensive. Here we propose a new NLOS solution that addresses these shortcomings, with innovations in both equipment and algorithm. We apply an inexpensive commercial Lidar for detection, with much higher scanning speed and better compatibility with real-world imaging. Our reconstruction framework is deep learning based, with a generative two-step remapping strategy to guarantee high reconstruction fidelity. The overall detection and reconstruction process allows for millisecond responses, with millimeter-level reconstruction precision. We have experimentally tested the proposed solution on both synthetic and real objects, and further demonstrated that our method is applicable to full-color NLOS imaging.
Submitted 25 March, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
Multi-Grid Back-Projection Networks
Authors:
Pablo Navarrete Michelini,
Wenbin Chen,
Hanwen Liu,
Dan Zhu,
Xingqun Jiang
Abstract:
Multi-Grid Back-Projection (MGBP) is a fully-convolutional network architecture that can learn to restore images and videos with upscaling artifacts. Using the same strategy as multi-grid partial differential equation (PDE) solvers, this multiscale architecture scales computational complexity efficiently with increasing output resolution. The basic processing block is inspired by the iterative back-projection (IBP) algorithm and constitutes a type of cross-scale residual block with feedback from low-resolution references. The architecture performs on par with state-of-the-art alternatives for regression targets that aim to recover an exact copy of a high-resolution image or video of which only a downscaled version is known. A perceptual quality target aims to create more realistic outputs by introducing artificial changes that can differ from the high-resolution original content as long as they are consistent with the low-resolution input. For this target we propose a strategy that uses noise inputs at different resolution scales to control the amount of artificial detail generated in the output. The noise input controls the amount of innovation that the network uses to create realistic artificial details. The effectiveness of this strategy is shown in benchmarks, and it is explained as a particular strategy to traverse the perception-distortion plane.
Submitted 31 December, 2020;
originally announced January 2021.
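The iterative back-projection (IBP) update that inspires the MGBP processing block can be shown on a 1-D signal. This is a toy version, not the network itself: the downscaler is assumed to be pairwise averaging and the back-projection operator is nearest-neighbour repetition, and each step pushes the low-resolution mismatch back onto the high-resolution estimate.

```python
def downscale(x):
    """Hypothetical 2x downscaler: average non-overlapping pairs (len(x) must be even)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upscale(y):
    """Nearest-neighbour 2x upscaler, used here as the back-projection operator."""
    return [v for v in y for _ in range(2)]


def iterative_back_projection(y, x0, step=1.0, iters=10):
    """Refine a high-resolution estimate x so that downscale(x) matches the reference y."""
    x = list(x0)
    for _ in range(iters):
        residual = [a - b for a, b in zip(y, downscale(x))]
        x = [xi + step * ri for xi, ri in zip(x, upscale(residual))]
    return x
```

Starting from `[0.0, 2.0, 4.0, 4.0]` with reference `[1.0, 3.0]`, one step gives `[0.0, 2.0, 3.0, 3.0]`: the detail inside the first pair survives and only the low-resolution mismatch of the second pair is corrected, which is the consistency-with-the-input property the abstract relies on.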
-
Robust Retinal Vessel Segmentation from a Data Augmentation Perspective
Authors:
Xu Sun,
Huihui Fang,
Yehui Yang,
Dongwei Zhu,
Lei Wang,
Junwei Liu,
Yanwu Xu
Abstract:
Retinal vessel segmentation is a fundamental step in the screening, diagnosis, and treatment of various cardiovascular and ophthalmic diseases. Robustness is one of the most critical requirements for practical use, since test images may be captured with different fundus cameras or be affected by various pathological changes. We investigate this problem from a data augmentation perspective, which has the merit of requiring no additional training data or inference time. In this paper, we propose two new data augmentation modules, namely channel-wise random gamma correction and channel-wise random vessel augmentation. Given a color fundus training image, the former applies random gamma correction to each color channel of the entire image, while the latter intentionally enhances or attenuates only the fine-grained blood vessel regions using morphological transformations. With the additional training samples generated by applying these two modules sequentially, a model can learn more invariant and discriminative features against both global and local disturbances. Experimental results on both real-world and synthetic datasets demonstrate that our method can improve the performance and robustness of a classic convolutional neural network architecture. The source code is available at \url{https://github.com/PaddlePaddle/Research/tree/master/CV/robust_vessel_segmentation}.
Submitted 28 September, 2021; v1 submitted 31 July, 2020;
originally announced July 2020.
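Channel-wise random gamma correction, the first of the two augmentation modules described above, is simple enough to sketch directly. Here the image is assumed to be a list of channels with intensities in [0, 1], and the gamma range and log-uniform sampling are hypothetical choices; the authors' exact parameters, and the morphological vessel-augmentation module, are not shown.

```python
import math
import random


def channelwise_random_gamma(image, low=0.5, high=2.0, rng=None):
    """Apply an independently sampled gamma to each channel of an image.

    image: list of channels, each a 2-D list of floats in [0, 1].
    Returns the augmented image and the gamma used for each channel.
    """
    rng = rng or random.Random()
    # Sample log-uniformly so brightening (gamma < 1) and darkening (gamma > 1) are balanced.
    gammas = [math.exp(rng.uniform(math.log(low), math.log(high))) for _ in image]
    augmented = [[[v ** g for v in row] for row in channel]
                 for channel, g in zip(image, gammas)]
    return augmented, gammas
```

Because each channel gets its own gamma, the color balance shifts as well as the brightness, loosely mimicking images from different fundus cameras, while the values 0 and 1 stay fixed so the intensity range is preserved.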
-
Defending against adversarial attacks on medical imaging AI system, classification or detection?
Authors:
Xin Li,
Deng Pan,
Dongxiao Zhu
Abstract:
Medical imaging AI systems such as disease classification and segmentation are increasingly inspired by and transformed from computer vision based AI systems. Although an array of adversarial training and/or loss function based defense techniques have been developed and proven effective in computer vision, defending against adversarial attacks on medical images remains largely uncharted territory due to the following unique challenges: 1) label scarcity in medical images significantly limits the adversarial generalizability of the AI system; 2) the vastly similar and dominant foreground and background in medical images make them hard samples for learning discriminating features between different disease classes; and 3) crafted adversarial noise added to the entire medical image, as opposed to the focused organ target, can make clean and adversarial examples more discriminable than examples from different disease classes. In this paper, we propose a novel robust medical imaging AI framework based on Semi-Supervised Adversarial Training (SSAT) and Unsupervised Adversarial Detection (UAD), followed by a new measure for assessing the system's adversarial risk. We systematically demonstrate the advantages of our robust medical imaging AI system over existing adversarial defense techniques under diverse real-world settings of adversarial attacks using a benchmark OCT imaging dataset.
Submitted 24 June, 2020;
originally announced June 2020.
-
Interpretable Multimodal Learning for Intelligent Regulation in Online Payment Systems
Authors:
Shuoyao Wang,
Diwei Zhu
Abstract:
With the explosive growth of transaction activities in online payment systems, effective and real-time regulation has become a critical problem for payment service providers. Thanks to the rapid development of artificial intelligence (AI), AI-enabled regulation has emerged as a promising solution. One main challenge of AI-enabled regulation is how to utilize multimedia information, i.e., multimodal signals, in Financial Technology (FinTech). Inspired by the attention mechanism in natural language processing, we propose a novel cross-modal and intra-modal attention network (CIAN) to investigate the relation between text and transactions. More specifically, we integrate text and transaction information to enhance text-trade joint embedding learning, which clusters positive pairs and pushes negative pairs away from each other. Another challenge of intelligent regulation is the interpretability of complicated machine learning models. To meet the requirements of financial regulation, we design a CIAN-Explainer, formulated as a low-rank matrix approximation problem, to interpret how the attention mechanism interacts with the original features. With real datasets from the largest online payment system, WeChat Pay of Tencent, we conduct experiments to validate the practical application value of CIAN, where our method outperforms state-of-the-art methods.
Submitted 10 June, 2020;
originally announced June 2020.
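The abstract above casts CIAN-Explainer as a low-rank matrix approximation problem but does not state the objective. As a generic illustration only, the sketch below computes a rank-k approximation of a small matrix (for example, a hypothetical attention map) by power iteration with deflation, which, when it converges, recovers the truncated-SVD solution that the Eckart-Young theorem identifies as optimal in the Frobenius norm. It does not claim to reproduce the authors' explainer.

```python
import math


def _matvec(a, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]


def _tmatvec(a, y):
    return [sum(a[i][j] * y[i] for i in range(len(a))) for j in range(len(a[0]))]


def _norm(x):
    return math.sqrt(sum(v * v for v in x))


def low_rank_approx(a, k, iters=200):
    """Rank-k approximation of matrix a via power iteration on a^T a with deflation."""
    residual = [list(row) for row in a]
    approx = [[0.0] * len(a[0]) for _ in a]
    for _ in range(k):
        v = [1.0] * len(a[0])  # deterministic start vector
        for _ in range(iters):
            w = _tmatvec(residual, _matvec(residual, v))
            n = _norm(w)
            if n == 0.0:
                return approx  # nothing left to explain
            v = [x / n for x in w]
        av = _matvec(residual, v)
        sigma = _norm(av)  # current leading singular value
        if sigma == 0.0:
            return approx
        u = [x / sigma for x in av]
        # Add sigma * u v^T to the approximation and remove it from the residual.
        for i in range(len(a)):
            for j in range(len(a[0])):
                approx[i][j] += sigma * u[i] * v[j]
                residual[i][j] -= sigma * u[i] * v[j]
    return approx
```

A rank-1 matrix such as `[[2, 4], [1, 2], [3, 6]]` is reproduced exactly by `low_rank_approx(a, 1)`, while `[[3, 0], [0, 1]]` is reduced to its dominant component `[[3, 0], [0, 0]]`; in an explanation setting, the few retained components are what a reader would inspect.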
-
Energy Trading in Microgrids for Synergies among Electricity, Hydrogen and Heat Networks
Authors:
Dafeng Zhu,
Bo Yang,
Qi Liu,
Kai Ma,
Shanying Zhu,
** Guan
Abstract:
The emerging paradigm of interconnected microgrids advocates energy trading or sharing among multiple microgrids. It helps make full use of the temporal availability of energy and the diversity in operational costs when meeting various energy loads. However, energy trading might not completely absorb excess renewable energy. A multi-energy management framework including fuel cell vehicles, energy storage, a combined heat and power system, and renewable energy is proposed, and the characteristics and scheduling arrangements of fuel cell vehicles are considered to further improve local absorption of renewable energy and enhance the economic benefits of microgrids. While intensive research has been conducted on energy scheduling and trading problems, a fundamental question about microgrid economics remains unanswered: given multi-energy coupling and stochastic renewable energy generation and demand, when and how should a microgrid schedule and trade energy with others to maximize its long-term benefit? This paper designs a joint energy scheduling and trading algorithm based on Lyapunov optimization and a double-auction mechanism. Its purpose is to determine the valuations of energy in the auction, optimally schedule energy distribution, and strategically purchase and sell energy at the current electricity prices. Simulations based on real data show that each individual microgrid, under the management of the proposed algorithm, can achieve a time-averaged profit that is arbitrarily close to an optimum value without compromising its own comfort.
Submitted 11 June, 2020; v1 submitted 1 May, 2020;
originally announced May 2020.
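The double-auction component above can be illustrated with a minimal one-shot clearing rule. This sketch is a generic mechanism, not necessarily the authors' design: each microgrid submits a single-unit bid or ask, bids are sorted in descending and asks in ascending order, units trade while the bid is at least the ask, and all trades clear at the midpoint of the last matched pair. How the Lyapunov-based algorithm derives the valuations is not shown, and the agent names are placeholders.

```python
def clear_double_auction(bids, asks):
    """Match single-unit bids and asks; return (trades, clearing_price).

    bids, asks: lists of (agent, price). Trades pair the k highest bids with
    the k lowest asks, where k is the largest index with bid >= ask.
    """
    buyers = sorted(bids, key=lambda t: t[1], reverse=True)
    sellers = sorted(asks, key=lambda t: t[1])
    k = 0
    while k < min(len(buyers), len(sellers)) and buyers[k][1] >= sellers[k][1]:
        k += 1
    if k == 0:
        return [], None  # no bid meets any ask
    # Uniform price at the midpoint of the last matched bid/ask pair.
    price = (buyers[k - 1][1] + sellers[k - 1][1]) / 2
    trades = [(buyers[i][0], sellers[i][0]) for i in range(k)]
    return trades, price
```

With bids of 10, 7, and 4 and asks of 3, 6, and 8, two units trade at 6.5; every matched buyer pays no more than its bid and every matched seller receives no less than its ask, so participation is individually rational.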
-
COVID-MobileXpert: On-Device COVID-19 Patient Triage and Follow-up using Chest X-rays
Authors:
Xin Li,
Chengyin Li,
Dongxiao Zhu
Abstract:
During the COVID-19 pandemic, there has been an emerging need for rapid, dedicated, point-of-care COVID-19 patient disposition techniques to optimize resource utilization and clinical workflow. In view of this need, we present COVID-MobileXpert: a lightweight deep neural network (DNN) based mobile app that can use chest X-rays (CXR) for COVID-19 case screening and radiological trajectory prediction. We design and implement a novel three-player knowledge transfer and distillation (KTD) framework, including a pre-trained attending physician (AP) network that extracts CXR imaging features from a large collection of lung disease CXR images, a fine-tuned resident fellow (RF) network that learns the essential CXR imaging features to discriminate COVID-19 from pneumonia and/or normal cases using a small number of COVID-19 cases, and a trained lightweight medical student (MS) network that performs on-device COVID-19 patient triage and follow-up. To tackle the challenge of vastly similar and dominant foreground and background in medical images, we employ novel loss functions and training schemes for the MS network to learn robust features. We demonstrate the significant potential of COVID-MobileXpert for rapid deployment via extensive experiments with diverse MS architectures and tuning parameter settings. The source code for the cloud and mobile based models is available at https://github.com/xinli0928/COVID-Xray.
Submitted 7 September, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
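The attending-physician, resident, and student chain described above relies on knowledge distillation. As a point of reference only, the sketch below shows the standard temperature-scaled distillation loss: a KL term between softened teacher and student outputs, scaled by T squared, mixed with cross-entropy on the hard label. The paper's own novel loss functions are not described in the abstract and are not reproduced here; the temperature and mixing weight are hypothetical defaults.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Standard KD loss: alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    # T^2 keeps the soft-target gradients on a scale comparable to the hard-label term.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

When the student's logits already match the teacher's, the KL term vanishes and only the weighted cross-entropy remains; any disagreement with the teacher adds a positive penalty.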
-
Autonomous Removal of Perspective Distortion for Robotic Elevator Button Recognition
Authors:
Delong Zhu,
Jianbang Liu,
Nachuan Ma,
Zhe Min,
Max Q. -H. Meng
Abstract:
Elevator button recognition is considered an indispensable function for enabling autonomous elevator operation by mobile robots. However, due to unfavorable image conditions and various image distortions, recognition accuracy still needs to be improved. In this paper, we present a novel algorithm that can autonomously correct perspective distortions in elevator panel images. The algorithm first leverages a Gaussian Mixture Model (GMM) to conduct a grid fitting process based on button recognition results, then uses the estimated grid centers as reference features to estimate camera motions for correcting perspective distortions. The algorithm operates on a single image autonomously and requires no explicit feature detection or feature matching procedure, which makes it much more robust to noise and outliers than traditional feature-based geometric approaches. To verify the effectiveness of the algorithm, we collect an elevator panel dataset of 50 images captured from different viewing angles. Experimental results show that the proposed algorithm can accurately estimate camera motions and effectively remove perspective distortions.
Submitted 25 December, 2019;
originally announced December 2019.
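Once grid centers have been fitted, correcting perspective amounts to finding a planar mapping between the observed centers and an ideal, fronto-parallel grid. As a stand-in for the authors' camera-motion estimation, which the abstract does not detail, the sketch below fits a homography with the direct linear transform (h33 fixed to 1) by least squares over all correspondences. The GMM grid-fitting stage is assumed to have already produced the point pairs, and all names are hypothetical.

```python
def _solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(src, dst):
    """Least-squares DLT fit of a 3x3 homography (h33 = 1) mapping src onto dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        rhs.append(v)
    # Normal equations A^T A h = A^T b; adequate for a small, well-spread grid.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = _solve(ata, atb)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map a 2-D point through h using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Fitting the mapping from observed centers back onto the ideal grid gives the warp that rectifies the panel image; using every grid center rather than a minimal set of four averages out small errors in individual centers.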
-
Toward Better Understanding of Saliency Prediction in Augmented 360 Degree Videos
Authors:
Yucheng Zhu,
Xiongkuo Min,
DanDan Zhu,
Ke Gu,
Jiantao Zhou,
Guangtao Zhai,
Xiaokang Yang,
Wenjun Zhang
Abstract:
Augmented reality (AR) overlays digital content onto reality. In an AR system, correct and precise estimation of the user's visual fixations and head movements can enhance the quality of experience by allocating more computation resources to the areas of interest. However, research on understanding users' visual exploration when using an AR system, or on modeling AR visual attention, is inadequate. To bridge the gap between saliency prediction on real-world scenes and on scenes augmented by virtual information, we construct the ARVR saliency dataset with 12 diverse videos viewed by 20 people. The virtual reality (VR) technique is employed to simulate the real world. Annotations of object recognition and tracking, as augmented contents, are blended into the omnidirectional videos. The saliency annotations of head and eye movements for both original and augmented videos are collected and together constitute the ARVR dataset. We also design a model capable of solving the saliency prediction problem in AR. Local block images are extracted to simulate the viewport and offset the projection distortion. Conspicuous visual cues in local viewports are extracted to constitute the spatial features. The optical flow information is estimated as the important temporal feature. We also consider the interplay between virtual information and reality. The composition of the augmentation information is distinguished, and the joint effects of adversarial augmentation and complementary augmentation are estimated. We generate a graph by taking each block image as one node. Both the visual saliency mechanism and the characteristics of viewing behaviors are considered in the computation of edge weights on the graph, which are interpreted as Markov chains. The fraction of visual attention diverted to each block image is estimated through the equilibrium distribution of this chain.
Submitted 20 July, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
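The last step described above, reading attention off the equilibrium distribution of a Markov chain on a graph of block images, can be sketched with row-normalized edge weights and power iteration. The edge weights here are hypothetical; in the paper they combine saliency cues and viewing-behaviour characteristics that the abstract does not specify. The chain is assumed irreducible and aperiodic, with no all-zero rows, so that power iteration converges.

```python
def transition_matrix(weights):
    """Row-normalize non-negative edge weights into a Markov transition matrix."""
    return [[w / sum(row) for w in row] for row in weights]


def stationary_distribution(weights, iters=1000, tol=1e-12):
    """Equilibrium distribution pi = pi P by power iteration from the uniform vector."""
    p = transition_matrix(weights)
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi
```

For weights `[[1, 1], [3, 1]]` the result is `[0.6, 0.4]`. For a symmetric weight matrix the result is proportional to each node's total edge weight, which is a handy sanity check: blocks with strong connections to the rest of the viewport attract more attention mass.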
-
MGBPv2: Scaling Up Multi-Grid Back-Projection Networks
Authors:
Pablo Navarrete Michelini,
Wenbin Chen,
Hanwen Liu,
Dan Zhu
Abstract:
Here, we describe our solution for the AIM-2019 Extreme Super-Resolution Challenge, where we won the 1st place in terms of perceptual quality (MOS) similar to the ground truth and achieved the 5th place in terms of high-fidelity (PSNR). To tackle this challenge, we introduce the second generation of MultiGrid BackProjection networks (MGBPv2) whose major modifications make the system scalable and more general than its predecessor. It combines the scalability of the multigrid algorithm and the performance of iterative backprojections. In its original form, MGBP is limited to a small number of parameters due to a strongly recursive structure. In MGBPv2, we make full use of the multigrid recursion from the beginning of the network; we allow different parameters in every module of the network; we simplify the main modules; and finally, we allow adjustments of the number of network features based on the scale of operation. For inference tasks, we introduce an overlapping patch approach to further allow processing of very large images (e.g. 8K). Our training strategies make use of a multiscale loss, combining distortion and/or perception losses on the output as well as downscaled output images. The final system can balance between high quality and high performance.
Submitted 27 September, 2019;
originally announced September 2019.
-
Interpreting Age Effects of Human Fetal Brain from Spontaneous fMRI using Deep 3D Convolutional Neural Networks
Authors:
Xiangrui Li,
Jasmine Hect,
Moriah Thomason,
Dongxiao Zhu
Abstract:
Understanding human fetal neurodevelopment is of great clinical importance, as abnormal development is linked to adverse neuropsychiatric outcomes after birth. Recent advances in functional Magnetic Resonance Imaging (fMRI) have provided new insight into the development of the human brain before birth, but these studies have predominantly focused on brain functional connectivity (i.e., Fisher z-scores), which requires manual processing steps to extract features from fMRI images. Deep learning approaches (e.g., Convolutional Neural Networks) have achieved remarkable success in learning directly from image data, yet have not been applied to fetal fMRI to understand fetal neurodevelopment. Here, we bridge this gap with a novel application of deep 3D CNNs to fetal blood-oxygen-level-dependent (BOLD) resting-state fMRI data. Specifically, we test a supervised CNN framework as a data-driven approach to isolate variation in fMRI signals that relates to younger vs. older fetal age groups. Based on the learned CNN, we further perform sensitivity analysis to identify brain regions in which changes in the BOLD signal are strongly associated with fetal brain age. The findings demonstrate that deep CNNs are a promising approach for identifying spontaneous functional patterns in fetal brain activity that discriminate age groups. Further, we discovered that the regions that most strongly differentiate the groups are largely bilateral, share a similar distribution in older and younger age groups, and are areas of heightened metabolic activity in early human development.
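For contrast with the CNN approach, the conventional connectivity feature referred to above can be sketched in Python: Pearson correlations between regional BOLD time series, mapped through the Fisher transform z = arctanh(r) = 0.5 ln((1 + r) / (1 - r)). This sketch assumes region-of-interest time series have already been extracted (the manual preprocessing mentioned above); the function name `fisher_z_connectivity` and the clipping `bound` are illustrative choices, not part of the described study.

```python
import numpy as np

def fisher_z_connectivity(timeseries, bound=1 - 1e-6):
    """Fisher z-transformed functional connectivity matrix.

    timeseries: array of shape (n_regions, n_timepoints), one row per region.
    """
    r = np.corrcoef(np.asarray(timeseries, dtype=float))  # Pearson r between rows
    np.fill_diagonal(r, 0.0)        # self-correlation is 1, whose z-score would be infinite
    r = np.clip(r, -bound, bound)   # keep arctanh finite for (near-)perfect correlations
    return np.arctanh(r)            # z = 0.5 * ln((1 + r) / (1 - r))
```

The transform stretches correlations near plus or minus 1 so that the resulting z-scores are approximately normally distributed, which is why they are the usual input to group-level statistics on connectivity.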
Submitted 9 June, 2019;
originally announced June 2019.