
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.

Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

Comments: initial technical report

arXiv:2405.12569 [pdf, other]

TypeII-CsiNet: CSI Feedback with TypeII Codebook

Authors: Yiliang Sang, Ke Ma, Yang Ming, ** Lian, Zhaocheng Wang

Abstract: The latest TypeII codebook selects partial strongest angular-delay ports for the feedback of downlink channel state information (CSI), whereas its performance is limited due to the deficiency of utilizing the correlations among the port coefficients. To tackle this issue, we propose a tailored autoencoder named TypeII-CsiNet to effectively integrate the TypeII codebook with deep learning, wherein three novel designs are developed for sufficiently boosting the sum rate performance. Firstly, a dedicated pre-processing module is designed to sort the selected ports for reserving the correlations of their corresponding coefficients. Secondly, a position-filling layer is developed in the decoder to fill the feedback coefficients into their ports in the recovered CSI matrix, so that the corresponding angular-delay-domain structure is adequately leveraged to enhance the reconstruction accuracy. Thirdly, a two-stage loss function is proposed to improve the sum rate performance while avoiding trapping in local optima during model training. Simulation results verify that our proposed TypeII-CsiNet outperforms the TypeII codebook and existing deep learning benchmarks.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.01672 [pdf, other]

The Meta Distribution of the SIR in Joint Communication and Sensing Networks

Authors: Kun Ma, Chenyuan Feng, Giovanni Geraci, Howard H. Yang

Abstract: In this paper, we introduce a novel mathematical framework for assessing the performance of joint communication and sensing (JCAS) in wireless networks, employing stochastic geometry as an analytical tool. We focus on deriving the meta distribution of the signal-to-interference ratio (SIR) for JCAS networks. This approach enables a fine-grained quantification of individual user or radar performance intrinsic to these networks. Our work involves the modeling of JCAS networks and the derivation of mathematical expressions for the JCAS SIR meta distribution. Through simulations, we both validate our theoretical analysis and illustrate how the JCAS SIR meta distribution varies with the network deployment density.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.00252 [pdf, other]

Learned Scanpaths Aid Blind Panoramic Video Quality Assessment

Authors: Kanglong Fan, Wen Wen, Mu Li, Yifan Peng, Kede Ma

Abstract: Panoramic videos have the advantage of providing an immersive and interactive viewing experience. Nevertheless, their spherical nature gives rise to various and uncertain user viewing behaviors, which poses significant challenges for panoramic video quality assessment (PVQA). In this work, we propose an end-to-end optimized, blind PVQA method with explicit modeling of user viewing patterns through visual scanpaths. Our method consists of two modules: a scanpath generator and a quality assessor. The scanpath generator is initially trained to predict future scanpaths by minimizing their expected code length and then jointly optimized with the quality assessor for quality prediction. Our blind PVQA method enables direct quality assessment of panoramic images by treating them as videos composed of identical frames. Experiments on three public panoramic image and video quality datasets, encompassing both synthetic and authentic distortions, validate the superiority of our blind PVQA model over existing methods.

Submitted 15 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.12369 [pdf, other]

Block-Dominant Compressed Sensing for Near-Field Communications: Fundamentals, Solutions and Future Directions

Authors: Liyang Lu, Ke Ma, Zhaocheng Wang

Abstract: Near-field (NF) communications draw much attention in the context of extremely large-scale antenna arrays (ELAA). Owing to a large number of antennas and high carrier frequency, the NF coverage distance is quite substantial, where the electromagnetic radiation propagates by spherical waves, in contrast to the conventional planar waves of the far-field. Motivated by these facts, the block-dominant compressed sensing (BD-CS) assisted NF communications are proposed. Specifically, we elucidate why block sparsity exists in the distance-limited NF region. Then, block-dominant side-information (BD-SI) is introduced in support of the actual NF communication implementation. We validate that BD-CS is capable of providing exceptional channel estimation accuracy and high spectral efficiency, where the associated challenges, opportunities and its actual implementation in NF communications need to be carefully addressed.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Submitted to IEEE for possible publication

arXiv:2402.19276 [pdf, other]

Modular Blind Video Quality Assessment

Authors: Wen Wen, Mu Li, Yabin Zhang, Yiting Liao, Junlin Li, Li Zhang, Kede Ma

Abstract: Blind video quality assessment (BVQA) plays a pivotal role in evaluating and improving the viewing experience of end-users across a wide range of video-based platforms and services. Contemporary deep learning-based models primarily analyze video content in its aggressively subsampled format, while being blind to the impact of the actual spatial resolution and frame rate on video quality. In this paper, we propose a modular BVQA model and a method of training it to improve its modularity. Our model comprises a base quality predictor, a spatial rectifier, and a temporal rectifier, responding to the visual content and distortion, spatial resolution, and frame rate changes on video quality, respectively. During training, spatial and temporal rectifiers are dropped out with some probabilities to render the base quality predictor a standalone BVQA model, which should work better with the rectifiers. Extensive experiments on both professionally-generated content and user-generated content video databases show that our quality model achieves superior or comparable performance to current methods. Additionally, the modularity of our model offers an opportunity to analyze existing video quality databases in terms of their spatial and temporal complexity.

Submitted 31 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR 2024; Camera-ready version

arXiv:2402.11419 [pdf, other]

A Self-Healing Magnetic-Array-Type Current Sensor with Data-Driven Identification of Abnormal Magnetic Measurement Units

Authors: Xiaohu Liu, Wei Zhao, Kang Ma, Jian Liu, Lisha Peng, Songling Huang, Shisong Li

Abstract: Magnetic-array-type current sensors have garnered increasing popularity owing to their notable advantages, including broadband functionality, a large dynamic range, cost-effectiveness, and compact dimensions. However, the susceptibility of the measurement error of one or more magnetic measurement units (MMUs) within the current sensor to drift significantly from the nominal value due to environmental factors poses a potential threat to the measurement accuracy of the current sensor. In light of the need to ensure sustained measurement accuracy over the long term, this paper proposes an innovative self-healing approach rooted in cyber-physics correlation. This approach aims to identify MMUs exhibiting abnormal measurement errors, allowing for the exclusive utilization of the remaining unaffected MMUs in the current measurement process. To achieve this, principal component analysis (PCA) is employed to discern the primary component, arising from fluctuations of the measured current, from the residual component, attributed to the drift in measurement error. This analysis is conducted by scrutinizing the measured data obtained from the MMUs. Subsequently, the squared prediction error (SPE) statistic (also called $Q$ statistic) is deployed to individually identify any MMU displaying abnormal behavior. The experimental results demonstrate the successful online identification of abnormal MMUs without the need for a standard magnetic field sensor. By eliminating the contributions from the identified abnormal MMUs, the accuracy of the current measurement is effectively preserved.
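
The PCA-plus-SPE idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the simulated data, the unit gains, the constant-offset drift, the choice of one principal component, and the helper names `fit_pca` and `spe_contributions` are all assumptions made for illustration, and no statistical control limit for the SPE is computed.

```python
import numpy as np

def fit_pca(x_ref, n_components=1):
    """Fit the principal subspace on reference data assumed to be healthy."""
    mean = x_ref.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    return mean, vt[:n_components].T  # loadings: (n_units, n_components)

def spe_contributions(x, mean, loadings):
    """Squared residual of each unit after removing the principal part.

    Summing a row over units gives the SPE (Q statistic) of that sample.
    """
    xc = x - mean
    residual = xc - xc @ loadings @ loadings.T
    return residual ** 2

# Hypothetical setup: 8 units observe one common current with different gains.
rng = np.random.default_rng(0)
n_units = 8
gains = np.linspace(0.8, 1.2, n_units)
x_ref = np.outer(rng.normal(size=500), gains) + 0.01 * rng.normal(size=(500, n_units))
mean, loadings = fit_pca(x_ref)

# New data in which unit 3 has drifted by a constant offset (assumed drift model).
x_new = np.outer(rng.normal(size=200), gains) + 0.01 * rng.normal(size=(200, n_units))
x_new[:, 3] += 0.5

contrib = spe_contributions(x_new, mean, loadings)
spe = contrib.sum(axis=1)                         # per-sample SPE
suspect = int(np.argmax(contrib.mean(axis=0)))    # unit with the largest share
print("suspected abnormal unit:", suspect)
```

In this sketch the common current lies almost entirely in the principal subspace, so the drift shows up in the residual and is concentrated on the drifting unit; a practical system would compare the SPE against a threshold before flagging anything.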

Submitted 17 February, 2024; originally announced February 2024.

Comments: 11 pages, 10 figures

arXiv:2402.11250 [pdf, other]

Hierarchical Prior-based Super Resolution for Point Cloud Geometry Compression

Authors: Dingquan Li, Kede Ma, **g Wang, Ge Li

Abstract: The Geometry-based Point Cloud Compression (G-PCC) has been developed by the Moving Picture Experts Group to compress point clouds. In its lossy mode, the reconstructed point cloud by G-PCC often suffers from noticeable distortions due to the naïve geometry quantization (i.e., grid downsampling). This paper proposes a hierarchical prior-based super resolution method for point cloud geometry compression. The content-dependent hierarchical prior is constructed at the encoder side, which enables coarse-to-fine super resolution of the point cloud geometry at the decoder side. A more accurate prior generally yields improved reconstruction performance, at the cost of increased bits required to encode this side information. With a proper balance between prior accuracy and bit consumption, the proposed method demonstrates substantial Bjontegaard-delta bitrate savings on the MPEG Cat1A dataset, surpassing the octree-based and trisoup-based G-PCC v14. We provide our implementations for reproducible research at https://github.com/lidq92/mpeg-pcc-tmc13.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.05817 [pdf]

Using YOLO v7 to Detect Kidney in Magnetic Resonance Imaging

Authors: Pouria Yazdian Anari, Fiona Obiezu, Nathan Lay, Fatemeh Dehghani Firouzabadi, Aditi Chaurasia, Mahshid Golagha, Shiva Singh, Fatemeh Homayounieh, Aryan Zahergivar, Stephanie Harmon, Evrim Turkbey, Rabindra Gautam, Kevin Ma, Maria Merino, Elizabeth C. Jones, Mark W. Ball, W. Marston Linehan, Baris Turkbey, Ashkan A. Malayeri

Abstract: Introduction: This study explores the use of the latest You Only Look Once (YOLO V7) object detection method to enhance kidney detection in medical imaging by training and testing a modified YOLO V7 on medical image formats. Methods: The study includes 878 patients with various subtypes of renal cell carcinoma (RCC) and 206 patients with normal kidneys. A total of 5657 MRI scans for 1084 patients were retrieved. A total of 326 patients with 1034 tumors were recruited from a retrospectively maintained database, and bounding boxes were drawn around their tumors. A primary model was trained on 80% of annotated cases, with 20% saved for testing (primary test set). The best primary model was then used to identify tumors in the remaining 861 patients, and bounding box coordinates were generated on their scans using the model. Ten benchmark training sets were created with generated coordinates on not-segmented patients. The final model was used to predict the kidney in the primary test set. We reported the positive predictive value (PPV), sensitivity, and mean average precision (mAP). Results: The primary training set showed an average PPV of 0.94 +/- 0.01, sensitivity of 0.87 +/- 0.04, and mAP of 0.91 +/- 0.02. The best primary model yielded a PPV of 0.97, sensitivity of 0.92, and mAP of 0.95. The final model demonstrated an average PPV of 0.95 +/- 0.03, sensitivity of 0.98 +/- 0.004, and mAP of 0.95 +/- 0.01. Conclusion: Using a semi-supervised approach with a medical image library, we developed a high-performing model for kidney detection. Further external validation is required to assess the model's generalizability.

Submitted 12 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2312.12795 [pdf, ps, other]

doi 10.1109/TSG.2023.3326928

Joint Trading and Scheduling among Coupled Carbon-Electricity-Heat-Gas Industrial Clusters

Authors: Dafeng Zhu, Bo Yang, Yu Wu, Haoran Deng, Zhaoyang Dong, Kai Ma, ** Guan

Abstract: This paper presents a carbon-energy coupling management framework for an industrial park, where the carbon flow model accompanying multi-energy flows is adopted to track and suppress carbon emissions on the user side. To deal with the quadratic constraint of gas flows, a bound tightening algorithm for constraints relaxation is adopted. The synergies among carbon capture, energy storage, and power-to-gas further consume renewable energy and reduce carbon emissions. Aiming at carbon emissions disparities and supply-demand imbalances, this paper proposes a carbon trading ladder reward and punishment mechanism and an energy trading and scheduling method based on Lyapunov optimization and matching game to maximize the long-term benefits of each industrial cluster without knowing the prior information of random variables. Case studies show that our proposed trading method can reduce overall costs and carbon emissions while relieving energy pressure, which is important for Environmental, Social and Governance (ESG).

Submitted 20 December, 2023; originally announced December 2023.

Journal ref: IEEE Transactions on Smart Grid, 2023

arXiv:2312.01679 [pdf, other]

Adversarial Medical Image with Hierarchical Feature Hiding

Authors: Qingsong Yao, Zecheng He, Yuexiang Li, Yi Lin, Kai Ma, Yefeng Zheng, S. Kevin Zhou

Abstract: Deep learning based methods for medical images can be easily compromised by adversarial examples (AEs), posing a great security flaw in clinical decision-making. It has been discovered that conventional adversarial attacks like PGD, which optimize the classification logits, are easy to distinguish in the feature space, resulting in accurate reactive defenses. To better understand this phenomenon and reassess the reliability of the reactive defenses for medical AEs, we thoroughly investigate the characteristic of conventional medical AEs. Specifically, we first theoretically prove that conventional adversarial attacks change the outputs by continuously optimizing vulnerable features in a fixed direction, thereby leading to outlier representations in the feature space. Then, a stress test is conducted to reveal the vulnerability of medical images, by comparing with natural images. Interestingly, this vulnerability is a double-edged sword, which can be exploited to hide AEs. We then propose a simple-yet-effective hierarchical feature constraint (HFC), a novel add-on to conventional white-box attacks, which assists to hide the adversarial feature in the target feature distribution. The proposed method is evaluated on three medical datasets, both 2D and 3D, with different modalities. The experimental results demonstrate the superiority of HFC, i.e., it bypasses an array of state-of-the-art adversarial medical AE detectors more efficiently than competing adaptive attacks, which reveals the deficiencies of medical reactive defense and enables the development of more robust defenses in the future.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Our code is available at https://github.com/qsyao/Hierarchical_Feature_Constraint. arXiv admin note: text overlap with arXiv:2012.09501

arXiv:2310.12877 [pdf, other]

Perceptual Assessment and Optimization of HDR Image Rendering

Authors: Peibei Cao, Rafal K. Mantiuk, Kede Ma

Abstract: High dynamic range (HDR) rendering has the ability to faithfully reproduce the wide luminance ranges in natural scenes, but how to accurately assess the rendering quality is relatively underexplored. Existing quality models are mostly designed for low dynamic range (LDR) images, and do not align well with human perception of HDR image quality. To fill this gap, we propose a family of HDR quality metrics, in which the key step is employing a simple inverse display model to decompose an HDR image into a stack of LDR images with varying exposures. Subsequently, these decomposed images are assessed through well-established LDR quality metrics. Our HDR quality models present three distinct benefits. First, they directly inherit the recent advancements of LDR quality metrics. Second, they do not rely on human perceptual data of HDR image quality for re-calibration. Third, they facilitate the alignment and prioritization of specific luminance ranges for more accurate and detailed quality assessment. Experimental results show that our HDR quality metrics consistently outperform existing models in terms of quality assessment on four HDR image quality datasets and perceptual optimization of HDR novel view synthesis.

Submitted 16 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.05962 [pdf, other]

Improving the Performance of R17 Type-II Codebook with Deep Learning

Authors: Ke Ma, Yiliang Sang, Yang Ming, ** Lian, Chang Tian, Zhaocheng Wang

Abstract: The Type-II codebook in Release 17 (R17) exploits the angular-delay-domain partial reciprocity between uplink and downlink channels to select part of angular-delay-domain ports for measuring and feeding back the downlink channel state information (CSI), where the performance of existing deep learning enhanced CSI feedback methods is limited due to the deficiency of sparse structures. To address this issue, we propose two new perspectives of adopting deep learning to improve the R17 Type-II codebook. Firstly, considering the low signal-to-noise ratio of uplink channels, deep learning is utilized to accurately select the dominant angular-delay-domain ports, where the focal loss is harnessed to solve the class imbalance problem. Secondly, we propose to adopt deep learning to reconstruct the downlink CSI based on the feedback of the R17 Type-II codebook at the base station, where the information of sparse structures can be effectively leveraged. Besides, a weighted shortcut module is designed to facilitate the accurate reconstruction. Simulation results demonstrate that our proposed methods could improve the sum rate performance compared with the traditional R17 Type-II codebook and deep learning benchmarks.
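
For readers unfamiliar with the focal loss mentioned in this abstract, the following is a minimal sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). How the paper applies it to port selection is not stated in the abstract, so the function name `binary_focal_loss`, the example probabilities, and the default values gamma = 2 and alpha = 0.25 (commonly used defaults, not taken from this paper) are assumptions.

```python
import numpy as np

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Per-sample binary focal loss.

    p: predicted probability of the positive class, y: labels in {0, 1}.
    gamma down-weights well-classified samples; alpha balances the classes.
    """
    p = np.clip(p, 1e-7, 1.0 - 1e-7)            # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)          # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# Hypothetical predictions: samples 0 and 2 are easy, samples 1 and 3 are hard.
p = np.array([0.9, 0.1, 0.2, 0.8])
y = np.array([1, 1, 0, 0])
losses = binary_focal_loss(p, y)
print(losses)
```

With gamma set to 0 the expression reduces to an alpha-weighted cross-entropy; larger gamma shrinks the loss of confidently correct samples, which is why the loss is often used when one class (here, presumably the few dominant ports) is rare.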

Submitted 13 September, 2023; originally announced October 2023.

Comments: Accepted by IEEE GLOBECOM 2023, conference version of arXiv:2305.08081

arXiv:2309.12461 [pdf, other]

Knowledge Base Aware Semantic Communication in Vehicular Networks

Authors: Le Xia, Yao Sun, Dusit Niyato, Kairong Ma, Jiawen Kang, Muhammad Ali Imran

Abstract: Semantic communication (SemCom) has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to vehicular networks, which normally consume a tremendous amount of resources to achieve stringent requirements on high reliability and low latency. Unfortunately, the unique background knowledge matching mechanism in SemCom makes it challenging to realize efficient vehicle-to-vehicle service provisioning for multiple users at the same time. To this end, this paper identifies and jointly addresses two fundamental problems of knowledge base construction (KBC) and vehicle service pairing (VSP) inherently existing in SemCom-enabled vehicular networks. Concretely, we first derive the knowledge matching based queuing latency specific for semantic data packets, and then formulate a latency-minimization problem subject to several KBC and VSP related reliability constraints. Afterward, a SemCom-empowered Service Supplying Solution (S$^4$) is proposed along with the theoretical analysis of its optimality guarantee. Simulation results demonstrate the superiority of S$^4$ in terms of average queuing latency, semantic data packet throughput, and user knowledge preference satisfaction compared with two different benchmarks.

Submitted 21 September, 2023; originally announced September 2023.

Comments: This paper has been accepted for publication by 2023 IEEE International Conference on Communications (ICC 2023). Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: substantial text overlap with arXiv:2302.11993

arXiv:2308.00382 [pdf]

doi 10.1109/JESTPE.2021.3105522

Artificial-Intelligence-Based Triple Phase Shift Modulation for Dual Active Bridge Converter with Minimized Current Stress

Authors: Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao

Abstract: The dual active bridge (DAB) converter has been popular in many applications for its outstanding power density and bidirectional power transfer capacity. Up to now, triple phase shift (TPS) can be considered as one of the most advanced modulation techniques for the DAB converter. It can widen the zero voltage switching range and improve power efficiency significantly. Currently, the current stress of the DAB converter has been an important performance indicator when TPS modulation is applied for smaller size and higher efficiency. However, to minimize the current stress when the DAB converter is under TPS modulation, two difficulties exist in the analysis process and the realization process, respectively. Firstly, three degrees of modulation variables in TPS modulation bring challenges to the analysis of current stress in different operating modes. This analysis and deduction process leads to a heavy computational burden and also suffers from low accuracy. Secondly, to realize TPS modulation, if a lookup table is adopted after the optimization of modulation variables, modulation performance will be unsatisfactory because of the discrete nature of the lookup table. Therefore, an AI-based TPS modulation (AI-TPSM) strategy is proposed in this paper. Neural network (NN) and fuzzy inference system (FIS) are utilized to deal with the two difficulties mentioned above. With the proposed AI-TPSM, the optimization of TPS modulation for minimized current stress will enjoy a high degree of automation, which can relieve engineers' working burden and improve accuracy. At the end of this paper, the effectiveness of the proposed AI-TPSM is experimentally verified with a 1 kW prototype.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 12 pages, 29 figures

arXiv:2308.00381 [pdf]

doi 10.1109/JESTPE.2022.3185090

Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency

Authors: Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao

Abstract: Dual active bridge (DAB) converter is the key enabler in many popular applications such as wireless charging, electric vehicle and renewable energy. ZVS range and efficiency are two significant performance indicators for DAB converter. To obtain the desired ZVS and efficiency performance, modulation should be carefully designed. Hybrid modulation considers several single modulation strategies to achieve good comprehensive performance. Conventionally, to design a hybrid modulation, harmonic approach or piecewise approach is used, but they suffer from time-consuming model building process and inaccuracy. Therefore, an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation is proposed. Generally, the HEPS modulation is developed in an automated fashion, which alleviates cumbersome model building process while keeping high model accuracy. In HEPS modulation, two EPS strategies are considered to realize optimal efficiency with full ZVS operation over entire operating ranges. Specifically, to build data-driven models of ZVS and efficiency performance, extreme gradient boosting (XGBoost), which is a state-of-the-art ensemble learning algorithm, is adopted. Afterwards, particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) is utilized to select the best EPS strategy and optimize modulation parameters. With 1 kW hardware experiments, the feasibility of HEPS has been verified, achieving optimal efficiency with maximum of 97.1% and full-range ZVS operation.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 13 pages, 32 figures

arXiv:2307.13981 [pdf, other]

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

Authors: Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma

Abstract: Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting our model generalizability on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.

Submitted 3 April, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.09570 [pdf, other]

SAM-Path: A Segment Anything Model for Semantic Segmentation in Digital Pathology

Authors: Jingwei Zhang, Ke Ma, Saarthak Kapse, Joel Saltz, Maria Vakalopoulou, Prateek Prasanna, Dimitris Samaras

Abstract: Semantic segmentations of pathological entities have crucial clinical value in computational pathology workflows. Foundation models, such as the Segment Anything Model (SAM), have been recently proposed for universal use in segmentation tasks. SAM shows remarkable promise in instance segmentation on natural images. However, the applicability of SAM to computational pathology tasks is limited due to the following factors: (1) lack of comprehensive pathology datasets used in SAM training and (2) the design of SAM is not inherently optimized for semantic segmentation tasks. In this work, we adapt SAM for semantic segmentation by introducing trainable class prompts, followed by further enhancements through the incorporation of a pathology encoder, specifically a pathology foundation model. Our framework, SAM-Path enhances SAM's ability to conduct semantic segmentation in digital pathology without human input prompts. Through experiments on two public pathology datasets, the BCSS and the CRAG datasets, we demonstrate that the fine-tuning with trainable class prompts outperforms vanilla SAM with manual prompts and post-processing by 27.52% in Dice score and 71.63% in IOU. On these two datasets, the proposed additional pathology foundation model further achieves a relative improvement of 5.07% to 5.12% in Dice score and 4.50% to 8.48% in IOU.

Submitted 12 July, 2023; originally announced July 2023.

Comments: Submitted to MedAGI 2023

arXiv:2305.10353 [pdf, other]

An Ensemble Learning Approach for Exercise Detection in Type 1 Diabetes Patients

Authors: Ke Ma, Hongkai Chen, Shan Lin

Abstract: Type 1 diabetes is a serious disease in which individuals are unable to regulate their blood glucose levels, leading to various medical complications. Artificial pancreas (AP) systems have been developed as a solution for type 1 diabetic patients to mimic the behavior of the pancreas and regulate blood glucose levels. However, current AP systems lack detection capabilities for exercise-induced glucose intake, which can last up to 4 to 8 hours. This incapability can lead to hypoglycemia, which if left untreated, could have serious consequences, including death. Existing exercise detection methods are either limited to single sensor data or use inaccurate models for exercise detection, making them less effective in practice. In this work, we propose an ensemble learning framework that combines a data-driven physiological model and a Siamese network to leverage multiple physiological signal streams for exercise detection with high accuracy. To evaluate the effectiveness of our proposed approach, we utilized a public dataset with 12 diabetic patients collected from an 8-week clinical trial. Our approach achieves a true positive rate for exercise detection of 86.4% and a true negative rate of 99.1%, outperforming state-of-the-art solutions.

Submitted 11 May, 2023; originally announced May 2023.

Comments: 10 pages, 7 figures, 2 tables

MSC Class: 68T07 (Primary) 34A05 (Secondary) ACM Class: J.3

arXiv:2305.00837 [pdf, other]

LCAUnet: A skin lesion segmentation network with enhanced edge and body fusion

Authors: Qisen Ma, Keming Mao, Gao Wang, Lisheng Xu, Yuhai Zhao

Abstract: Accurate segmentation of skin lesions in dermatoscopic images is crucial for the early diagnosis of skin cancer and improving the survival rate of patients. However, it is still a challenging task due to the irregularity of lesion areas, the fuzziness of boundaries, and other complex interference factors. In this paper, a novel LCAUnet is proposed to improve the ability of complementary representation with fusion of edge and body features, which are often paid little attentions in traditional methods. First, two separate branches are set for edge and body segmentation with CNNs and Transformer based architecture respectively. Then, LCAF module is utilized to fuse feature maps of edge and body of the same level by local cross-attention operation in encoder stage. Furthermore, PGMF module is embedded for feature integration with prior guided multi-scale adaption. Comprehensive experiments on public available dataset ISIC 2017, ISIC 2018, and PH2 demonstrate that LCAUnet outperforms most state-of-the-art methods. The ablation studies also verify the effectiveness of the proposed fusion techniques.

Submitted 1 May, 2023; originally announced May 2023.

Comments: 14 pages, 10 figures

arXiv:2303.15043 [pdf, other]

Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time

Authors: Wei Shang, Dongwei Ren, Yi Yang, Hongzhi Zhang, Kede Ma, Wangmeng Zuo

Abstract: Natural videos captured by consumer cameras often suffer from low framerate and motion blur due to the combination of dynamic scene complexity, lens and sensor imperfection, and less than ideal exposure setting. As a result, computational methods that jointly perform video frame interpolation and deblurring begin to emerge with the unrealistic assumption that the exposure time is known and fixed. In this work, we aim ambitiously for a more realistic and challenging task - joint video multi-frame interpolation and deblurring under unknown exposure time. Toward this goal, we first adopt a variant of supervised contrastive learning to construct an exposure-aware representation from input blurred frames. We then train two U-Nets for intra-motion and inter-motion analysis, respectively, adapting to the learned exposure representation via gain tuning. We finally build our video reconstruction network upon the exposure and motion representation by progressive exposure-adaptive convolution and motion refinement. Extensive experiments on both simulated and real-world datasets show that our optimized method achieves notable performance gains over the state-of-the-art on the joint video x8 interpolation and deblurring task. Moreover, on the seemingly implausible x16 interpolation task, our method outperforms existing methods by more than 1.5 dB in terms of PSNR.

Submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023, available at https://github.com/shangwei5/VIDUE

ACM Class: I.4.3

arXiv:2303.14964 [pdf, other]

Learning a Deep Color Difference Metric for Photographic Images

Authors: Haoyu Chen, Zhihua Wang, Yang Yang, Qilin Sun, Kede Ma

Abstract: Most well-established and widely used color difference (CD) metrics are handcrafted and subject-calibrated against uniformly colored patches, which do not generalize well to photographic images characterized by natural scene complexities. Constructing CD formulae for photographic images is still an active research topic in imaging/illumination, vision science, and color science communities. In this paper, we aim to learn a deep CD metric for photographic images with four desirable properties. First, it well aligns with the observations in vision science that color and form are linked inextricably in visual cortical processing. Second, it is a proper metric in the mathematical sense. Third, it computes accurate CDs between photographic images, differing mainly in color appearances. Fourth, it is robust to mild geometric distortions (e.g., translation or due to parallax), which are often present in photographic images of the same scene captured by different digital cameras. We show that all these properties can be satisfied at once by learning a multi-scale autoregressive normalizing flow for feature transform, followed by the Euclidean distance which is linearly proportional to the human perceptual CD. Quantitative and qualitative experiments on the large-scale SPCD dataset demonstrate the promise of the learned CD metric.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.09400 [pdf, other]

Enhancing Vital Sign Estimation Performance of FMCW MIMO Radar by Prior Human Shape Recognition

Authors: Hadi Alidoustaghdam, Min Chen, Ben Willetts, Kai Mao, André Kokkeler, Yang Miao

Abstract: Radio technology enabled contact-free human posture and vital sign estimation is promising for health monitoring. Radio systems at millimeter-wave (mmWave) frequencies advantageously bring large bandwidth, multi-antenna array and beam steering capability. However, the human point cloud obtained by mmWave radar and utilized for posture estimation is likely to be sparse and incomplete. Additionally, human's random body movements deteriorate the estimation of breathing and heart rates, therefore the information of the chest location and a narrow radar beam toward the chest are demanded for more accurate vital sign estimation. In this paper, we propose a pipeline aiming to enhance the vital sign estimation performance of mmWave FMCW MIMO radar. The first step is to recognize human body part and posture, where we exploit a trained Convolutional Neural Networks (CNN) to efficiently process the imperfect human form point cloud. The CNN framework outputs the key point of different body parts, and was trained by using RGB image reference and Augmentative Ellipse Fitting Algorithm (AEFA). The next step is to utilize the chest information of the prior estimated human posture for vital sign estimation. While CNN is initially trained based on the frame-by-frame point clouds of human for posture estimation, the vital signs are extracted through beamforming toward the human chest. The numerical results show that this spatial filtering improves the estimation of the vital signs in regard to lowering the level of side harmonics and detecting the harmonics of vital signs efficiently, i.e., peak-to-average power ratio in the harmonics of vital signal is improved up to 0.02 and 0.07dB for the studied cases.

Submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted for presentation at the IEEE ICC 2023 conference

arXiv:2212.13059 [pdf]

OMSN and FAROS: OCTA Microstructure Segmentation Network and Fully Annotated Retinal OCTA Segmentation Dataset

Authors: Peng Xiao, Xiaodong Hu, Ke Ma, Gengyuan Wang, Ziqing Feng, Yuancong Huang, Jin Yuan

Abstract: The lack of efficient segmentation methods and fully-labeled datasets limits the comprehensive assessment of optical coherence tomography angiography (OCTA) microstructures like retinal vessel network (RVN) and foveal avascular zone (FAZ), which are of great value in ophthalmic and systematic diseases evaluation. Here, we introduce an innovative OCTA microstructure segmentation network (OMSN) by combining an encoder-decoder-based architecture with multi-scale skip connections and the split-attention-based residual network ResNeSt, paying specific attention to OCTA microstructural features while facilitating better model convergence and feature representations. The proposed OMSN achieves excellent single/multi-task performances for RVN or/and FAZ segmentation. Especially, the evaluation metrics on multi-task models outperform single-task models on the same dataset. On this basis, a fully annotated retinal OCTA segmentation (FAROS) dataset is constructed semi-automatically, filling the vacancy of a pixel-level fully-labeled OCTA dataset. OMSN multi-task segmentation model retrained with FAROS further certifies its outstanding accuracy for simultaneous RVN and FAZ segmentation.

Submitted 26 December, 2022; originally announced December 2022.

Comments: 10 pages, 6 figures, submitted to IEEE Transactions on Medical Imaging (TMI)

arXiv:2212.02764 [pdf, other]

A Trustworthy Framework for Medical Image Analysis with Deep Learning

Authors: Kai Ma, Siyuan He, Pengcheng Xi, Ashkan Ebadi, Stéphane Tremblay, Alexander Wong

Abstract: Computer vision and machine learning are playing an increasingly important role in computer-assisted diagnosis; however, the application of deep learning to medical imaging has challenges in data availability and data imbalance, and it is especially important that models for medical imaging are built to be trustworthy. Therefore, we propose TRUDLMIA, a trustworthy deep learning framework for medical image analysis, which adopts a modular design, leverages self-supervised pre-training, and utilizes a novel surrogate loss function. Experimental evaluations indicate that models generated from the framework are both trustworthy and high-performing. It is anticipated that the framework will support researchers and clinicians in advancing the use of deep learning for dealing with public health crises including COVID-19.

Submitted 6 December, 2022; originally announced December 2022.

arXiv:2210.03904 [pdf, other]

LW-ISP: A Lightweight Model with ISP and Deep Learning

Authors: Hongyang Chen, Kaisheng Ma

Abstract: The deep learning (DL)-based methods of low-level tasks have many advantages over the traditional camera in terms of hardware prospects, error accumulation and imaging effects. Recently, the application of deep learning to replace the image signal processing (ISP) pipeline has appeared one after another; however, there is still a long way to go towards real landing. In this paper, we show the possibility of learning-based method to achieve real-time high-performance processing in the ISP pipeline. We propose LW-ISP, a novel architecture designed to implicitly learn the image mapping from RAW data to RGB image. Based on U-Net architecture, we propose the fine-grained attention module and a plug-and-play upsampling block suitable for low-level tasks. In particular, we design a heterogeneous distillation algorithm to distill the implicit features and reconstruction information of the clean image, so as to guide the learning of the student model. Our experiments demonstrate that LW-ISP has achieved a 0.38 dB improvement in PSNR compared to the previous best method, while the model parameters and calculation have been reduced by 23 times and 81 times. The inference efficiency has been accelerated by at least 15 times. Without bells and whistles, LW-ISP has achieved quite competitive results in ISP subtasks including image denoising and enhancement.

Submitted 8 October, 2022; originally announced October 2022.

Comments: 16 PAGES, ACCEPTED AS A CONFERENCE PAPER AT: BMVC 2022

arXiv:2210.02245 [pdf, other]

Channel Modeling for UAV-to-Ground Communications with Posture Variation and Fuselage Scattering Effect

Authors: Boyu Hua, Haoran Ni, Qiuming Zhu, Cheng-Xiang Wang, Tongtong Zhou, Kai Mao, Junwei Bao, Xiaofei Zhang

Abstract: Unmanned aerial vehicle (UAV)-to-ground (U2G) channel models play a pivotal role for reliable communications between UAV and ground terminal. This paper proposes a three-dimensional (3D) non-stationary hybrid model including both large-scale and small-scale fading for U2G multiple-input-multiple-output (MIMO) channels. Distinctive channel characteristics under U2G scenarios, i.e., 3D trajectory and posture of UAV, fuselage scattering effect (FSE), and posture variation fading (PVF), are incorporated into the proposed model. The channel parameters, i.e., path loss (PL), shadow fading (SF), path delay, and path angle, are generated incorporating machine learning (ML) and ray tracing (RT) techniques to capture the structure-related characteristics. In order to guarantee the physical continuity of channel parameters such as Doppler phase and path power, the time evolution methods of inter- and intra- stationary intervals are proposed. Key statistical properties, i.e., temporal autocorrection function (ACF), power delay profile (PDP), level crossing rate (LCR), average fading duration (AFD), and stationary interval (SI) are given, and the impact of the change of fuselage and posture variation is analyzed. It is demonstrated that both posture variation and fuselage scattering have crucial effects on channel characteristics. The validity and practicability of the proposed model are verified by comparing the simulation results with the measured ones.

Submitted 13 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

arXiv:2210.00933 [pdf, other]

Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop

Authors: Weixia Zhang, Dingquan Li, Xiongkuo Min, Guangtao Zhai, Guodong Guo, Xiaokang Yang, Kede Ma

Abstract: No-reference image quality assessment (NR-IQA) aims to quantify how humans perceive visual distortions of digital images without access to their undistorted references. NR-IQA models are extensively studied in computational vision, and are widely used for performance evaluation and perceptual optimization of man-made vision systems. Here we make one of the first attempts to examine the perceptual robustness of NR-IQA models. Under a Lagrangian formulation, we identify insightful connections of the proposed perceptual attack to previous beautiful ideas in computer vision and machine learning. We test one knowledge-driven and three data-driven NR-IQA methods under four full-reference IQA models (as approximations to human perception of just-noticeable differences). Through carefully designed psychophysical experiments, we find that all four NR-IQA models are vulnerable to the proposed perceptual attack. More interestingly, we observe that the generated counterexamples are not transferable, manifesting themselves as distinct design flows of respective NR-IQA methods.

Submitted 3 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

arXiv:2209.08800 [pdf, ps, other]

A Realistic 3D Non-Stationary Channel Model for UAV-to-Vehicle Communications Incorporating Fuselage Posture

Authors: Boyu Hua, Tongtong Zhou, Qiuming Zhu, Kai Mao, Junwei Bao, Weizhi Zhong, Naeem Ahmed

Abstract: Considering the unmanned aerial vehicle (UAV) three-dimensional (3D) posture, a novel 3D non-stationary geometry-based stochastic model (GBSM) is proposed for multiple-input multiple-output (MIMO) UAV-to-vehicle (U2V) channels. It consists of a line-of-sight (LoS) and non-line-of-sight (NLoS) components. The factor of fuselage posture is considered by introducing a time-variant 3D posture matrix. Some important statistical properties, i.e. the temporal autocorrelation function (ACF) and spatial cross correlation function (CCF), are derived and investigated. Simulation results show that the fuselage posture has significant impact on the U2V channel characteristic and aggravate the non-stationarity. The agreements between analytical, simulated, and measured results verify the correctness of proposed model and derivations. Moreover, it is demonstrated that the proposed model is also compatible to the existing GBSM without considering fuselage posture.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 12 pages, 8 figures, CNCOM

arXiv:2207.09312 [pdf, other]

Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography

Authors: Kai Ma, Pengcheng Xi, Karim Habashy, Ashkan Ebadi, Stéphane Tremblay, Alexander Wong

Abstract: Building AI models with trustworthiness is important especially in regulated areas such as healthcare. In tackling COVID-19, previous work uses convolutional neural networks as the backbone architecture, which has shown to be prone to over-caution and overconfidence in making decisions, rendering them less trustworthy -- a crucial flaw in the context of medical imaging. In this study, we propose a feature learning approach using Vision Transformers, which use an attention-based mechanism, and examine the representation learning capability of Transformers as a new backbone architecture for medical imaging. Through the task of classifying COVID-19 chest radiographs, we investigate into whether generalization capabilities benefit solely from Vision Transformers' architectural advances. Quantitative and qualitative evaluations are conducted on the trustworthiness of the models, through the use of "trust score" computation and a visual explainability technique. We conclude that the attention-based feature learning approach is promising in building trustworthy deep learning models for healthcare.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted to 39th International Conference on Machine Learning, Workshop on Healthcare AI and COVID-19

arXiv:2206.09146 [pdf, other]

A Perceptually Optimized and Self-Calibrated Tone Mapping Operator

Authors: Peibei Cao, Chenyang Le, Yuming Fang, Kede Ma

Abstract: With the increasing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are practically demanding. In this paper, we develop a two-stage neural network-based TMO that is self-calibrated and perceptually optimized. In Stage one, motivated by the physiology of the early stages of the human visual system, we first decompose an HDR image into a normalized Laplacian pyramid. We then use two lightweight deep neural networks (DNNs), taking the normalized representation as input and estimating the Laplacian pyramid of the corresponding LDR image. We optimize the tone mapping network by minimizing the normalized Laplacian pyramid distance (NLPD), a perceptual metric aligning with human judgments of tone-mapped image quality. In Stage two, the input HDR image is self-calibrated to compute the final LDR image. We feed the same HDR image but rescaled with different maximum luminances to the learned tone mapping network, and generate a pseudo-multi-exposure image stack with different detail visibility and color saturation. We then train another lightweight DNN to fuse the LDR image stack into a desired LDR image by maximizing a variant of the structural similarity index for multi-exposure image fusion (MEF-SSIM), which has been proven perceptually relevant to fused image quality. The proposed self-calibration mechanism through MEF enables our TMO to accept uncalibrated HDR images, while being physiology-driven. Extensive experiments show that our method produces images with consistently better visual quality. Additionally, since our method builds upon three lightweight DNNs, it is among the fastest local TMOs.

Submitted 25 August, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

Comments: 15 pages,17 figures

arXiv:2206.08751 [pdf, other]

Perceptual Quality Assessment of Virtual Reality Videos in the Wild

Authors: Wen Wen, Mu Li, Yiru Yao, Xiangjie Sui, Yabin Zhang, Long Lan, Yuming Fang, Kede Ma

Abstract: Investigating how people perceive virtual reality (VR) videos in the wild (i.e., those captured by everyday users) is a crucial and challenging task in VR-related applications due to complex authentic distortions localized in space and time. Existing panoramic video databases only consider synthetic distortions, assume fixed viewing conditions, and are limited in size. To overcome these shortcomings, we construct the VR Video Quality in the Wild (VRVQW) database, containing $502$ user-generated videos with diverse content and distortion characteristics. Based on VRVQW, we conduct a formal psychophysical experiment to record the scanpaths and perceived quality scores from $139$ participants under two different viewing conditions. We provide a thorough statistical analysis of the recorded data, observing significant impact of viewing conditions on both human scanpaths and perceived quality. Moreover, we develop an objective quality assessment model for VR videos based on pseudocylindrical representation and convolution. Results on the proposed VRVQW show that our method is superior to existing video quality assessment models. We have made the database and code available at https://github.com/limuhit/VR-Video-Quality-in-the-Wild.

Submitted 15 March, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology

arXiv:2205.13489 [pdf, other]

Measuring Perceptual Color Differences of Smartphone Photographs

Authors: Zhihua Wang, Keshuo Xu, Yang Yang, Jianlei Dong, Shuhang Gu, Lihao Xu, Yuming Fang, Kede Ma

Abstract: Measuring perceptual color differences (CDs) is of great importance in modern smartphone photography. Despite the long history, most CD measures have been constrained by psychophysical data of homogeneous color patches or a limited number of simplistic natural photographic images. It is thus questionable whether existing CD measures generalize in the age of smartphone photography characterized by greater content complexities and learning-based image signal processors. In this paper, we put together so far the largest image dataset for perceptual CD assessment, in which the photographic images are 1) captured by six flagship smartphones, 2) altered by Photoshop, 3) post-processed by built-in filters of the smartphones, and 4) reproduced with incorrect color profiles. We then conduct a large-scale psychophysical experiment to gather perceptual CDs of 30,000 image pairs in a carefully controlled laboratory environment. Based on the newly established dataset, we make one of the first attempts to construct an end-to-end learnable CD formula based on a lightweight neural network, as a generalization of several previous metrics. Extensive experiments demonstrate that the optimized formula outperforms 33 existing CD measures by a large margin, offers reasonable local CD maps without the use of dense supervision, generalizes well to homogeneous color patch data, and empirically behaves as a proper metric in the mathematical sense. Our dataset and code are publicly available at https://github.com/hellooks/CDNet.

Submitted 31 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: 10 figures, 8 tables, 14 pages

arXiv:2204.10090 [pdf, other]

Learn from Unpaired Data for Image Restoration: A Variational Bayes Approach

Authors: Dihan Zheng, Xiaowen Zhang, Kaisheng Ma, Chenglong Bao

Abstract: Collecting paired training data is difficult in practice, but the unpaired samples broadly exist. Current approaches aim at generating synthesized training data from unpaired samples by exploring the relationship between the corrupted and clean data. This work proposes LUD-VAE, a deep generative method to learn the joint probability density function from data sampled from marginal distributions. Our approach is based on a carefully designed probabilistic graphical model in which the clean and corrupted data domains are conditionally independent. Using variational inference, we maximize the evidence lower bound (ELBO) to estimate the joint probability density function. Furthermore, we show that the ELBO is computable without paired samples under the inference invariant assumption. This property provides the mathematical rationale of our approach in the unpaired setting. Finally, we apply our method to real-world image denoising, super-resolution, and low-light image enhancement tasks and train the models using the synthetic data generated by the LUD-VAE. Experimental results validate the advantages of our method over other approaches.

Submitted 11 September, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

arXiv:2204.04088 [pdf, other]

Stochastic Gradient-based Fast Distributed Multi-Energy Management for an Industrial Park with Temporally-Coupled Constraints

Authors: Dafeng Zhu, Bo Yang, Chengbin Ma, Zhaojian Wang, Shanying Zhu, Kai Ma, ** Guan

Abstract: Contemporary industrial parks are challenged by the growing concerns about high cost and low efficiency of energy supply. Moreover, in the case of uncertain supply/demand, how to mobilize delay-tolerant elastic loads and compensate real-time inelastic loads to match multi-energy generation/storage and minimize energy cost is a key issue. Since energy management is hardly to be implemented offline without knowing statistical information of random variables, this paper presents a systematic online energy cost minimization framework to fulfill the complementary utilization of multi-energy with time-varying generation, demand and price. Specifically to achieve charging/discharging constraints due to storage and short-term energy balancing, a fast distributed algorithm based on stochastic gradient with two-timescale implementation is proposed to ensure online implementation. To reduce the peak loads, an incentive mechanism is implemented by estimating users' willingness to shift. Analytical results on parameter setting are also given to guarantee feasibility and optimality of the proposed design. Numerical results show that when the bid-ask spread of electricity is small enough, the proposed algorithm can achieve the close-to-optimal cost asymptotically.

Submitted 8 April, 2022; originally announced April 2022.

Comments: Accepted by Applied Energy

arXiv:2203.07659 [pdf]

Breast Cancer Molecular Subtypes Prediction on Pathological Images with Discriminative Patch Selecting and Multi-Instance Learning

Authors: Hong Liu, Wen-Dong Xu, Zi-Hao Shang, Xiang-Dong Wang, Hai-Yan Zhou, Ke-Wen Ma, Huan Zhou, Jia-Lin Qi, Jia-Rui Jiang, Li-Lan Tan, Hui-Min Zeng, Hui-Juan Cai, Kuan-Song Wang, Yue-Liang Qian

Abstract: Molecular subtypes of breast cancer are important references to personalized clinical treatment. For cost and labor savings, only one of the patient's paraffin blocks is usually selected for subsequent immunohistochemistry (IHC) to obtain molecular subtypes. Inevitable sampling error is risky due to tumor heterogeneity and could result in a delay in treatment. Molecular subtype prediction from conventional H&E pathological whole slide images (WSI) using AI method is useful and critical to assist pathologists pre-screen proper paraffin block for IHC. It's a challenging task since only WSI level labels of molecular subtypes can be obtained from IHC. Gigapixel WSIs are divided into a huge number of patches to be computationally feasible for deep learning. While with coarse slide-level labels, patch-based methods may suffer from abundant noise patches, such as folds, overstained regions, or non-tumor tissues. A weakly supervised learning framework based on discriminative patch selecting and multi-instance learning was proposed for breast cancer molecular subtype prediction from H&E WSIs. Firstly, co-teaching strategy was adopted to learn molecular subtype representations and filter out noise patches. Then, a balanced sampling strategy was used to handle the imbalance in subtypes in the dataset. In addition, a noise patch filtering algorithm that used local outlier factor based on cluster centers was proposed to further select discriminative patches. Finally, a loss function integrating patch with slide constraint information was used to finetune MIL framework on obtained discriminative patches and further improve the performance of molecular subtyping. The experimental results confirmed the effectiveness of the proposed method and our models outperformed even senior pathologists, with potential to assist pathologists to pre-screen paraffin blocks for IHC in clinic.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2203.03640 [pdf, other]

doi 10.1109/TMI.2020.3014433

Conquering Data Variations in Resolution: A Slice-Aware Multi-Branch Decoder Network

Authors: Shuxin Wang, Shilei Cao, Zhizhong Chai, Dong Wei, Kai Ma, Liansheng Wang, Yefeng Zheng

Abstract: Fully convolutional neural networks have made promising progress in joint liver and liver tumor segmentation. Instead of following the debates over 2D versus 3D networks (for example, pursuing the balance between large-scale 2D pretraining and 3D context), in this paper, we novelly identify the wide variation in the ratio between intra- and inter-slice resolutions as a crucial obstacle to the performance. To tackle the mismatch between the intra- and inter-slice information, we propose a slice-aware 2.5D network that emphasizes extracting discriminative features utilizing not only in-plane semantics but also out-of-plane coherence for each separate slice. Specifically, we present a slice-wise multi-input multi-output architecture to instantiate such a design paradigm, which contains a Multi-Branch Decoder (MD) with a Slice-centric Attention Block (SAB) for learning slice-specific features and a Densely Connected Dice (DCD) loss to regularize the inter-slice predictions to be coherent and continuous. Based on the aforementioned innovations, we achieve state-of-the-art results on the MICCAI 2017 Liver Tumor Segmentation (LiTS) dataset. Besides, we also test our model on the ISBI 2019 Segmentation of THoracic Organs at Risk (SegTHOR) dataset, and the result proves the robustness and generalizability of the proposed method in other segmentation tasks.

Submitted 7 March, 2022; originally announced March 2022.

Comments: Published by IEEE TMI

arXiv:2203.02390 [pdf, other]

doi 10.1007/978-3-030-87237-3_11

Simultaneous Alignment and Surface Regression Using Hybrid 2D-3D Networks for 3D Coherent Layer Segmentation of Retina OCT Images

Authors: Hong Liu, Dong Wei, Donghuan Lu, Yuexiang Li, Kai Ma, Liansheng Wang, Yefeng Zheng

Abstract: Automated surface segmentation of retinal layer is important and challenging in analyzing optical coherence tomography (OCT). Recently, many deep learning based methods have been developed for this task and yield remarkable performance. However, due to large spatial gap and potential mismatch between the B-scans of OCT data, all of them are based on 2D segmentation of individual B-scans, which may lose the continuity information across the B-scans. In addition, 3D surface of the retina layers can provide more diagnostic information, which is crucial in quantitative image analysis. In this study, a novel framework based on hybrid 2D-3D convolutional neural networks (CNNs) is proposed to obtain continuous 3D retinal layer surfaces from OCT. The 2D features of individual B-scans are extracted by an encoder consisting of 2D convolutions. These 2D features are then used to produce the alignment displacement field and layer segmentation by two 3D decoders, which are coupled via a spatial transformer module. The entire framework is trained end-to-end. To the best of our knowledge, this is the first study that attempts 3D retinal layer segmentation in volumetric OCT images based on CNNs. Experiments on a publicly available dataset show that our framework achieves superior results to state-of-the-art 2D methods in terms of both layer segmentation accuracy and cross-B-scan 3D continuity, thus offering more clinical values than previous works.

Submitted 4 March, 2022; originally announced March 2022.

Comments: Presented at MICCAI 2021

arXiv:2203.00270 [pdf, other]

Bidirectional Pricing and Demand Response for Nanogrids with HVAC Systems

Authors: Jiaxin Cao, Bo Yang, Shanying Zhu, Kai Ma, ** Guan

Abstract: Owing to the fluctuant renewable generation and power demand, the energy surplus or deficit in each nanogrid is embodied differently across time. To stimulate local renewable energy consumption and minimize the long-term energy cost, some issues still remain to be explored: when and how the energy demand and bidirectional trading prices are scheduled considering personal comfort preferences and environmental factors. For this purpose, the demand response and two-way pricing problems concurrently for nanogrids and a public monitoring entity (PME) are studied with exploiting the large potential thermal elastic ability of heating, ventilation and air-conditioning (HVAC) units. Different from nanogrids, in terms of minimizing time-average costs, PME aims to set reasonable prices and optimize profits by trading with nanogrids and the main grid bi-directionally. In particular, such bilevel energy management problem is formulated as a stochastic form in a long-term horizon. Since there are uncertain system parameters, time-coupled queue constraints and the interplay of bilevel decision-making, it is challenging to solve the formulated problems. To this end, we derive a form of relaxation based on Lyapunov optimization technique to make the energy management problem tractable without forecasting the related system parameters. The transaction between nanogrids and PME is captured by a one-leader and multi-follower Stackelberg game framework. Then, theoretical analysis of the existence and uniqueness of Stackelberg equilibrium (SE) is developed based on the proposed game property. Following that, we devise an optimization algorithm to reach the SE with less information exchange. Numerical experiments validate the effectiveness of the proposed approach.

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2202.08437 [pdf, other]

doi 10.1109/ISBI52829.2022.9761489

Visual attention analysis of pathologists examining whole slide images of Prostate cancer

Authors: Souradeep Chakraborty, Ke Ma, Rajarsi Gupta, Beatrice Knudsen, Gregory J. Zelinsky, Joel H. Saltz, Dimitris Samaras

Abstract: We study the attention of pathologists as they examine whole-slide images (WSIs) of prostate cancer tissue using a digital microscope. To the best of our knowledge, our study is the first to report in detail how pathologists navigate WSIs of prostate cancer as they accumulate information for their diagnoses. We collected slide navigation data (i.e., viewport location, magnification level, and time) from 13 pathologists in 2 groups (5 genitourinary (GU) specialists and 8 general pathologists) and generated visual attention heatmaps and scanpaths. Each pathologist examined five WSIs from the TCGA PRAD dataset, which were selected by a GU pathology specialist. We examined and analyzed the distributions of visual attention for each group of pathologists after each WSI was examined. To quantify the relationship between a pathologist's attention and evidence for cancer in the WSI, we obtained tumor annotations from a genitourinary specialist. We used these annotations to compute the overlap between the distribution of visual attention and annotated tumor region to identify strong correlations. Motivated by this analysis, we trained a deep learning model to predict visual attention on unseen WSIs. We find that the attention heatmaps predicted by our model correlate quite well with the ground truth attention heatmap and tumor annotations on a test set of 17 WSIs by using various spatial and temporal evaluation metrics.

Submitted 2 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: ISBI 2022 (Oral presentation)

arXiv:2202.08195 [pdf, other]

doi 10.1016/j.media.2023.102933

Nuclei Segmentation with Point Annotations from Pathology Images via Self-Supervised Learning and Co-Training

Authors: Yi Lin, Zhiyong Qu, Hao Chen, Zhongke Gao, Yuexiang Li, Lili Xia, Kai Ma, Yefeng Zheng, Kwang-Ting Cheng

Abstract: Nuclei segmentation is a crucial task for whole slide image analysis in digital pathology. Generally, the segmentation performance of fully-supervised learning heavily depends on the amount and quality of the annotated data. However, it is time-consuming and expensive for professional pathologists to provide accurate pixel-level ground truth, while it is much easier to get coarse labels such as point annotations. In this paper, we propose a weakly-supervised learning method for nuclei segmentation that only requires point annotations for training. First, coarse pixel-level labels are derived from the point annotations based on the Voronoi diagram and the k-means clustering method to avoid overfitting. Second, a co-training strategy with an exponential moving average method is designed to refine the incomplete supervision of the coarse labels. Third, a self-supervised visual representation learning method is tailored for nuclei segmentation of pathology images that transforms the hematoxylin component images into the H&E stained images to gain better understanding of the relationship between the nuclei and cytoplasm. We comprehensively evaluate the proposed method using two public datasets. Both visual and quantitative results demonstrate the superiority of our method to the state-of-the-art methods, and its competitive performance compared to the fully-supervised methods. Code: https://github.com/hust-linyi/SC-Net

Submitted 17 August, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted by MedIA

arXiv:2202.03771 [pdf, ps, other]

doi 10.1016/j.apenergy.2022.118636

Energy Management Based on Multi-Agent Deep Reinforcement Learning for A Multi-Energy Industrial Park

Authors: Dafeng Zhu, Bo Yang, Yuxiang Liu, Zhaojian Wang, Kai Ma, ** Guan

Abstract: Owing to large industrial energy consumption, industrial production has brought a huge burden to the grid in terms of renewable energy access and power supply. Due to the coupling of multiple energy sources and the uncertainty of renewable energy and demand, centralized methods require large calculation and coordination overhead. Thus, this paper proposes a multi-energy management framework achieved by decentralized execution and centralized training for an industrial park. The energy management problem is formulated as a partially-observable Markov decision process, which is intractable by dynamic programming due to the lack of the prior knowledge of the underlying stochastic process. The objective is to minimize long-term energy costs while ensuring the demand of users. To solve this issue and improve the calculation speed, a novel multi-agent deep reinforcement learning algorithm is proposed, which contains the following key points: counterfactual baseline for facilitating contributing agents to learn better policies, soft actor-critic for improving robustness and exploring optimal solutions. A novel reward is designed by Lagrange multiplier method to ensure the capacity constraints of energy storage. In addition, considering that the increase in the number of agents leads to performance degradation due to large observation spaces, an attention mechanism is introduced to enhance the stability of policy and enable agents to focus on important energy-related information, which improves the exploration efficiency of soft actor-critic. Numerical results based on actual data verify the performance of the proposed algorithm with high scalability, indicating that the industrial park can minimize energy costs under different demands.

Submitted 11 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: Accepted by Applied Energy

Journal ref: Applied Energy 311 (2022) 118636

arXiv:2201.08388 [pdf, other]

Steerable Pyramid Transform Enables Robust Left Ventricle Quantification

Authors: Xiangyang Zhu, Kede Ma, Wufeng Xue

Abstract: Predicting cardiac indices has long been a focal point in the medical imaging community. While various deep learning models have demonstrated success in quantifying cardiac indices, they remain susceptible to mild input perturbations, e.g., spatial transformations, image distortions, and adversarial attacks. This vulnerability undermines confidence in using learning-based automated systems for diagnosing cardiovascular diseases. In this work, we describe a simple yet effective method to learn robust models for left ventricle (LV) quantification, encompassing cavity and myocardium areas, directional dimensions, and regional wall thicknesses. Our success hinges on employing the biologically inspired steerable pyramid transform (SPT) for fixed front-end processing, which offers three main benefits. First, the basis functions of SPT align with the anatomical structure of LV and the geometric features of the measured indices. Second, SPT facilitates weight sharing across different orientations as a form of parameter regularization and naturally captures the scale variations of LV. Third, the residual highpass subband can be conveniently discarded, promoting robust feature learning. Extensive experiments on the Cardiac-Dig benchmark show that our SPT-augmented model not only achieves reasonable prediction accuracy compared to state-of-the-art methods, but also exhibits significantly improved robustness against input perturbations.

Submitted 2 July, 2024; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Code is available at https://github.com/yangyangyang127/RobustLV

arXiv:2112.13227 [pdf, other]

Pseudocylindrical Convolutions for Learned Omnidirectional Image Compression

Authors: Mu Li, Kede Ma, **xing Li, David Zhang

Abstract: Although equirectangular projection (ERP) is a convenient form to store omnidirectional images (also known as 360-degree images), it is neither equal-area nor conformal, thus not friendly to subsequent visual communication. In the context of image compression, ERP will over-sample and deform things and stuff near the poles, making it difficult for perceptually optimal bit allocation. In conventional 360-degree image compression, techniques such as region-wise packing and tiled representation are introduced to alleviate the over-sampling problem, achieving limited success. In this paper, we make one of the first attempts to learn deep neural networks for omnidirectional image compression. We first describe parametric pseudocylindrical representation as a generalization of common pseudocylindrical map projections. A computationally tractable greedy method is presented to determine the (sub)-optimal configuration of the pseudocylindrical representation in terms of a novel proxy objective for rate-distortion performance. We then propose pseudocylindrical convolutions for 360-degree image compression. Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution with the so-called pseudocylindrical padding. To demonstrate the feasibility of our idea, we implement an end-to-end 360-degree image compression system, consisting of the learned pseudocylindrical representation, an analysis transform, a non-uniform quantizer, a synthesis transform, and an entropy model. Experimental results on $19,790$ omnidirectional images show that our method achieves consistently better rate-distortion performance than the competing methods. Moreover, the visual quality by our method is significantly improved for all images at all bitrates.

Submitted 25 December, 2021; originally announced December 2021.

arXiv:2110.14209 [pdf, ps, other]

Fast Distributed Stochastic Scheduling for A Multi-Energy Industrial Park

Authors: Dafeng Zhu, Bo Yang, Zhaojian Wang, Chengbin Ma, Kai Ma, Shanying Zhu

Abstract: The multi-energy management framework of industrial parks advocates energy conversion and scheduling, which takes full advantage of the compensation and temporal availability of multiple energy. However, how to exploit elastic loads and compensate inelastic loads to match multiple generators and storage is still a key problem under the uncertainty of demand and supply. To solve the issue, the energy management problem is constructed as a stochastic optimization problem. The optimization aims are to minimize the time-averaged energy cost and improve the energy efficiency while respecting the energy constraints. To achieve the distributed implementation in real time without knowing any priori knowledge of underlying stochastic process, a distributed stochastic gradient algorithm based on dual decomposition and a fast scheme are proposed. The numerical results based on real data show that the industrial park, by adopting the proposed algorithm, can achieve social welfare maximization asymptotically.

Submitted 24 May, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

arXiv:2110.09699 [pdf, ps, other]

Image Quality Assessment in the Modern Age

Authors: Kede Ma, Yuming Fang

Abstract: This tutorial provides the audience with the basic theories, methodologies, and current progresses of image quality assessment (IQA). From an actionable perspective, we will first revisit several subjective quality assessment methodologies, with emphasis on how to properly select visual stimuli. We will then present in detail the design principles of objective quality assessment models, supplemented by an in-depth analysis of their advantages and disadvantages. Both hand-engineered and (deep) learning-based methods will be covered. Moreover, the limitations with the conventional model comparison methodology for objective quality models will be pointed out, and novel comparison methodologies such as those based on the theory of "analysis by synthesis" will be introduced. We will last discuss the real-world multimedia applications of IQA, and give a list of open challenging problems, in the hope of encouraging more and more talented researchers and engineers devoting to this exciting and rewarding research field.

Submitted 18 October, 2021; originally announced October 2021.

Comments: ACM Multimedia 2021 Tutorial

arXiv:2110.08521 [pdf, other]

doi 10.1145/3474085.3475419

Locally Adaptive Structure and Texture Similarity for Image Quality Assessment

Authors: Keyan Ding, Yi Liu, Xueyi Zou, Shiqi Wang, Kede Ma

Abstract: The latest advances in full-reference image quality assessment (IQA) involve unifying structure and texture similarity based on deep representations. The resulting Deep Image Structure and Texture Similarity (DISTS) metric, however, makes rather global quality measurements, ignoring the fact that natural photographic images are locally structured and textured across space and scale. In this paper, we describe a locally adaptive structure and texture similarity index for full-reference IQA, which we term A-DISTS. Specifically, we rely on a single statistical feature, namely the dispersion index, to localize texture regions at different scales. The estimated probability (of one patch being texture) is in turn used to adaptively pool local structure and texture measurements. The resulting A-DISTS is adapted to local image content, and is free of expensive human perceptual scores for supervised training. We demonstrate the advantages of A-DISTS in terms of correlation with human data on ten IQA databases and optimization of single image super-resolution methods.

Submitted 16 October, 2021; originally announced October 2021.

Journal ref: Proceedings of the 29th ACM International Conference on Multimedia, 2021

arXiv:2109.13930 [pdf, other]

All-Around Real Label Supervision: Cyclic Prototype Consistency Learning for Semi-supervised Medical Image Segmentation

Authors: Zhe Xu, Yixin Wang, Donghuan Lu, Lequan Yu, Jiangpeng Yan, Jie Luo, Kai Ma, Yefeng Zheng, Raymond Kai-yu Tong

Abstract: Semi-supervised learning has substantially advanced medical image segmentation since it alleviates the heavy burden of acquiring the costly expert-examined annotations. Especially, the consistency-based approaches have attracted more attention for their superior performance, wherein the real labels are only utilized to supervise their paired images via supervised loss while the unlabeled images are exploited by enforcing the perturbation-based "unsupervised" consistency without explicit guidance from those real labels. However, intuitively, the expert-examined real labels contain more reliable supervision signals. Observing this, we ask an unexplored but interesting question: can we exploit the unlabeled data via explicit real label supervision for semi-supervised training? To this end, we discard the previous perturbation-based consistency but absorb the essence of non-parametric prototype learning. Based on the prototypical network, we then propose a novel cyclic prototype consistency learning (CPCL) framework, which is constructed by a labeled-to-unlabeled (L2U) prototypical forward process and an unlabeled-to-labeled (U2L) backward process. Such two processes synergistically enhance the segmentation network by encouraging more discriminative and compact features. In this way, our framework turns previous "unsupervised" consistency into new "supervised" consistency, obtaining the "all-around real label supervision" property of our method. Extensive experiments on brain tumor segmentation from MRI and kidney segmentation from CT images show that our CPCL can effectively exploit the unlabeled data and outperform other state-of-the-art semi-supervised medical image segmentation methods.

Submitted 15 March, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: 11 pages

arXiv:2109.11715 [pdf, other]

Training Automatic View Planner for Cardiac MR Imaging via Self-Supervision by Spatial Relationship between Views

Authors: Dong Wei, Kai Ma, Yefeng Zheng

Abstract: View planning for the acquisition of cardiac magnetic resonance imaging (CMR) requires acquaintance with the cardiac anatomy and remains a challenging task in clinical practice. Existing approaches to its automation relied either on an additional volumetric image not typically acquired in clinic routine, or on laborious manual annotations of cardiac structural landmarks. This work presents a clinic-compatible and annotation-free system for automatic CMR view planning. The system mines the spatial relationship -- more specifically, locates and exploits the intersecting lines -- between the source and target views, and trains deep networks to regress heatmaps defined by these intersecting lines. As the spatial relationship is self-contained in properly stored data, e.g., in the DICOM format, the need for manual annotation is eliminated. Then, a multi-view planning strategy is proposed to aggregate information from the predicted heatmaps for all the source views of a target view, for a globally optimal prescription. The multi-view aggregation mimics the similar strategy practiced by skilled human prescribers. Experimental results on 181 clinical CMR exams show that our system achieves superior accuracy to existing approaches including conventional atlas-based and newer deep learning based ones, in prescribing four standard CMR views. The mean angle difference and point-to-plane distance evaluated against the ground truth planes are 5.98 degrees and 3.48 mm, respectively.

Submitted 23 September, 2021; originally announced September 2021.

Comments: Accepted by MICCAI 2021

arXiv:2109.05298 [pdf, other]

InDuDoNet: An Interpretable Dual Domain Network for CT Metal Artifact Reduction

Authors: Hong Wang, Yuexiang Li, Haimiao Zhang, Jiawei Chen, Kai Ma, Deyu Meng, Yefeng Zheng

Abstract: For the task of metal artifact reduction (MAR), although deep learning (DL)-based methods have achieved promising performances, most of them suffer from two problems: 1) the CT imaging geometry constraint is not fully embedded into the network during training, leaving room for further performance improvement; 2) model interpretability has not received sufficient consideration. Against these issues, we propose a novel interpretable dual domain network, termed InDuDoNet, which combines the advantages of model-driven and data-driven methodologies. Specifically, we build a joint spatial and Radon domain reconstruction model and utilize the proximal gradient technique to design an iterative algorithm for solving it. The optimization algorithm consists only of simple computational operators, which allows us to unfold the iterative steps into corresponding network modules and thus improve the interpretability of the framework. Extensive experiments on synthesized and clinical data show the superiority of our InDuDoNet. Code is available at https://github.com/hongwang01/InDuDoNet.

Submitted 11 September, 2021; originally announced September 2021.

Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2021

Showing 1–50 of 94 results for author: Mao, K