Search | arXiv e-print repository

MMR-Mamba: Multi-Contrast MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

Authors: Jing Zou, Lanqing Liu, Qi Chen, Shujun Wang, Xiaohan Xing, Jing Qin

Abstract: Multi-contrast MRI acceleration has become prevalent in MR imaging, enabling the reconstruction of high-quality MR images from under-sampled k-space data of the target modality, using guidance from a fully-sampled auxiliary modality. The main crux lies in efficiently and comprehensively integrating complementary information from the auxiliary modality. Existing methods either suffer from quadratic computational complexity or fail to capture long-range correlated features comprehensively. In this work, we propose MMR-Mamba, a novel framework that achieves comprehensive integration of multi-contrast features through Mamba and spatial-frequency information fusion. Firstly, we design the Target modality-guided Cross Mamba (TCM) module in the spatial domain, which maximally restores the target modality information by selectively absorbing useful information from the auxiliary modality. Secondly, leveraging global properties of the Fourier domain, we introduce the Selective Frequency Fusion (SFF) module to efficiently integrate global information in the frequency domain and recover high-frequency signals for the reconstruction of structure details. Additionally, we present the Adaptive Spatial-Frequency Fusion (ASFF) module, which enhances fused features by supplementing less informative features from one domain with corresponding features from the other domain. These innovative strategies ensure efficient feature fusion across spatial and frequency domains, avoiding the introduction of redundant information and facilitating the reconstruction of high-quality target images. Extensive experiments on the BraTS and fastMRI knee datasets demonstrate the superiority of the proposed MMR-Mamba over state-of-the-art MRI reconstruction methods.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 10 pages, 5 figures

arXiv:2405.19665 [pdf]

A novel fault localization with data refinement for hydroelectric units

Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most traditional hydroelectric unit fault localization methods struggle to localize faults accurately. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learning (SG-WMBDL) based fault localization method for hydroelectric units is proposed. To overcome the data scarcity, an SAE is embedded into the GAN to generate more high-quality samples in the data generation module. Considering that the signals involve non-linear and non-smooth characteristics, an improved WNR, which combines soft and hard thresholding, and local linear embedding (LLE) are used in the data preprocessing module to reduce the noise and effectively capture the local features. In addition, to seek higher performance, a novel Adaptive Boost (AdaBoost) combined with multiple deep learning models is proposed to achieve accurate fault localization. The experimental results show that SG-WMBDL can locate faults for hydroelectric units from a small number of fault samples with non-linear and non-smooth characteristics, with higher precision and accuracy than other frontier methods, which verifies the effectiveness and practicality of the proposed method.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)
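The abstract above mentions a wavelet noise reduction step that combines soft and hard thresholding without giving the rule. The following is a minimal sketch of the two standard thresholding functions plus one possible blend; the `blended_threshold` function and its `alpha` weight are assumptions made for illustration, not the authors' formulation.

```python
def hard_threshold(w: float, t: float) -> float:
    """Keep a wavelet coefficient unchanged if its magnitude exceeds t, else zero it."""
    return w if abs(w) > t else 0.0


def soft_threshold(w: float, t: float) -> float:
    """Zero small coefficients and shrink the surviving ones toward zero by t."""
    if abs(w) <= t:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    return sign * (abs(w) - t)


def blended_threshold(w: float, t: float, alpha: float = 0.5) -> float:
    """Hypothetical compromise: alpha=1 is pure hard, alpha=0 is pure soft thresholding."""
    return alpha * hard_threshold(w, t) + (1.0 - alpha) * soft_threshold(w, t)


# Illustrative detail coefficients (made-up values) and threshold.
coeffs = [-3.0, -0.4, 0.2, 1.5, 4.0]
denoised = [blended_threshold(w, t=1.0, alpha=0.5) for w in coeffs]
print(denoised)
```

Hard thresholding preserves the amplitude of large coefficients but is discontinuous at the threshold, while soft thresholding is continuous but biases large coefficients; a blend of this kind trades off the two effects.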

arXiv:2405.17766 [pdf, other]

SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals

Authors: Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, James Zou

Abstract: Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We show that a novel leave-one-out approach for contrastive learning significantly improves downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM's learned embeddings outperforms an end-to-end trained convolutional neural network (CNN) on sleep stage classification (macro AUROC 0.88 vs 0.72 and macro AUPRC 0.72 vs 0.48) and sleep disordered breathing detection (AUROC 0.85 vs 0.69 and AUPRC 0.77 vs 0.61). Notably, the learned embeddings achieve 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates. This work demonstrates the value of holistic multi-modal sleep modeling to fully capture the richness of sleep recordings. SleepFM is open source and available at https://github.com/rthapa84/sleepfm-codebase.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.05715 [pdf, other]

Shifting the ISAC Trade-Off with Fluid Antenna Systems

Authors: Jiaqi Zou, Hao Xu, Chao Wang, Lvxin Xu, Songlin Sun, Kaitao Meng, Christos Masouros, Kai-Kit Wong

Abstract: As an emerging antenna technology, a fluid antenna system (FAS) enhances spatial diversity to improve both sensing and communication performance by shifting the active antennas among available ports. In this letter, we study the potential of shifting the integrated sensing and communication (ISAC) trade-off with FAS. We propose the model for FAS-enabled ISAC and jointly optimize the transmit beamforming and port selection of FAS. In particular, we aim to minimize the transmit power, while satisfying both communication and sensing requirements. An efficient iterative algorithm based on sparse optimization, convex approximation, and a penalty approach is developed. The simulation results show that the proposed scheme can attain 33% reductions in transmit power with guaranteed sensing and communication performance, showing the great potential of the fluid antenna for striking a flexible tradeoff between sensing and communication in ISAC systems.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 5 pages, 5 figures

arXiv:2404.03209 [pdf, other]

CSR-dMRI: Continuous Super-Resolution of Diffusion MRI with Anatomical Structure-assisted Implicit Neural Representation Learning

Authors: Ruoyou Wu, Jian Cheng, Cheng Li, Juan Zou, Jing Yang, Wenxin Fan, Shanshan Wang

Abstract: Deep learning-based dMRI super-resolution methods can effectively enhance image resolution by leveraging the learning capabilities of neural networks on large datasets. However, these methods tend to learn a fixed scale mapping between low-resolution (LR) and high-resolution (HR) images, overlooking the need for radiologists to scale the images at arbitrary resolutions. Moreover, the pixel-wise loss in the image domain tends to generate over-smoothed results, losing fine textures and edge information. To address these issues, we propose a novel continuous super-resolution of dMRI with anatomical structure-assisted implicit neural representation learning method, called CSR-dMRI. Specifically, the CSR-dMRI model consists of two components. The first is the latent feature extractor, which primarily extracts latent space feature maps from LR dMRI and anatomical images while learning structural prior information from the anatomical images. The second is the implicit function network, which utilizes voxel coordinates and latent feature vectors to generate voxel intensities at corresponding positions. Additionally, a frequency-domain-based loss is introduced to preserve the structural and texture information, further enhancing the image quality. Extensive experiments on the publicly available HCP dataset validate the effectiveness of our approach. Furthermore, our method demonstrates superior generalization capability and can be applied to arbitrary-scale super-resolution, including non-integer scale factors, expanding its applicability beyond conventional approaches.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2403.07721 [pdf, other]

Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

Authors: Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Quanying Liu

Abstract: How to decode human vision through neural signals has attracted a long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models improved the performance of fMRI-based visual decoding and reconstruction. However, the high cost and low temporal resolution of fMRI limit their applications in brain-computer interfaces (BCIs), prompting a high need for EEG-based visual reconstruction. In this study, we present an EEG-based visual reconstruction framework. It consists of a plug-and-play EEG encoder called the Adaptive Thinking Mapper (ATM), which is aligned with image embeddings, and a two-stage EEG guidance image generator that first transforms EEG features into image priors and then reconstructs the visual stimuli with a pre-trained image generator. Our approach allows EEG embeddings to achieve superior performance in image classification and retrieval tasks. Our two-stage image generation strategy vividly reconstructs images seen by humans. Furthermore, we analyzed the impact of signals from different time windows and brain regions on decoding and reconstruction. The versatility of our framework is demonstrated in the magnetoencephalogram (MEG) data modality. We report that EEG-based visual decoding achieves SOTA performance, highlighting the portability, low cost, and high temporal resolution of EEG, enabling a wide range of BCI applications. The code of ATM is available at https://github.com/dongyangli-del/EEG_Image_decode.

Submitted 4 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.02419 [pdf, other]

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

Authors: Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou

Abstract: Many recent state-of-the-art results in language tasks were achieved using compound systems that perform multiple Language Model (LM) calls and aggregate their responses. However, there is little understanding of how the number of LM calls - e.g., when asking the LM to answer each question multiple times and taking a majority vote - affects such a compound system's performance. In this paper, we initiate the study of scaling properties of compound inference systems. We analyze, theoretically and empirically, how the number of LM calls affects the performance of Vote and Filter-Vote, two of the simplest compound system designs, which aggregate LM responses via majority voting, optionally applying LM filters. We find, surprisingly, that across multiple language tasks, the performance of both Vote and Filter-Vote can first increase but then decrease as a function of the number of LM calls. Our theoretical results suggest that this non-monotonicity is due to the diversity of query difficulties within a task: more LM calls lead to higher performance on "easy" queries, but lower performance on "hard" queries, and non-monotone behavior can emerge when a task contains both types of queries. This insight then allows us to compute, from a small number of samples, the number of LM calls that maximizes system performance, and define an analytical scaling model for both systems. Experiments show that our scaling model can accurately predict the performance of Vote and Filter-Vote systems and thus find the optimal number of LM calls to make.

Submitted 4 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.
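The non-monotone behavior described in the abstract above can be reproduced with a toy calculation. This is only a sketch under simplifying assumptions that are not taken from the paper: answers are binary, calls are independent, and the task mixes just two query types. The mixture weight 0.6 and the per-call accuracies 0.85 ("easy") and 0.40 ("hard") are illustrative values.

```python
from math import comb


def vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent calls is correct,
    when each call is correct with probability p (k is assumed odd)."""
    return sum(
        comb(k, i) * p**i * (1 - p) ** (k - i)
        for i in range(k // 2 + 1, k + 1)
    )


def mixture_accuracy(k: int, easy_frac: float = 0.6,
                     p_easy: float = 0.85, p_hard: float = 0.40) -> float:
    """Expected majority-vote accuracy on a task mixing easy and hard queries."""
    return (easy_frac * vote_accuracy(p_easy, k)
            + (1 - easy_frac) * vote_accuracy(p_hard, k))


# Sweep odd numbers of calls so that a strict majority always exists.
accuracy = {k: mixture_accuracy(k) for k in range(1, 22, 2)}
best_k = max(accuracy, key=accuracy.get)

for k, acc in accuracy.items():
    print(f"calls={k:2d}  accuracy={acc:.4f}")
print("best number of calls:", best_k)
```

With these values, voting quickly saturates on the easy queries while accuracy on the hard queries keeps falling, so the overall curve rises, peaks at a small number of calls, and then declines.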

arXiv:2402.07595 [pdf, other]

Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and DINOv2 in Medical Imaging Classification

Authors: Yuning Huang, Jingchen Zou, Lanxi Meng, Xin Yue, Qing Zhao, Jianqiang Li, Changwei Song, Gabriel Jimenez, Shaowu Li, Guanghui Fu

Abstract: Medical image analysis frequently encounters data scarcity challenges. Transfer learning has been effective in addressing this issue while conserving computational resources. The recent advent of foundational models like DINOv2, which uses the vision transformer architecture, has opened new opportunities in the field and gathered significant interest. However, DINOv2's performance on clinical data still needs to be verified. In this paper, we performed a glioma grading task using three clinical modalities of brain MRI data. We compared the performance of various pre-trained deep learning models, including those based on ImageNet and DINOv2, in a transfer learning context. Our focus was on understanding the impact of the freezing mechanism on performance. We also validated our findings on three other types of public datasets: chest radiography, fundus radiography, and dermoscopy. Our findings indicate that in our clinical dataset, DINOv2's performance was not as strong as ImageNet-based pre-trained models, whereas in public datasets, DINOv2 generally outperformed other models, especially when using the frozen mechanism. Similar performance was observed with various sizes of DINOv2 models across different tasks. In summary, DINOv2 is viable for medical image classification tasks, particularly with data resembling natural images. However, its effectiveness may vary with data that significantly differs from natural images such as MRI. In addition, employing smaller versions of the model can be adequate for medical tasks, offering resource-saving benefits. Our codes are available at https://github.com/GuanghuiFU/medical_DINOv2_eval.

Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.01693 [pdf, other]

AID-DTI: Accelerating High-fidelity Diffusion Tensor Imaging with Detail-Preserving Model-based Deep Learning

Authors: Wenxin Fan, Jian Cheng, Cheng Li, Xinrui Ma, Jing Yang, Juan Zou, Ruoyou Wu, Qiegen Liu, Shanshan Wang

Abstract: Deep learning has shown great potential in accelerating diffusion tensor imaging (DTI). Nevertheless, existing methods tend to suffer from Rician noise and detail loss in reconstructing the DTI-derived parametric maps especially when sparsely sampled q-space data are used. This paper proposes a novel method, AID-DTI (Accelerating hIgh fiDelity Diffusion Tensor Imaging), to facilitate fast and accurate DTI with only six measurements. AID-DTI is equipped with a newly designed Singular Value Decomposition (SVD)-based regularizer, which can effectively capture fine details while suppressing noise during network training. Experimental results on Human Connectome Project (HCP) data consistently demonstrate that the proposed method estimates DTI parameter maps with fine-grained details and outperforms three state-of-the-art methods both quantitatively and qualitatively.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.00981 [pdf, other]

Securing the Sensing Functionality in ISAC Networks: An Artificial Noise Design

Authors: Jiaqi Zou, Christos Masouros, Fan Liu, Songlin Sun

Abstract: Integrated sensing and communications (ISAC) systems employ dual-functional signals to simultaneously accomplish radar sensing and wireless communication tasks. However, ISAC systems open up new sensing security vulnerabilities to malicious illegitimate eavesdroppers (Eves) that can also exploit the transmitted waveform to extract sensing information from the environment. In this paper, we investigate the beamforming design to enhance the sensing security of an ISAC system, where the communication user (CU) serves as a sensing Eve. Our objective is to maximize the mutual information (MI) for the legitimate radar sensing receiver while considering the constraint of the MI for the Eve and the quality of service to the CUs. Then, we consider the artificial noise (AN)-aided beamforming to further enhance the sensing security. Simulation results demonstrate that our proposed methods achieve MI improvement of the legitimate receiver while limiting the sensing MI of the Eve, compared with the baseline scheme, and that the utilization of AN further contributes to sensing security.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 5 pages

arXiv:2311.13028 [pdf, other]

DMLR: Data-centric Machine Learning Research -- Past, Present and Future

Authors: Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William A Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease, Wojciech Samek, Debojyoti Dutta, Curtis G Northcutt, Cody Coleman, Braden Hancock, Bernard Koch, Girmaw Abebe Tadesse, Bojan Karlaš, et al. (13 additional authors not shown)

Abstract: Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.

Submitted 1 June, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: Published in the Journal of Data-centric Machine Learning Research (DMLR) at https://data.mlr.press/assets/pdf/v01-5.pdf

arXiv:2311.08758 [pdf, other]

A Novel Tree Model-based DNN to Achieve a High-Resolution DOA Estimation via Massive MIMO receive array

Authors: Yifan Li, Feng Shu, Jun Zou, Wei Gao, Yaoliang Song, Jiangzhou Wang

Abstract: To satisfy the high-resolution requirements of direction-of-arrival (DOA) estimation, conventional deep neural network (DNN)-based methods using the grid idea need to significantly increase the number of output classes, which leads to very high model complexity. To address this problem, a multi-level tree-based DNN model (TDNN) is proposed as an alternative, where each level takes small-scale multi-layer neural networks (MLNNs) as nodes to divide the target angular interval into multiple sub-intervals, and each output class is associated with an MLNN at the next level. The number of MLNNs thus increases gradually from the first level to the last, so increasing the depth of the tree dramatically raises the number of output classes and improves the estimation accuracy. More importantly, this network is extended to multi-emitter DOA estimation. Simulation results show that the proposed TDNN performs much better than a conventional DNN and root-MUSIC at extremely low signal-to-noise ratio (SNR), and can achieve the Cramer-Rao lower bound (CRLB). Additionally, in the multi-emitter scenario, the proposed Q-TDNN also achieves a substantial performance gain over DNN and root-MUSIC, and this gain grows as the number of emitters increases.

Submitted 12 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.16387 [pdf, other]

Frequency-Aware Transformer for Learned Image Compression

Authors: Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

Abstract: Learned image compression (LIC) has gained traction as an effective solution for image storage and transmission in recent years. However, existing LIC methods are redundant in latent representation due to limitations in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that for the first time achieves multiscale directional analysis for LIC. The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. Additionally, we introduce a frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance. Furthermore, we present a transformer-based channel-wise autoregressive (T-CA) model that effectively exploits channel dependencies. Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods, and evidently outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets.

Submitted 21 March, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: ICLR 2024 poster

arXiv:2309.08415 [pdf]

A new method of modeling the multi-stage decision-making process of CRT using machine learning with uncertainty quantification

Authors: Kristoffer Larsen, Chen Zhao, Joyce Keyak, Qiuying Sha, Diana Paez, Xinwei Zhang, Guang-Uei Hung, Jiangang Zou, Amalia Peix, Weihua Zhou

Abstract: Aims. The purpose of this study is to create a multi-stage machine learning model to predict cardiac resynchronization therapy (CRT) response for heart failure (HF) patients. This model exploits uncertainty quantification to recommend additional collection of single-photon emission computed tomography myocardial perfusion imaging (SPECT MPI) variables if baseline clinical variables and features from electrocardiogram (ECG) are not sufficient. Methods. 218 patients who underwent rest-gated SPECT MPI were enrolled in this study. CRT response was defined as an increase in left ventricular ejection fraction (LVEF) > 5% at a 6 ± 1 month follow-up. A multi-stage ML model was created by combining two ensemble models: Ensemble 1 was trained with clinical variables and ECG; Ensemble 2 included Ensemble 1 plus SPECT MPI features. Uncertainty quantification from Ensemble 1 allowed for multi-stage decision-making to determine if the acquisition of SPECT data for a patient is necessary. The performance of the multi-stage model was compared with that of Ensemble models 1 and 2. Results. The response rate for CRT was 55.5% (n = 121); 61.0% of patients were male (n = 133), with an average age of 62.0 ± 11.8 and LVEF of 27.7 ± 11.0. The multi-stage model performed similarly to Ensemble 2 (which utilized the additional SPECT data) with AUC of 0.75 vs. 0.77, accuracy of 0.71 vs. 0.69, sensitivity of 0.70 vs. 0.72, and specificity of 0.72 vs. 0.65, respectively. However, the multi-stage model only required SPECT MPI data for 52.7% of the patients across all folds. Conclusions. By using rule-based logic stemming from uncertainty quantification, the multi-stage model was able to reduce the need for additional SPECT MPI data acquisition without sacrificing performance.

Submitted 28 April, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 30 pages, 6 figures. arXiv admin note: text overlap with arXiv:2305.02475
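The multi-stage idea in the abstract above (use the cheaper first ensemble when it is confident, and request SPECT MPI data only otherwise) can be sketched as a simple gating rule. The uncertainty measure used here (standard deviation across ensemble members), the `max_std` threshold, and the function names are assumptions for illustration; the abstract does not specify the actual rule.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple


def multi_stage_predict(
    stage1_member_probs: Sequence[float],
    stage2_predict: Callable[[], float],
    max_std: float = 0.10,
) -> Tuple[float, bool]:
    """Return (predicted response probability, whether stage 2 was needed).

    stage1_member_probs: per-member probabilities from the first ensemble
        (clinical variables and ECG only).
    stage2_predict: callback that acquires the extra imaging features and
        returns the second ensemble's probability; called only when needed.
    max_std: hypothetical disagreement threshold between ensemble members.
    """
    if pstdev(stage1_member_probs) <= max_std:
        # Members agree: accept the stage-1 prediction, no extra acquisition.
        return mean(stage1_member_probs), False
    # Members disagree: fall back to the richer second-stage model.
    return stage2_predict(), True


# Made-up example values: one confident case and one uncertain case.
confident = multi_stage_predict([0.80, 0.82, 0.78], stage2_predict=lambda: 0.50)
uncertain = multi_stage_predict([0.20, 0.90, 0.50], stage2_predict=lambda: 0.65)
print(confident, uncertain)
```

Passing the second stage as a callback keeps the costly acquisition lazy, so it is only triggered for patients whose first-stage prediction is uncertain.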

arXiv:2308.13995 [pdf, other]

Generalizable Learning Reconstruction for Accelerating MR Imaging via Federated Neural Architecture Search

Authors: Ruoyou Wu, Cheng Li, Juan Zou, Shanshan Wang

Abstract: Heterogeneous data captured by different scanning devices and imaging protocols can affect the generalization performance of the deep learning magnetic resonance (MR) reconstruction model. While a centralized training model is effective in mitigating this problem, it raises concerns about privacy protection. Federated learning is a distributed training paradigm that can utilize multi-institutional data for collaborative training without sharing data. However, existing federated learning MR image reconstruction methods rely on models designed manually by experts, which are complex and computationally expensive, suffering from performance degradation when facing heterogeneous data distributions. In addition, these methods give inadequate consideration to fairness issues, namely, ensuring that the model's training does not introduce bias towards any specific dataset's distribution. To this end, this paper proposes a generalizable federated neural architecture search framework for accelerating MR imaging (GAutoMRI). Specifically, automatic neural architecture search is investigated for effective and efficient neural network representation learning of MR images from different centers. Furthermore, we design a fairness adjustment approach that can enable the model to learn features fairly from inconsistent distributions of different devices and centers, and thus enforce that the model generalizes to the unseen center. Extensive experiments show that our proposed GAutoMRI achieves better performance and generalization ability compared with six state-of-the-art federated learning methods. Moreover, the GAutoMRI model is significantly more lightweight, making it an efficient choice for MR image reconstruction tasks. The code will be made available at https://github.com/ternencewu123/GAutoMRI.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 10 pages

arXiv:2307.11538 [pdf, other]

FedAutoMRI: Federated Neural Architecture Search for MR Image Reconstruction

Authors: Ruoyou Wu, Cheng Li, Juan Zou, Shanshan Wang

Abstract: Centralized training methods have shown promising results in MR image reconstruction, but privacy concerns arise when gathering data from multiple institutions. Federated learning, a distributed collaborative training scheme, can utilize multi-center data without the need to transfer data between institutions. However, existing federated learning MR image reconstruction methods rely on manually designed models which have extensive parameters and suffer from performance degradation when facing heterogeneous data distributions. To this end, this paper proposes a novel FederAted neUral archiTecture search approach fOr MR Image reconstruction (FedAutoMRI). The proposed method utilizes differentiable architecture search to automatically find the optimal network architecture. In addition, an exponential moving average method is introduced to improve the robustness of the client model to address the data heterogeneity issue. To the best of our knowledge, this is the first work to use federated neural architecture search for MR image reconstruction. Experimental results demonstrate that our proposed FedAutoMRI can achieve promising performance while utilizing a lightweight model with only a small number of model parameters compared to the classical federated learning methods.

Submitted 21 July, 2023; originally announced July 2023.

Comments: 10 pages
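
The FedAutoMRI abstract above mentions an exponential moving average (EMA) for making the client model more robust, but this listing does not give the paper's update rule. As a rough illustration only, here is a minimal sketch of a generic parameter-wise EMA; `ema_update`, `decay`, and the toy weight dictionary are hypothetical names, not taken from the paper.

```python
import numpy as np

def ema_update(ema_weights, new_weights, decay=0.9):
    """Blend the running average with the newest weights, one parameter at a time.

    Generic EMA; the decay value and dictionary layout are assumptions.
    """
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * new_weights[name]
        for name in ema_weights
    }

# Toy usage: the running average drifts toward a constant "local" update.
ema = {"w": np.zeros(3)}
for _ in range(3):
    local = {"w": np.ones(3)}  # stand-in for freshly trained client weights
    ema = ema_update(ema, local, decay=0.5)
```

A larger `decay` makes the averaged model change more slowly, which is the usual way an EMA damps the effect of any single noisy or heterogeneous update.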

arXiv:2307.04002 [pdf, other]

Energy-Efficient Beamforming Design for Integrated Sensing and Communications Systems

Authors: Jiaqi Zou, Songlin Sun, Christos Masouros, Yuanhao Cui, Yafeng Liu, Derrick Wing Kwan Ng

Abstract: In this paper, we investigate the design of energy-efficient beamforming for an ISAC system, where the transmitted waveform is optimized for joint multi-user communication and target estimation simultaneously. We aim to maximize the system energy efficiency (EE), taking into account the constraints of a maximum transmit power budget, a minimum required signal-to-interference-plus-noise ratio (SINR) for communication, and a maximum tolerable Cramer-Rao bound (CRB) for target estimation. We first consider communication-centric EE maximization. To handle the non-convex fractional objective function, we propose an iterative quadratic-transform-Dinkelbach method, where Schur complement and semi-definite relaxation (SDR) techniques are leveraged to solve the subproblem in each iteration. For the scenarios where sensing is critical, we propose a novel performance metric for characterizing the sensing-centric EE and optimize the metric adopted in the scenario of sensing a point-like target and an extended target. To handle the nonconvexity, we employ the successive convex approximation (SCA) technique to develop an efficient algorithm for approximating the nonconvex problem as a sequence of convex ones. Furthermore, we adopt a Pareto optimization mechanism to articulate the tradeoff between the communication-centric EE and sensing-centric EE. We formulate the search of the Pareto boundary as a constrained optimization problem and propose a computationally efficient algorithm to handle it. Numerical results validate the effectiveness of our proposed algorithms compared with the baseline schemes and the obtained approximate Pareto boundary shows that there is a non-trivial tradeoff between communication-centric EE and sensing-centric EE, where the number of communication users and EE requirements have serious effects on the achievable tradeoff.

Submitted 8 July, 2023; originally announced July 2023.

arXiv:2306.01210 [pdf]

A new method using deep transfer learning on ECG to predict the response to cardiac resynchronization therapy

Authors: Zhuo He, Hongjin Si, Xinwei Zhang, Qing-Hui Chen, Jiangang Zou, Weihua Zhou

Abstract: Background: Cardiac resynchronization therapy (CRT) has emerged as an effective treatment for heart failure patients with electrical dyssynchrony. However, accurately predicting which patients will respond to CRT remains a challenge. This study explores the application of deep transfer learning techniques to train a predictive model for CRT response. Methods: In this study, the short-time Fourier transform (STFT) technique was employed to transform ECG signals into two-dimensional images. A transfer learning approach was then applied on the MIT-BIH ECG database to pre-train a convolutional neural network (CNN) model. The model was fine-tuned to extract relevant features from the ECG images, and then tested on our dataset of CRT patients to predict their response. Results: Seventy-one CRT patients were enrolled in this study. The transfer learning model achieved an accuracy of 72% in distinguishing responders from non-responders in the local dataset. Furthermore, the model showed good sensitivity (0.78) and specificity (0.79) in identifying CRT responders. Our model outperformed clinical guidelines and traditional machine learning approaches. Conclusion: The utilization of ECG images as input and leveraging the power of transfer learning allows for improved accuracy in identifying CRT responders. This approach offers potential for enhancing patient selection and improving outcomes of CRT.

Submitted 1 June, 2023; originally announced June 2023.
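
The abstract above describes turning ECG signals into two-dimensional images with the short-time Fourier transform (STFT). The listing gives no window length, hop size, or sampling rate, so the sketch below is only an illustration under assumed values: `stft_image`, `frame_len=256`, `hop=128`, and the 360 Hz synthetic sine wave are hypothetical stand-ins, not the authors' preprocessing.

```python
import numpy as np

def stft_image(signal, fs, frame_len=256, hop=128):
    """Return (freqs, image), where image[f, t] is a log-magnitude spectrogram.

    Frames are Hann-windowed and transformed with a real FFT.
    frame_len and hop are illustrative choices, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)           # shape: (n_frames, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)   # frequency of each bin in Hz
    return freqs, np.log1p(np.abs(spectrum)).T       # transpose to (frequency, time)

# Synthetic 10 Hz tone as a stand-in for a real ECG trace.
fs = 360                                # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
signal = np.sin(2 * np.pi * 10 * t)
freqs, image = stft_image(signal, fs)
```

The resulting array has frequency along one axis and time along the other, so it can be fed to an image model such as a CNN in the same way as a grayscale picture.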

arXiv:2305.06066 [pdf, other]

Self-Supervised Federated Learning for Fast MR Imaging

Authors: Juan Zou, Cheng Li, Ruoyou Wu, Tingrui Pei, Hairong Zheng, Shanshan Wang

Abstract: Federated learning (FL) based magnetic resonance (MR) image reconstruction can facilitate learning valuable priors from multi-site institutions without violating patient's privacy for accelerating MR imaging. However, existing methods rely on fully sampled data for collaborative training of the model. The client that only possesses undersampled data can neither participate in FL nor benefit from other clients. Furthermore, heterogeneous data distributions hinder FL from training an effective deep learning reconstruction model and thus cause performance degradation. To address these issues, we propose a Self-Supervised Federated Learning method (SSFedMRI). SSFedMRI explores the physics-based contrastive reconstruction networks in each client to realize cross-site collaborative training in the absence of fully sampled data. Furthermore, a personalized soft update scheme is designed to simultaneously capture the global shared representations among different centers and maintain the specific data distribution of each client. The proposed method is evaluated on four datasets and compared to the latest state-of-the-art approaches. Experimental results demonstrate that SSFedMRI possesses strong capability in reconstructing accurate MR images both visually and quantitatively on both in-distribution and out-of-distribution datasets.

Submitted 10 May, 2023; originally announced May 2023.

Comments: 10 pages, 4 figures

MSC Class: 68T10; ACM Class: I.4.5

arXiv:2305.00650 [pdf, other]

Discover and Cure: Concept-aware Mitigation of Spurious Correlation

Authors: Shirley Wu, Mert Yuksekgonul, Linjun Zhang, James Zou

Abstract: Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond training environments. For instance, models that associate cats with bed backgrounds can fail to predict the existence of cats in other environments without beds. Mitigating spurious correlations is crucial in building trustworthy models. However, the existing works lack transparency to offer insights into the mitigation process. In this work, we propose an interpretable framework, Discover and Cure (DISC), to tackle the issue. With human-interpretable concepts, DISC iteratively 1) discovers unstable concepts across different environments as spurious attributes, then 2) intervenes on the training data using the discovered concepts to reduce spurious correlation. Across systematic experiments, DISC provides superior generalization ability and interpretability than the existing approaches. Specifically, it outperforms the state-of-the-art methods on an object recognition task and a skin-lesion classification task by 7.5% and 9.6%, respectively. Additionally, we offer theoretical analysis and guarantees to understand the benefits of models trained by DISC. Code and data are available at https://github.com/Wuyxin/DISC.

Submitted 5 June, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: ICML 2023

arXiv:2305.00250 [pdf, other]

A Direct Sampling-Based Deep Learning Approach for Inverse Medium Scattering Problems

Authors: Jianfeng Ning, Fuqun Han, Jun Zou

Abstract: In this work, we focus on the inverse medium scattering problem (IMSP), which aims to recover unknown scatterers based on measured scattered data. Motivated by the efficient direct sampling method (DSM) introduced in [23], we propose a novel direct sampling-based deep learning approach (DSM-DL) for reconstructing inhomogeneous scatterers. In particular, we use the U-Net neural network to learn the relation between the index functions and the true contrasts. Our proposed DSM-DL is computationally efficient, robust to noise, easy to implement, and able to naturally incorporate multiple measured data to achieve high-quality reconstructions. Some representative tests are carried out with varying numbers of incident waves and different noise levels to evaluate the performance of the proposed method. The results demonstrate the promising benefits of combining deep learning techniques with the DSM for IMSP.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2305.00179 [pdf, other]

Integrated Sensing and Communications: Recent Advances and Ten Open Challenges

Authors: Shihang Lu, Fan Liu, Yunxin Li, Kecheng Zhang, Hongjia Huang, Jiaqi Zou, Xinyu Li, Yuxiang Dong, Fuwang Dong, Jia Zhu, Yifeng Xiong, Weijie Yuan, Yuanhao Cui, Lajos Hanzo

Abstract: It is anticipated that integrated sensing and communications (ISAC) would be one of the key enablers of next-generation wireless networks (such as beyond 5G (B5G) and 6G) for supporting a variety of emerging applications. In this paper, we provide a comprehensive review of the recent advances in ISAC systems, with a particular focus on their foundations, system design, networking aspects and ISAC applications. Furthermore, we discuss the corresponding open questions of the above that emerged in each issue. Hence, we commence with the information theory of sensing and communications (S&C), followed by the information-theoretic limits of ISAC systems by shedding light on the fundamental performance metrics. Next, we discuss their clock synchronization and phase offset problems, the associated Pareto-optimal signaling strategies, as well as the associated super-resolution ISAC system design. Moreover, we envision that ISAC ushers in a paradigm shift for the future cellular networks relying on network sensing, transforming the classic cellular architecture, cross-layer resource management methods, and transmission protocols. In ISAC applications, we further highlight the security and privacy issues of wireless sensing. Finally, we close by studying the recent advances in a representative ISAC use case, namely the multi-object multi-task (MOMT) recognition problem using wireless signals.

Submitted 17 December, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

Comments: 26 pages, 22 figures, resubmitted to IEEE Journal. Appreciation for the outstanding contributions of coauthors in the paper!

arXiv:2304.07502 [pdf, other]

Model-based Federated Learning for Accurate MR Image Reconstruction from Undersampled k-space Data

Authors: Ruoyou Wu, Cheng Li, Juan Zou, Qiegen Liu, Hairong Zheng, Shanshan Wang

Abstract: Deep learning-based methods have achieved encouraging performances in the field of magnetic resonance (MR) image reconstruction. Nevertheless, to properly learn a powerful and robust model, these methods generally require large quantities of data, the collection of which from multiple centers may cause ethical and data privacy violation issues. Lately, federated learning has served as a promising solution to exploit multi-center data while getting rid of the data transfer between institutions. However, high heterogeneity exists in the data from different centers, and existing federated learning methods tend to use average aggregation methods to combine the client's information, which limits the performance and generalization capability of the trained models. In this paper, we propose a Model-based Federated learning framework (ModFed). ModFed has three major contributions: 1) Different from the existing data-driven federated learning methods, model-driven neural networks are designed to relieve each client's dependency on large data; 2) An adaptive dynamic aggregation scheme is proposed to address the data heterogeneity issue and improve the generalization capability and robustness of the trained neural network models; 3) A spatial Laplacian attention mechanism and a personalized client-side loss regularization are introduced to capture the detailed information for accurate image reconstruction. ModFed is evaluated on three in-vivo datasets. Experimental results show that ModFed has strong capability in improving image reconstruction quality and enforcing model generalization capability when compared to the other five state-of-the-art federated learning approaches. Codes will be made available at https://github.com/ternencewu123/ModFed.

Submitted 15 April, 2023; originally announced April 2023.

Comments: 10 pages

arXiv:2303.08113 [pdf, other]

Learning Homeomorphic Image Registration via Conformal-Invariant Hyperelastic Regularisation

Authors: Jing Zou, Noémie Debroux, Lihao Liu, Jing Qin, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Deformable image registration is a fundamental task in medical image analysis and plays a crucial role in a wide range of clinical applications. Recently, deep learning-based approaches have been widely studied for deformable medical image registration and achieved promising results. However, existing deep learning image registration techniques do not theoretically guarantee topology-preserving transformations. This is a key property to preserve anatomical structures and achieve plausible transformations that can be used in real clinical settings. We propose a novel framework for deformable image registration. Firstly, we introduce a novel regulariser based on conformal-invariant properties in a nonlinear elasticity setting. Our regulariser enforces the deformation field to be smooth, invertible and orientation-preserving. More importantly, we strictly guarantee topology preservation, yielding a clinically meaningful registration. Secondly, we boost the performance of our regulariser through coordinate MLPs, where one can view the to-be-registered images as continuously differentiable entities. We demonstrate, through numerical and visual experiments, that our framework is able to outperform current techniques for image registration.

Submitted 30 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: 13 pages, 3 figures

arXiv:2303.02666 [pdf, other]

Learned Lossless Compression for JPEG via Frequency-Domain Prediction

Authors: Jixiang Luo, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

Abstract: JPEG images can be further compressed to enhance the storage and transmission of large-scale image datasets. Existing learned lossless compressors for RGB images cannot be well transferred to JPEG images due to the distinguishing distribution of DCT coefficients and raw pixels. In this paper, we propose a novel framework for learned lossless compression of JPEG images that achieves end-to-end optimized prediction of the distribution of decoded DCT coefficients. To enable learning in the frequency domain, DCT coefficients are partitioned into groups to utilize implicit local redundancy. An autoencoder-like architecture is designed based on the weight-shared blocks to realize entropy modeling of grouped DCT coefficients and independently compress the priors. We attempt to realize learned lossless compression of JPEG images in the frequency domain. Experimental results demonstrate that the proposed framework achieves superior or comparable performance in comparison to most recent lossless compressors with handcrafted context modeling for JPEG images.

Submitted 5 March, 2023; originally announced March 2023.

arXiv:2301.05898 [pdf]

doi 10.1016/j.neubiorev.2023.105111

Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope

Authors: Yuran Zhang, Jiajie Zou, Nai Ding

Abstract: The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of speech envelope, characterized by the modulation spectrum, reliably correlates with the rate of syllables only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% variance of the speech envelope. These results indicate that local features in the speech envelope, instead of the modulation spectrum, are a more reliable acoustic correlate of syllables.

Submitted 14 January, 2023; originally announced January 2023.

arXiv:2212.12134 [pdf, other]

AMDET: Attention based Multiple Dimensions EEG Transformer for Emotion Recognition

Authors: Yongling Xu, Yang Du, Jing Zou, Tianying Zhou, Lushan Xiao, Li Liu, Pengcheng

Abstract: Affective computing is an important branch of artificial intelligence, and with the rapid development of brain computer interface technology, emotion recognition based on EEG signals has received broad attention. It is still a great challenge to effectively explore the multi-dimensional information in the EEG data in spite of a large number of deep learning methods. In this paper, we propose a deep model called Attention-based Multiple Dimensions EEG Transformer (AMDET), which can exploit the complementarity among the spectral-spatial-temporal features of EEG data by employing the multi-dimensional global attention mechanism. We transform the original EEG data into 3D temporal-spectral-spatial representations; AMDET then uses a spectral-spatial transformer encoder layer to extract effective features in the EEG signal and concentrates on the critical time frame with a temporal attention layer. We conduct extensive experiments on the DEAP, SEED, and SEED-IV datasets to evaluate the performance of AMDET and the results outperform the state-of-the-art baseline on three datasets. Accuracy rates of 97.48%, 96.85%, 97.17%, 87.32% were achieved in the DEAP-Arousal, DEAP-Valence, SEED, and SEED-IV datasets, respectively. We also conduct extensive experiments to explore the possible brain regions that influence emotions and the coupling of EEG signals. AMDET performs well even with few channels, which are identified by visualizing what the learned model focuses on. The accuracy exceeds 90% even with only eight channels, which is of great benefit for practical applications.

Submitted 22 December, 2022; originally announced December 2022.

arXiv:2211.13440 [pdf, other]

Iterative Data Refinement for Self-Supervised MR Image Reconstruction

Authors: Xue Liu, Juan Zou, Xiawu Zheng, Cheng Li, Hairong Zheng, Shanshan Wang

Abstract: Magnetic Resonance Imaging (MRI) has become an important technique in the clinic for the visualization, detection, and diagnosis of various diseases. However, one bottleneck limitation of MRI is the relatively slow data acquisition process. Fast MRI based on k-space undersampling and high-quality image reconstruction has been widely utilized, and many deep learning-based methods have been developed in recent years. Although promising results have been achieved, most existing methods require fully-sampled reference data for training the deep learning models. Unfortunately, fully-sampled MRI data are difficult if not impossible to obtain in real-world applications. To address this issue, we propose a data refinement framework for self-supervised MR image reconstruction. Specifically, we first analyze the reason for the performance gap between self-supervised and supervised methods and identify that the bias in the training datasets between the two is one major factor. Then, we design an effective self-supervised training data refinement method to reduce this data bias. With the data refinement, an enhanced self-supervised MR image reconstruction framework is developed to prompt accurate MR imaging. We evaluate our method on an in-vivo MRI dataset. Experimental results show that without utilizing any fully sampled MRI data, our self-supervised framework possesses strong capabilities in capturing image details and structures at high acceleration factors.

Submitted 24 November, 2022; originally announced November 2022.

Comments: 5 pages, 2 figures, 1 table

MSC Class: 68T10; ACM Class: I.4.5

arXiv:2209.09105 [pdf]

Development and Clinical Evaluation of an AI Support Tool for Improving Telemedicine Photo Quality

Authors: Kailas Vodrahalli, Justin Ko, Albert S. Chiou, Roberto Novoa, Abubakar Abid, Michelle Phung, Kiana Yekrang, Paige Petrone, James Zou, Roxana Daneshjou

Abstract: Telemedicine utilization was accelerated during the COVID-19 pandemic, and skin conditions were a common use case. However, the quality of photographs sent by patients remains a major limitation. To address this issue, we developed TrueImage 2.0, an artificial intelligence (AI) model for assessing patient photo quality for telemedicine and providing real-time feedback to patients for photo quality improvement. TrueImage 2.0 was trained on 1700 telemedicine images annotated by clinicians for photo quality. On a retrospective dataset of 357 telemedicine images, TrueImage 2.0 effectively identified poor quality images (Receiver operator curve area under the curve (ROC-AUC) =0.78) and the reason for poor quality (Blurry ROC-AUC=0.84, Lighting issues ROC-AUC=0.70). The performance is consistent across age, gender, and skin tone. Next, we assessed whether patient-TrueImage 2.0 interaction led to an improvement in submitted photo quality through a prospective clinical pilot study with 98 patients. TrueImage 2.0 reduced the number of patients with a poor-quality image by 68.0%.

Submitted 12 September, 2022; originally announced September 2022.

Comments: 24 pages, 7 figures

arXiv:2208.03904 [pdf, other]

SelfCoLearn: Self-supervised collaborative learning for accelerating dynamic MR imaging

Authors: Juan Zou, Cheng Li, Sen Jia, Ruoyou Wu, Tingrui Pei, Hairong Zheng, Shanshan Wang

Abstract: Lately, deep learning has been extensively investigated for accelerating dynamic magnetic resonance (MR) imaging, with encouraging progress achieved. However, without fully sampled reference data for training, current approaches may have limited abilities in recovering fine details or structures. To address this challenge, this paper proposes a self-supervised collaborative learning framework (SelfCoLearn) for accurate dynamic MR image reconstruction from undersampled k-space data. The proposed framework is equipped with three important components, namely, dual-network collaborative learning, re-undersampling data augmentation and a specially designed co-training loss. The framework can be flexibly integrated with both data-driven networks and model-based iterative unrolled networks. Our method has been evaluated on an in-vivo dataset and compared to four state-of-the-art methods. Results show that our method possesses strong capabilities in capturing essential and inherent representations for direct reconstructions from the undersampled k-space data and thus enables high-quality and fast dynamic MR imaging.

Submitted 8 August, 2022; originally announced August 2022.

Comments: 22 pages, 9 figures

ACM Class: I.4.5

arXiv:2205.03242 [pdf]

Electrocardiographic Deep Learning for Predicting Post-Procedural Mortality

Authors: David Ouyang, John Theurer, Nathan R. Stein, J. Weston Hughes, Pierre Elias, Bryan He, Neal Yuan, Grant Duffy, Roopinder K. Sandhu, Joseph Ebinger, Patrick Botting, Melvin Jujjavarapu, Brian Claggett, James E. Tooley, Tim Poterucha, Jonathan H. Chen, Michael Nurok, Marco Perez, Adler Perotte, James Y. Zou, Nancy R. Cook, Sumeet S. Chugh, Susan Cheng, Christine M. Albert

Abstract: Background. Pre-operative risk assessments used in clinical practice are limited in their ability to identify risk for post-operative mortality. We hypothesize that electrocardiograms contain hidden risk markers that can help prognosticate post-operative mortality. Methods. In a derivation cohort of 45,969 pre-operative patients (age 59 ± 19 years, 55 percent women), a deep learning algorithm was developed to leverage waveform signals from pre-operative ECGs to discriminate post-operative mortality. Model performance was assessed in a holdout internal test dataset and in two external hospital cohorts and compared with the Revised Cardiac Risk Index (RCRI) score. Results. In the derivation cohort, there were 1,452 deaths. The algorithm discriminates mortality with an AUC of 0.83 (95% CI 0.79-0.87) surpassing the discrimination of the RCRI score with an AUC of 0.67 (CI 0.61-0.72) in the held out test cohort. Patients determined to be high risk by the deep learning model's risk prediction had an unadjusted odds ratio (OR) of 8.83 (5.57-13.20) for post-operative mortality as compared to an unadjusted OR of 2.08 (CI 0.77-3.50) for post-operative mortality for RCRI greater than 2. The deep learning algorithm performed similarly for patients undergoing cardiac surgery with an AUC of 0.85 (CI 0.77-0.92), non-cardiac surgery with an AUC of 0.83 (0.79-0.88), and catheterization or endoscopy suite procedures with an AUC of 0.76 (0.72-0.81). The algorithm similarly discriminated risk for mortality in two separate external validation cohorts from independent healthcare systems with AUCs of 0.79 (0.75-0.83) and 0.75 (0.74-0.76) respectively. Conclusion. The findings demonstrate how a novel deep learning algorithm, applied to pre-operative ECGs, can improve discrimination of post-operative mortality.

Submitted 30 April, 2022; originally announced May 2022.

arXiv:2204.11640 [pdf, other]

doi 10.1109/TPAMI.2022.3172214

Hybrid ISTA: Unfolding ISTA With Convergence Guarantees Using Free-Form Deep Neural Networks

Authors: Ziyang Zheng, Wenrui Dai, Duoduo Xue, Chenglin Li, Junni Zou, Hongkai Xiong

Abstract: It is promising to solve linear inverse problems by unfolding iterative algorithms (e.g., iterative shrinkage thresholding algorithm (ISTA)) as deep neural networks (DNNs) with learnable parameters. However, existing ISTA-based unfolded algorithms restrict the network architectures for iterative updates with the partial weight coupling structure to guarantee convergence. In this paper, we propose hybrid ISTA to unfold ISTA with both pre-computed and learned parameters by incorporating free-form DNNs (i.e., DNNs with arbitrary feasible and reasonable network architectures), while ensuring theoretical convergence. We first develop HCISTA to improve the efficiency and flexibility of classical ISTA (with pre-computed parameters) without compromising the convergence rate in theory. Furthermore, the DNN-based hybrid algorithm is generalized to popular variants of learned ISTA, dubbed HLISTA, to enable a free architecture of learned parameters with a guarantee of linear convergence. To the best of our knowledge, this paper is the first to provide a convergence-provable framework that enables free-form DNNs in ISTA-based unfolded algorithms. This framework is general enough to endow arbitrary DNNs for solving linear inverse problems with convergence guarantees. Extensive experiments demonstrate that hybrid ISTA can reduce the reconstruction error with an improved convergence rate in the tasks of sparse recovery and compressive sensing.

Submitted 5 May, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: 109 pages, 16 figures; this is a draft and the final version has been accepted by TPAMI (DOI: 10.1109/TPAMI.2022.3172214)

arXiv:2204.06230 [pdf, ps, other]

Performance Analysis of Wireless Network Aided by Discrete-Phase-Shifter IRS

Authors: Rongen Dong, Yin Teng, Zhongwen Sun, Jun Zou, Mengxing Huang, Jun Li, Feng Shu, Jiangzhou Wang

Abstract: Discrete phase shifters of an intelligent reflecting surface (IRS) generate phase quantization error (QE) and degrade the receive performance at the receiver. To analyze the performance loss caused by IRS with phase QE, based on the law of large numbers, the closed-form expressions of signal-to-noise ratio (SNR) performance loss (PL), achievable rate (AR), and bit error rate (BER) are successively derived under line-of-sight (LoS) channels and Rayleigh channels. Moreover, based on the Taylor series expansion, the approximate simple closed form of PL of IRS with approximate QE is also given. The simulation results show that the performance losses of SNR and AR decrease as the number of quantization bits increases, while they gradually increase as the number of IRS phase shifter elements increases. Regardless of LoS channels or Rayleigh channels, when the number of quantization bits is larger than or equal to 3, the performance losses of SNR and AR are less than 0.23 dB and 0.08 bits/s/Hz, respectively, and the BER performance degradation is trivial. In particular, the performance loss difference between IRS with QE and IRS with approximate QE is negligible when the number of quantization bits is not less than 2.

Submitted 13 April, 2022; originally announced April 2022.

arXiv:2203.08807 [pdf]

doi 10.1126/sciadv.abq6147

Disparities in Dermatology AI Performance on a Diverse, Curated Clinical Image Set

Authors: Roxana Daneshjou, Kailas Vodrahalli, Roberto A Novoa, Melissa Jenkins, Weixin Liang, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E Bailey, Olivier Gevaert, Pritam Mukherjee, Michelle Phung, Kiana Yekrang, Bradley Fong, Rachna Sahasrabudhe, Johan A. C. Allerup, Utako Okata-Karigane, James Zou, Albert Chiou

Abstract: Access to dermatological care is a major issue, with an estimated 3 billion people lacking access to care globally. Artificial intelligence (AI) may aid in triaging skin diseases. However, most AI models have not been rigorously assessed on images of diverse skin tones or uncommon diseases. To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset, the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones. Using this dataset of 656 images, we show that state-of-the-art dermatology AI models perform substantially worse on DDI, with receiver operator curve area under the curve (ROC-AUC) dropping by 27-36 percent compared to the models' original test results. All the models performed worse on dark skin tones and uncommon diseases, which are represented in the DDI dataset. Additionally, we find that dermatologists, who typically provide visual labels for AI training and test datasets, also perform worse on images of dark skin tones and uncommon diseases compared to ground truth biopsy annotations. Finally, fine-tuning AI models on the well-characterized and diverse DDI images closed the performance gap between light and dark skin tones. Moreover, algorithms fine-tuned on diverse skin tones outperformed dermatologists on identifying malignancy on images of dark skin tones. Our findings identify important weaknesses and biases in dermatology AI that need to be addressed to ensure reliable application to diverse patients and diseases.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2202.01494 [pdf, other]

PARCEL: Physics-based Unsupervised Contrastive Representation Learning for Multi-coil MR Imaging

Authors: Shanshan Wang, Ruoyou Wu, Cheng Li, Juan Zou, Ziyao Zhang, Qiegen Liu, Yan Xi, Hairong Zheng

Abstract: With the successful application of deep learning to magnetic resonance (MR) imaging, parallel imaging techniques based on neural networks have attracted wide attention. However, in the absence of high-quality, fully sampled datasets for training, the performance of these methods is limited, and the interpretability of the models is not strong enough. To tackle this issue, this paper proposes a Physics-bAsed unsupeRvised Contrastive rEpresentation Learning (PARCEL) method to speed up parallel MR imaging. Specifically, PARCEL has a parallel framework to contrastively learn two branches of model-based unrolling networks from augmented undersampled multi-coil k-space data. A sophisticated co-training loss with three essential components has been designed to guide the two networks in capturing the inherent features and representations for MR images. The final MR image is reconstructed with the trained contrastive networks. PARCEL was evaluated on two in vivo datasets and compared to five state-of-the-art methods. The results show that PARCEL is able to learn essential representations for accurate MR reconstruction without relying on fully sampled datasets.

Submitted 14 November, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

arXiv:2201.00097 [pdf, other]

Adversarial Attack via Dual-Stage Network Erosion

Authors: Yexin Duan, Junhua Zou, Xingyu Zhou, Wu Zhang, ** Zhang, Zhisong Pan

Abstract: Deep neural networks are vulnerable to adversarial examples, which can fool deep models by adding subtle perturbations. Although existing attacks have achieved promising results, there is still a long way to go in generating transferable adversarial examples under the black-box setting. To this end, this paper proposes to improve the transferability of adversarial examples, and applies dual-stage feature-level perturbations to an existing model to implicitly create a set of diverse models. Then these models are fused by the longitudinal ensemble during the iterations. The proposed method is termed Dual-Stage Network Erosion (DSNE). We conduct comprehensive experiments on both non-residual and residual networks, and obtain more transferable adversarial examples with a computational cost similar to the state-of-the-art method. In particular, for the residual networks, the transferability of the adversarial examples can be significantly improved by biasing the residual block information to the skip connections. Our work provides new insights into the architectural vulnerability of neural networks and presents new challenges to the robustness of neural networks.

Submitted 31 December, 2021; originally announced January 2022.

arXiv:2111.08006 [pdf, other]

doi 10.1126/sciadv.abq6147

Disparities in Dermatology AI: Assessments Using Diverse Clinical Images

Authors: Roxana Daneshjou, Kailas Vodrahalli, Weixin Liang, Roberto A Novoa, Melissa Jenkins, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E Bailey, Olivier Gevaert, Pritam Mukherjee, Michelle Phung, Kiana Yekrang, Bradley Fong, Rachna Sahasrabudhe, James Zou, Albert Chiou

Abstract: More than 3 billion people lack access to care for skin disease. AI diagnostic tools may aid in early skin cancer detection; however, most models have not been assessed on images of diverse skin tones or uncommon diseases. To address this, we curated the Diverse Dermatology Images (DDI) dataset - the first publicly available, pathologically confirmed images featuring diverse skin tones. We show that state-of-the-art dermatology AI models perform substantially worse on DDI, with ROC-AUC dropping 29-40 percent compared to the models' original results. We find that dark skin tones and uncommon diseases, which are well represented in the DDI dataset, lead to performance drop-offs. Additionally, we show that state-of-the-art robust training methods cannot correct for these biases without diverse training data. Our findings identify important weaknesses and biases in dermatology AI that need to be addressed to ensure reliable application to diverse patients and across all diseases.

Submitted 15 November, 2021; originally announced November 2021.

Comments: Machine Learning for Health (ML4H) - Extended Abstract

arXiv:2106.12511 [pdf]

doi 10.1001/jamacardio.2021.6059

High-Throughput Precision Phenotyping of Left Ventricular Hypertrophy with Cardiovascular Deep Learning

Authors: Grant Duffy, Paul P Cheng, Neal Yuan, Bryan He, Alan C. Kwan, Matthew J. Shun-Shin, Kevin M. Alexander, Joseph Ebinger, Matthew P. Lungren, Florian Rader, David H. Liang, Ingela Schnittger, Euan A. Ashley, James Y. Zou, Jignesh Patel, Ronald Witteles, Susan Cheng, David Ouyang

Abstract: Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular disease including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and difficulty differentiating etiologies of LVH. To overcome this challenge, we present EchoNet-LVH - a deep learning workflow that automatically quantifies ventricular hypertrophy with precision equal to human experts and predicts etiology of LVH. Trained on 28,201 echocardiogram videos, our model accurately measures intraventricular wall thickness (mean absolute error [MAE] 1.4 mm, 95% CI 1.2-1.5 mm), left ventricular diameter (MAE 2.4 mm, 95% CI 2.2-2.6 mm), and posterior wall thickness (MAE 1.2 mm, 95% CI 1.1-1.3 mm) and classifies cardiac amyloidosis (area under the curve of 0.83) and hypertrophic cardiomyopathy (AUC 0.98) from other etiologies of LVH. In external datasets from independent domestic and international healthcare systems, EchoNet-LVH accurately quantified ventricular parameters (R2 of 0.96 and 0.90 respectively) and detected cardiac amyloidosis (AUC 0.79) and hypertrophic cardiomyopathy (AUC 0.89) on the domestic external validation site. Leveraging measurements across multiple heart beats, our model can more accurately identify subtle changes in LV geometry and its causal etiologies. Compared to human experts, EchoNet-LVH is fully automated, allowing for reproducible, precise measurements, and lays the foundation for precision diagnosis of cardiac hypertrophy. As a resource to promote further innovation, we also make publicly available a large dataset of 23,212 annotated echocardiogram videos.

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2106.09910 [pdf, other]

Message Passing in Graph Convolution Networks via Adaptive Filter Banks

Authors: Xing Gao, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong, Pascal Frossard

Abstract: Graph convolution networks, like message passing graph convolution networks (MPGCNs), have been a powerful tool in representation learning of networked data. However, when data is heterogeneous, most architectures are limited as they employ a single strategy to handle multi-channel graph signals and they typically focus on low-frequency information. In this paper, we present a novel graph convolution operator, termed BankGCN, which keeps the benefits of message passing models, but extends their capabilities beyond 'low-pass' features. It decomposes multi-channel signals on graphs into subspaces and handles particular information in each subspace with an adapted filter. The filters of all subspaces have different frequency responses and together form a filter bank. Furthermore, each filter in the spectral domain corresponds to a message passing scheme, and diverse schemes are implemented via the filter bank. Importantly, the filter bank and the signal decomposition are jointly learned to adapt to the spectral characteristics of data and to target applications. Furthermore, this is implemented almost without extra parameters in comparison with most existing MPGCNs. Experimental results show that the proposed convolution operator achieves excellent performance in graph classification on a collection of benchmark graph datasets.

Submitted 18 June, 2021; originally announced June 2021.

arXiv:2105.06634 [pdf, ps, other]

Fast Ambiguous DOA Elimination Method of DOA Measurement for Hybrid Massive MIMO Receiver

Authors: Nuo Chen, Xinyi Jiang, Baihua Shi, Yin Teng, **hui Lu, Feng Shu, Jun Zou, Jun Li, Jiangzhou Wang

Abstract: DOA estimation for massive multiple-input multiple-output (MIMO) systems can provide ultra-high-resolution angle estimation. However, due to the high computational complexity and cost of all-digital MIMO systems, a hybrid analog digital (HAD) structure MIMO was proposed. In this paper, a fast ambiguous phase elimination method is proposed to solve the problem of direction-finding ambiguity caused by the HAD MIMO. Only two data blocks are used to realize DOA estimation. Simulation results show that the proposed method can greatly reduce the estimation delay with a slight performance loss.

Submitted 14 May, 2021; originally announced May 2021.

arXiv:2011.10325 [pdf]

doi 10.1049/cp.2019.0901

Improvement of accuracy for measurement of 100-km fibre latency with Correlation OTDR

Authors: Florian Azendorf, Annika Dochhan, Jim Zou, Bernhard Schmauss, Michael Eiselt

Abstract: We measured the latency of a 100 km fibre link using a Correlation OTDR. Improvements over previous results were achieved by increasing the probe signal rate to 10 Gbit/s, using dispersion compensation gratings, and coupling the receiver time base to an external PPS signal.

Submitted 20 November, 2020; originally announced November 2020.

Comments: This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 762055 (BlueSpace Project)

Journal ref: European Conference on Optical Communication (ECOC) 2019

arXiv:2011.04988 [pdf, other]

AIM 2020 Challenge on Rendering Realistic Bokeh

Authors: Andrey Ignatov, Radu Timofte, Ming Qian, Congyu Qiao, Jiamin Lin, Zhenyu Guo, Chenghua Li, Cong Leng, Jian Cheng, Juewen Peng, Xianrui Luo, Ke Xian, Zi** Wu, Zhiguo Cao, Densen Puthussery, Jiji C V, Hrishikesh P S, Melvin Kuriakose, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Kuldeep Purohit, Praveen Kandula, Maitreya Suin, A. N. Rajagopalan, et al. (10 additional authors not shown)

Abstract: This paper reviews the second AIM realistic bokeh effect rendering challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world bokeh simulation problem, where the goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR camera. The participants had to render the bokeh effect based on only a single frame without any additional data from other cameras or sensors. The target metric used in this challenge combined the runtime and the perceptual quality of the solutions measured in the user study. To ensure the efficiency of the submitted models, we measured their runtime on standard desktop CPUs and also ran the models on smartphone GPUs. The proposed solutions significantly improved the baseline results, defining the state-of-the-art for the practical bokeh effect rendering problem.

Submitted 10 November, 2020; originally announced November 2020.

Comments: Published in ECCV 2020 Workshop (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

arXiv:2010.08006 [pdf]

doi 10.1038/s41598-021-87762-2

Data Valuation for Medical Imaging Using Shapley Value: Application on A Large-scale Chest X-ray Dataset

Authors: Siyi Tang, Amirata Ghorbani, Rikiya Yamashita, Sameer Rehman, Jared A. Dunnmon, James Zou, Daniel L. Rubin

Abstract: The reliability of machine learning models can be compromised when trained on low quality data. Many large-scale medical imaging datasets contain low quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically identify low quality data are highly desired. In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset. We characterized the effectiveness of data Shapley in identifying low quality versus valuable data for pneumonia detection. We found that removing training data with high Shapley values decreased the pneumonia detection performance, whereas removing data with low Shapley values improved the model performance. Furthermore, there were more mislabeled examples in low Shapley value data and more true pneumonia cases in high Shapley value data. Our results suggest that low Shapley value indicates mislabeled or poor quality images, whereas high Shapley value indicates data that are valuable for pneumonia detection. Our method can serve as a framework for using data Shapley to denoise large-scale medical imaging datasets.

Submitted 15 October, 2020; originally announced October 2020.

arXiv:2010.07897 [pdf]

Spatial Registration Evaluation of [18F]-MK6240 PET

Authors: James Zou, Aubrey Johnson, Jeanelle France, Srinidhi Bharadwaj, Zeljko Tomljanovic, Yaakov Stern, Adam M. Brickman, Devangere P. Devanand, Jose A. Luchsinger, William C. Kreisl, Frank A. Provenzano

Abstract: Image registration is an important preprocessing step in neuroimaging which allows for the matching of anatomical and functional information between modalities and subjects. This can be challenging if there are gross differences in image geometry or in signal intensity, such as in the case of some molecular PET radioligands, where control subjects display a relative lack of signal relative to noise within intracranial regions, and may have off-target binding that may be confused as other regions, and may vary depending on subject. The use of intermediary images or volumes has been shown to aid registration in such cases. To account for this phenomenon within our own longitudinal aging cohort, we generated a population-specific MRI and PET template from a broad distribution of 30 amyloid negative subjects. We then registered the PET image of each of these subjects, as well as a holdout set of thirty 'template-naive' subjects, to their corresponding MRI images using the template image as an intermediate, using three different sets of registration parameters and procedures. To evaluate the performance of both conventional registration and our method, we compared these to the registration of the attenuation CT (acquired at time of PET acquisition) to MRI as the reference. We then used our template to directly derive SUVR values without the use of MRI. We found that conventional registration was comparable to an existing CT based standard, and there was no significant difference in errors collectively amongst all methods tested. In addition, there were no significant differences between existing and MR-less tau PET quantification methods. We conclude that a template-based method is a feasible alternative to, or salvage for, direct registration and MR-less quantification, and may be preferred in cases where there is doubt about the similarity between two image modalities.

Submitted 15 October, 2020; originally announced October 2020.

Comments: 19 pages, 8 Figures, 4 Tables

arXiv:2010.02086 [pdf, other]

TrueImage: A Machine Learning Algorithm to Improve the Quality of Telehealth Photos

Authors: Kailas Vodrahalli, Roxana Daneshjou, Roberto A Novoa, Albert Chiou, Justin M Ko, James Zou

Abstract: Telehealth is an increasingly critical component of the health care ecosystem, especially due to the COVID-19 pandemic. Rapid adoption of telehealth has exposed limitations in the existing infrastructure. In this paper, we study and highlight photo quality as a major challenge in the telehealth workflow. We focus on teledermatology, where photo quality is particularly important; the framework proposed here can be generalized to other health domains. For telemedicine, dermatologists request that patients submit images of their lesions for assessment. However, these images are often of insufficient quality to make a clinical diagnosis since patients do not have experience taking clinical photos. A clinician has to manually triage poor quality images and request new images to be submitted, leading to wasted time for both the clinician and the patient. We propose an automated image assessment machine learning pipeline, TrueImage, to detect poor quality dermatology photos and to guide patients in taking better photos. Our experiments indicate that TrueImage can reject 50% of the sub-par quality images, while retaining 80% of good quality images patients send in, despite heterogeneity and limitations in the training data. These promising results suggest that our solution is feasible and can improve the quality of teledermatology care.

Submitted 1 October, 2020; originally announced October 2020.

Comments: 12 pages, 5 figures, Preprint of an article published in Pacific Symposium on Biocomputing © 2020 World Scientific Publishing Co., Singapore, http://psb.stanford.edu/

arXiv:2007.13854 [pdf, other]

doi 10.1007/978-3-030-27272-2_29

Improving Lesion Segmentation for Diabetic Retinopathy using Adversarial Learning

Authors: Qiqi Xiao, Jiaxu Zou, Muqiao Yang, Alex Gaudio, Kris Kitani, Asim Smailagic, Pedro Costa, Min Xu

Abstract: Diabetic Retinopathy (DR) is a leading cause of blindness in working age adults. DR lesions can be challenging to identify in fundus images, and automatic DR detection systems can offer strong clinical value. Of the publicly available labeled datasets for DR, the Indian Diabetic Retinopathy Image Dataset (IDRiD) presents retinal fundus images with pixel-level annotations of four distinct lesions: microaneurysms, hemorrhages, soft exudates and hard exudates. We utilize the HEDNet edge detector to solve a semantic segmentation task on this dataset, and then propose an end-to-end system for pixel-level segmentation of DR lesions by incorporating HEDNet into a Conditional Generative Adversarial Network (cGAN). We design a loss function that adds adversarial loss to segmentation loss. Our experiments show that the addition of the adversarial loss improves the lesion segmentation performance over the baseline.

Submitted 27 July, 2020; originally announced July 2020.

Comments: Accepted to International Conference on Image Analysis and Recognition, ICIAR 2019. Published at https://doi.org/10.1007/978-3-030-27272-2_29 Code: https://github.com/zoujx96/DR-segmentation

arXiv:2006.04422 [pdf]

doi 10.1364/SPPCOM.2018.SpTh4F.3

Real-time 112 Gbit/s DMT for Data Center Interconnects

Authors: Annika Dochhan, Nicklas Eiselt, Jim Zou, Helmut Griesser, Michael H. Eiselt, Jörg-Peter Elbers

Abstract: We report on 112 Gbit/s real-time DMT transmission over up to 60 km, targeted at DCI applications. Chromatic dispersion mitigation by vestigial sideband filtering is compared to the use of dispersion compensating fiber.

Submitted 8 June, 2020; originally announced June 2020.

Comments: The results were obtained in the SENDATE Secure-DCI project, partly funded by the German ministry of education and research (BMBF) under contract 16KIS0477K, and in the iCirrus project, funded by the European Commission under grant agreement No. 644526

Journal ref: Advanced Photonics 2018 (BGPP, IPR, NP, NOMA, Sensors, Networks, SPPCom, SOF)

arXiv:2003.07536 [pdf, ps, other]

Sequential and Incremental Precoder Design for Joint Transmission Network MIMO Systems with Imperfect Backhaul

Authors: Ming Ding, Jun Zou, Zeng Yang, Hanwen Luo, Wen Chen

Abstract: In this paper, we propose a sequential and incremental precoder design for downlink joint transmission (JT) network MIMO systems with imperfect backhaul links. The objective of our design is to minimize the maximum of the sub-stream mean square errors (MSE), which dominates the average bit error rate (BER) performance of the system. In the proposed scheme, we first optimize the precoder at the serving base station (BS), and then sequentially optimize the precoders of non-serving BSs in the JT set according to the descending order of their probabilities of participating in JT. The BS-wise sequential optimization process can improve the system performance when some BSs have to temporarily quit the JT operations because of poor instant backhaul conditions. Besides, the precoder of an additional BS is derived in an incremental way, i.e., the sequentially optimized precoders of previous BSs are fixed, thus the additional precoder plays an incremental part in the multi-BS JT operations. An iterative algorithm is designed to jointly optimize the sub-stream precoder and sub-stream power allocation for each additional BS in the proposed sequential and incremental optimization scheme. Simulations show that, under the practical backhaul link conditions, our scheme significantly outperforms the autonomous global precoding (AGP) scheme in terms of BER performance.

Submitted 17 March, 2020; originally announced March 2020.

Comments: TVT

arXiv:1912.11604 [pdf, other]

doi 10.1109/TMM.2019.2962310

Partition-Aware Adaptive Switching Neural Networks for Post-Processing in HEVC

Authors: Weiyao Lin, Xiaoyi He, Xintong Han, Dong Liu, John See, Junni Zou, Hongkai Xiong, Feng Wu

Abstract: This paper addresses neural-network-based post-processing for the state-of-the-art video coding standard, High Efficiency Video Coding (HEVC). We first propose a partition-aware Convolutional Neural Network (CNN) that utilizes the partition information produced by the encoder to assist in the post-processing. In contrast to existing CNN-based approaches, which only take the decoded frame as input, the proposed approach considers the coding unit (CU) size information and combines it with the distorted decoded frame such that the artifacts introduced by HEVC are efficiently reduced. We further introduce an adaptive-switching neural network (ASN) that consists of multiple independent CNNs to adaptively handle the variations in content and distortion within compressed-video frames, providing further reduction in visual artifacts. Additionally, an iterative training procedure is proposed to train these independent CNNs attentively on different local patch-wise classes. Experiments on benchmark sequences demonstrate the effectiveness of our partition-aware and adaptive-switching neural networks. The source code can be found at http://min.sjtu.edu.cn/lwydemo/HEVCpostprocessing.html.

Submitted 25 December, 2019; originally announced December 2019.

Comments: to appear in IEEE Transactions on Multimedia

arXiv:1910.12535 [pdf, ps, other]

Low-Complexity Leakage-based Secure Precise Wireless Transmission with Hybrid Beamforming

Authors: Tong Shen, Yan Lin, Jun Zou, Yongpeng Wu, Feng Shu, Jiangzhou Wang

Abstract: In conventional secure precise wireless transmission (SPWT), fully digital beamforming (FDB) achieves high secrecy performance in transmit antenna systems, but results in a huge RF-chain circuit budget for medium-scale and large-scale systems. To reduce the complexity, this letter considers a hybrid digital and analog (HDA) structure with random frequencies mapped into the RF chains to achieve SPWT. Then, a hybrid SPWT scheme based on maximizing the signal-to-leakage-and-noise ratio (SLNR) and the artificial-noise-to-leakage-and-noise ratio (ANLNR) (M-SLNR-ANLNR) is proposed. Compared to the FDB scheme, the proposed scheme reduces the circuit budget with low computational complexity and comparable secrecy performance.

Submitted 28 October, 2019; originally announced October 2019.

Showing 1–50 of 55 results for author: Zou, J