Search | arXiv e-print repository

Optimised ProPainter for Video Diminished Reality Inpainting

Authors: Pengze Li, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: In this paper, part of the DREAMING Challenge - Diminished Reality for Emerging Applications in Medicine through Inpainting, we introduce a refined video inpainting technique optimised from the ProPainter method to meet the specialised demands of medical imaging, specifically in the context of oral and maxillofacial surgery. Our enhanced algorithm employs the zero-shot ProPainter, featuring optimi… ▽ More In this paper, part of the DREAMING Challenge - Diminished Reality for Emerging Applications in Medicine through Inpainting, we introduce a refined video inpainting technique optimised from the ProPainter method to meet the specialised demands of medical imaging, specifically in the context of oral and maxillofacial surgery. Our enhanced algorithm employs the zero-shot ProPainter, featuring optimized parameters and pre-processing, to adeptly manage the complex task of inpainting surgical video sequences, without requiring any training process. It aims to produce temporally coherent and detail-rich reconstructions of occluded regions, facilitating clearer views of operative fields. The efficacy of our approach is evaluated using comprehensive metrics, positioning it as a significant advancement in the application of diminished reality for medical purposes. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted to ISBI 2024

arXiv:2405.17659 [pdf, other]

Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba

Authors: Jiahao Huang, Liutao Yang, Fanwen Wang, Yang Nan, Weiwen Wu, Chengyan Wang, Kuangyu Shi, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang

Abstract: Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has sh… ▽ More Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has shown superiority in learning visual representation, which combines the advantages of linear scalability and global sensitivity. In this study, we introduce MambaMIR, an Arbitrary-Masked Mamba-based model with wavelet decomposition for joint medical image reconstruction and uncertainty estimation. A novel Arbitrary Scan Masking (ASM) mechanism "masks out" redundant information to introduce randomness for further uncertainty estimation. Compared to the commonly used Monte Carlo (MC) dropout, our proposed MC-ASM provides an uncertainty map without the need for hyperparameter tuning and mitigates the performance drop typically observed when applying dropout to low-level tasks. For further texture preservation and better perceptual quality, we employ the wavelet transformation into MambaMIR and explore its variant based on the Generative Adversarial Network, namely MambaMIR-GAN. Comprehensive experiments have been conducted for multiple representative medical image reconstruction tasks, demonstrating that the proposed MambaMIR and MambaMIR-GAN outperform other baseline and state-of-the-art methods in different reconstruction tasks, where MambaMIR achieves the best reconstruction fidelity and MambaMIR-GAN has the best perceptual quality. In addition, our MC-ASM provides uncertainty maps as an additional tool for clinicians, while mitigating the typical performance drop caused by the commonly used dropout. △ Less

Submitted 25 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14338 [pdf, other]

MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models

Authors: Jiuming Liu, **ru Han, Lihao Liu, Angelica I. Aviles-Rivero, Chaokang Jiang, Zhe Liu, Hesheng Wang

Abstract: Point cloud videos effectively capture real-world spatial geometries and temporal dynamics, which are essential for enabling intelligent agents to understand the dynamically changing 3D world we live in. Although static 3D point cloud processing has witnessed significant advancements, designing an effective 4D point cloud video backbone remains challenging, mainly due to the irregular and unordere… ▽ More Point cloud videos effectively capture real-world spatial geometries and temporal dynamics, which are essential for enabling intelligent agents to understand the dynamically changing 3D world we live in. Although static 3D point cloud processing has witnessed significant advancements, designing an effective 4D point cloud video backbone remains challenging, mainly due to the irregular and unordered distribution of points and temporal inconsistencies across frames. Moreover, recent state-of-the-art 4D backbones predominantly rely on transformer-based architectures, which commonly suffer from large computational costs due to their quadratic complexity, particularly when processing long video sequences. To address these challenges, we propose a novel 4D point cloud video understanding backbone based on the recently advanced State Space Models (SSMs). Specifically, our backbone begins by disentangling space and time in raw 4D sequences, and then establishing spatio-temporal correlations using our newly developed Intra-frame Spatial Mamba and Inter-frame Temporal Mamba blocks. The Intra-frame Spatial Mamba module is designed to encode locally similar or related geometric structures within a certain temporal searching stride, which can effectively capture short-term dynamics. Subsequently, these locally correlated tokens are delivered to the Inter-frame Temporal Mamba module, which globally integrates point features across the entire video with linear complexity, further establishing long-range motion dependencies. Experimental results on human action recognition and 4D semantic segmentation tasks demonstrate the superiority of our proposed method. Especially, for long video sequences, our proposed Mamba-based method has an 87.5% GPU memory reduction, 5.36 times speed-up, and much higher accuracy (up to +10.4%) compared with transformer-based counterparts on MSR-Action3D dataset. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.12719 [pdf, other]

Bilevel Hypergraph Networks for Multi-Modal Alzheimer's Diagnosis

Authors: Angelica I. Aviles-Rivero, Chun-Wun Cheng, Zhongying Deng, Zoe Kourtzi, Carola-Bibiane Schönlieb

Abstract: Early detection of Alzheimer's disease's precursor stages is imperative for significantly enhancing patient outcomes and quality of life. This challenge is tackled through a semi-supervised multi-modal diagnosis framework. In particular, we introduce a new hypergraph framework that enables higher-order relations between multi-modal data, while utilising minimal labels. We first introduce a bilevel… ▽ More Early detection of Alzheimer's disease's precursor stages is imperative for significantly enhancing patient outcomes and quality of life. This challenge is tackled through a semi-supervised multi-modal diagnosis framework. In particular, we introduce a new hypergraph framework that enables higher-order relations between multi-modal data, while utilising minimal labels. We first introduce a bilevel hypergraph optimisation framework that jointly learns a graph augmentation policy and a semi-supervised classifier. This dual learning strategy is hypothesised to enhance the robustness and generalisation capabilities of the model by fostering new pathways for information propagation. Secondly, we introduce a novel strategy for generating pseudo-labels more effectively via a gradient-driven flow. Our experimental results demonstrate the superior performance of our framework over current techniques in diagnosing Alzheimer's disease. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.09136 [pdf, other]

Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation

Authors: Lipei Zhang, Yanqi Cheng, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Recent advancements in deep learning have significantly improved brain tumour segmentation techniques; however, the results still lack confidence and robustness as they solely consider image data without biophysical priors or pathological information. Integrating biophysics-informed regularisation is one effective way to change this situation, as it provides an prior regularisation for automated e… ▽ More Recent advancements in deep learning have significantly improved brain tumour segmentation techniques; however, the results still lack confidence and robustness as they solely consider image data without biophysical priors or pathological information. Integrating biophysics-informed regularisation is one effective way to change this situation, as it provides an prior regularisation for automated end-to-end learning. In this paper, we propose a novel approach that designs brain tumour growth Partial Differential Equation (PDE) models as a regularisation with deep learning, operational with any network model. Our method introduces tumour growth PDE models directly into the segmentation process, improving accuracy and robustness, especially in data-scarce scenarios. This system estimates tumour cell density using a periodic activation function. By effectively integrating this estimation with biophysical models, we achieve a better capture of tumour characteristics. This approach not only aligns the segmentation closer to actual biological behaviour but also strengthens the model's performance under limited data conditions. We demonstrate the effectiveness of our framework through extensive experiments on the BraTS 2023 dataset, showcasing significant improvements in both precision and reliability of tumour segmentation. △ Less

Submitted 17 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 11 pages, 4 figures and 1 table

arXiv:2403.07684 [pdf, other]

Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal

Authors: Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero, Yulun Zhang, **g Qin, Lei Zhu

Abstract: Real-world vision tasks frequently suffer from the appearance of unexpected adverse weather conditions, including rain, haze, snow, and raindrops. In the last decade, convolutional neural networks and vision transformers have yielded outstanding results in single-weather video removal. However, due to the absence of appropriate adaptation, most of them fail to generalize to other weather condition… ▽ More Real-world vision tasks frequently suffer from the appearance of unexpected adverse weather conditions, including rain, haze, snow, and raindrops. In the last decade, convolutional neural networks and vision transformers have yielded outstanding results in single-weather video removal. However, due to the absence of appropriate adaptation, most of them fail to generalize to other weather conditions. Although ViWS-Net is proposed to remove adverse weather conditions in videos with a single set of pre-trained weights, it is seriously blinded by seen weather at train-time and degenerates when coming to unseen weather during test-time. In this work, we introduce test-time adaptation into adverse weather removal in videos, and propose the first framework that integrates test-time adaptation into the iterative diffusion reverse process. Specifically, we devise a diffusion-based network with a novel temporal noise model to efficiently explore frame-correlated information in degraded video clips at training stage. During inference stage, we introduce a proxy task named Diffusion Tubelet Self-Calibration to learn the primer distribution of test video stream and optimize the model by approximating the temporal noise model for online adaptation. Experimental results, on benchmark datasets, demonstrate that our Test-Time Adaptation method with Diffusion-based network(Diff-TTA) outperforms state-of-the-art methods in terms of restoring videos degraded by seen weather conditions. Its generalizable capability is also validated with unseen weather conditions in both synthesized and real-world videos. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.18451 [pdf, other]

MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation

Authors: Jiahao Huang, Liutao Yang, Fanwen Wang, Yang Nan, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang

Abstract: The recent Mamba model has shown remarkable adaptability for visual representation learning, including in medical imaging tasks. This study introduces MambaMIR, a Mamba-based model for medical image reconstruction, as well as its Generative Adversarial Network-based variant, MambaMIR-GAN. Our proposed MambaMIR inherits several advantages, such as linear complexity, global receptive fields, and dyn… ▽ More The recent Mamba model has shown remarkable adaptability for visual representation learning, including in medical imaging tasks. This study introduces MambaMIR, a Mamba-based model for medical image reconstruction, as well as its Generative Adversarial Network-based variant, MambaMIR-GAN. Our proposed MambaMIR inherits several advantages, such as linear complexity, global receptive fields, and dynamic weights, from the original Mamba model. The innovated arbitrary-mask mechanism effectively adapt Mamba to our image reconstruction task, providing randomness for subsequent Monte Carlo-based uncertainty estimation. Experiments conducted on various medical image reconstruction tasks, including fast MRI and SVCT, which cover anatomical regions such as the knee, chest, and abdomen, have demonstrated that MambaMIR and MambaMIR-GAN achieve comparable or superior reconstruction results relative to state-of-the-art methods. Additionally, the estimated uncertainty maps offer further insights into the reliability of the reconstruction quality. The code is publicly available at https://github.com/ayanglab/MambaMIR. △ Less

Submitted 25 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.12694 [pdf, other]

Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling

Authors: Guoqi Yu, **g Zou, Xiaowei Hu, Angelica I. Aviles-Rivero, **g Qin, Shujun Wang

Abstract: Predicting multivariate time series is crucial, demanding precise modeling of intricate patterns, including inter-series dependencies and intra-series variations. Distinctive trend characteristics in each time series pose challenges, and existing methods, relying on basic moving average kernels, may struggle with the non-linear structure and complex trends in real-world data. Given that, we introd… ▽ More Predicting multivariate time series is crucial, demanding precise modeling of intricate patterns, including inter-series dependencies and intra-series variations. Distinctive trend characteristics in each time series pose challenges, and existing methods, relying on basic moving average kernels, may struggle with the non-linear structure and complex trends in real-world data. Given that, we introduce a learnable decomposition strategy to capture dynamic trend information more reasonably. Additionally, we propose a dual attention module tailored to capture inter-series dependencies and intra-series variations simultaneously for better time series forecasting, which is implemented by channel-wise self-attention and autoregressive self-attention. To evaluate the effectiveness of our method, we conducted experiments across eight open-source datasets and compared it with the state-of-the-art methods. Through the comparison results, our Leddam (LEarnable Decomposition and Dual Attention Module) not only demonstrates significant advancements in predictive performance, but also the proposed decomposition strategy can be plugged into other methods with a large performance-boosting, from 11.87% to 48.56% MSE error degradation. △ Less

Submitted 1 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.03541 [pdf, other]

HAMLET: Graph Transformer Neural Operator for Partial Differential Equations

Authors: Andrey Bryutkin, Jiahao Huang, Zhongying Deng, Guang Yang, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero

Abstract: We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to… ▽ More We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to PDEs of arbitrary geometries and varied input formats. Notably, HAMLET scales effectively with increasing data complexity and noise, showcasing its robustness. HAMLET is not just tailored to a single type of physical simulation, but can be applied across various domains. Moreover, it boosts model resilience and performance, especially in scenarios with limited data. We demonstrate, through extensive experiments, that our framework is capable of outperforming current techniques for PDEs. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: 17 pages, 7 figures, 6 tables

arXiv:2311.18839 [pdf, other]

TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios

Authors: Lihao Liu, Yanqi Cheng, Zhongying Deng, Shujun Wang, Dongdong Chen, Xiaowei Hu, Pietro Liò, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero

Abstract: Multi-object tracking in traffic videos is a crucial research area, offering immense potential for enhancing traffic monitoring accuracy and promoting road safety measures through the utilisation of advanced machine learning algorithms. However, existing datasets for multi-object tracking in traffic videos often feature limited instances or focus on single classes, which cannot well simulate the c… ▽ More Multi-object tracking in traffic videos is a crucial research area, offering immense potential for enhancing traffic monitoring accuracy and promoting road safety measures through the utilisation of advanced machine learning algorithms. However, existing datasets for multi-object tracking in traffic videos often feature limited instances or focus on single classes, which cannot well simulate the challenges encountered in complex traffic scenarios. To address this gap, we introduce TrafficMOT, an extensive dataset designed to encompass diverse traffic situations with complex scenarios. To validate the complexity and challenges presented by TrafficMOT, we conducted comprehensive empirical studies using three different settings: fully-supervised, semi-supervised, and a recent powerful zero-shot foundation model Tracking Anything Model (TAM). The experimental results highlight the inherent complexity of this dataset, emphasising its value in driving advancements in the field of traffic monitoring and multi-object tracking. △ Less

Submitted 30 November, 2023; originally announced November 2023.

Comments: 17 pages, 7 figures

arXiv:2311.13682 [pdf, other]

Single-Shot Plug-and-Play Methods for Inverse Problems

Authors: Yanqi Cheng, Lipei Zhang, Zhenda Shen, Shujun Wang, Lequan Yu, Raymond H. Chan, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on p… ▽ More The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on pre-trained denoisers using large datasets. In this work, we introduce Single-Shot PnP methods (SS-PnP), shifting the focus to solving inverse problems with minimal data. First, we integrate Single-Shot proximal denoisers into iterative methods, enabling training with single instances. Second, we propose implicit neural priors based on a novel function that preserves relevant frequencies to capture fine details while avoiding the issue of vanishing gradients. We demonstrate, through extensive numerical and visual experiments, that our method leads to better approximations. △ Less

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.13610 [pdf, other]

TRIDENT: The Nonlinear Trilogy for Implicit Neural Representations

Authors: Zhenda Shen, Yanqi Cheng, Raymond H. Chan, Pietro Liò, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Implicit neural representations (INRs) have garnered significant interest recently for their ability to model complex, high-dimensional data without explicit parameterisation. In this work, we introduce TRIDENT, a novel function for implicit neural representations characterised by a trilogy of nonlinearities. Firstly, it is designed to represent high-order features through order compactness. Secon… ▽ More Implicit neural representations (INRs) have garnered significant interest recently for their ability to model complex, high-dimensional data without explicit parameterisation. In this work, we introduce TRIDENT, a novel function for implicit neural representations characterised by a trilogy of nonlinearities. Firstly, it is designed to represent high-order features through order compactness. Secondly, TRIDENT efficiently captures frequency information, a feature called frequency compactness. Thirdly, it has the capability to represent signals or images such that most of its energy is concentrated in a limited spatial region, denoting spatial compactness. We demonstrated through extensive experiments on various inverse problems that our proposed function outperforms existing implicit neural representation functions. △ Less

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.10092 [pdf, other]

Traffic Video Object Detection using Motion Prior

Authors: Lihao Liu, Yanqi Cheng, Dongdong Chen, **g He, Pietro Liò, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Traffic videos inherently differ from generic videos in their stationary camera setup, thus providing a strong motion prior where objects often move in a specific direction over a short time interval. Existing works predominantly employ generic video object detection framework for traffic video object detection, which yield certain advantages such as broad applicability and robustness to diverse s… ▽ More Traffic videos inherently differ from generic videos in their stationary camera setup, thus providing a strong motion prior where objects often move in a specific direction over a short time interval. Existing works predominantly employ generic video object detection framework for traffic video object detection, which yield certain advantages such as broad applicability and robustness to diverse scenarios. However, they fail to harness the strength of motion prior to enhance detection accuracy. In this work, we propose two innovative methods to exploit the motion prior and boost the performance of both fully-supervised and semi-supervised traffic video object detection. Firstly, we introduce a new self-attention module that leverages the motion prior to guide temporal information integration in the fully-supervised setting. Secondly, we utilise the motion prior to develop a pseudo-labelling mechanism to eliminate noisy pseudo labels for the semi-supervised setting. Both of our motion-prior-centred methods consistently demonstrates superior performance, outperforming existing state-of-the-art approaches by a margin of 2% in terms of mAP. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: 11 pages, 4 figures

arXiv:2310.20092 [pdf, other]

The Missing U for Efficient Diffusion Models

Authors: Sergio Calvo-Ordonez, Chun-Wun Cheng, Jiahao Huang, Lipei Zhang, Guang Yang, Carola-Bibiane Schonlieb, Angelica I Aviles-Rivero

Abstract: Diffusion Probabilistic Models stand as a critical tool in generative modelling, enabling the generation of complex data distributions. This family of generative models yields record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergenc… ▽ More Diffusion Probabilistic Models stand as a critical tool in generative modelling, enabling the generation of complex data distributions. This family of generative models yields record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs. In this paper, we introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models that is more parameter-efficient, exhibits faster convergence, and demonstrates increased noise robustness. Experimenting with Denoising Diffusion Probabilistic Models (DDPMs), our framework operates with approximately a quarter of the parameters, and $\sim$ 30\% of the Floating Point Operations (FLOPs) compared to standard U-Nets in DDPMs. Furthermore, our model is notably faster in inference than the baseline when measured in fair and equal conditions. We also provide a mathematical intuition as to why our proposed reverse process is faster as well as a mathematical discussion of the empirical tradeoffs in the denoising downstream task. Finally, we argue that our method is compatible with existing performance enhancement techniques, enabling further improvements in efficiency, quality, and speed. △ Less

Submitted 5 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: 23 pages, 14 figures, Accepted at Transactions of Machine Learning Research (04/2024)

arXiv:2309.13700 [pdf, other]

Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation

Authors: Yijun Yang, Angelica I. Aviles-Rivero, Huazhu Fu, Ye Liu, Weiming Wang, Lei Zhu

Abstract: Although convolutional neural networks (CNNs) have been proposed to remove adverse weather conditions in single images using a single set of pre-trained weights, they fail to restore weather videos due to the absence of temporal information. Furthermore, existing methods for removing adverse weather conditions (e.g., rain, fog, and snow) from videos can only handle one type of adverse weather. In… ▽ More Although convolutional neural networks (CNNs) have been proposed to remove adverse weather conditions in single images using a single set of pre-trained weights, they fail to restore weather videos due to the absence of temporal information. Furthermore, existing methods for removing adverse weather conditions (e.g., rain, fog, and snow) from videos can only handle one type of adverse weather. In this work, we propose the first framework for restoring videos from all adverse weather conditions by develo** a video adverse-weather-component suppression network (ViWS-Net). To achieve this, we first devise a weather-agnostic video transformer encoder with multiple transformer stages. Moreover, we design a long short-term temporal modeling mechanism for weather messenger to early fuse input adjacent video frames and learn weather-specific information. We further introduce a weather discriminator with gradient reversion, to maintain the weather-invariant common information and suppress the weather-specific information in pixel features, by adversarially predicting weather types. Finally, we develop a messenger-driven video transformer decoder to retrieve the residual weather-specific feature, which is spatiotemporally aggregated with hierarchical pixel features and refined to predict the clean target frame of input videos. Experimental results, on benchmark datasets and real-world weather videos, demonstrate that our ViWS-Net outperforms current state-of-the-art methods in terms of restoring videos degraded by any weather condition. △ Less

Submitted 24 September, 2023; originally announced September 2023.

arXiv:2308.01057 [pdf, other]

MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain Multi-Center Breast Cancer Screening

Authors: Yijun Yang, Shujun Wang, Lihao Liu, Sarah Hickman, Fiona J Gilbert, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

Abstract: Breast cancer is a major cause of cancer death among women, emphasising the importance of early detection for improved treatment outcomes and quality of life. Mammography, the primary diagnostic imaging test, poses challenges due to the high variability and patterns in mammograms. Double reading of mammograms is recommended in many screening programs to improve diagnostic accuracy but increases ra… ▽ More Breast cancer is a major cause of cancer death among women, emphasising the importance of early detection for improved treatment outcomes and quality of life. Mammography, the primary diagnostic imaging test, poses challenges due to the high variability and patterns in mammograms. Double reading of mammograms is recommended in many screening programs to improve diagnostic accuracy but increases radiologists' workload. Researchers explore Machine Learning models to support expert decision-making. Stand-alone models have shown comparable or superior performance to radiologists, but some studies note decreased sensitivity with multiple datasets, indicating the need for high generalisation and robustness models. This work devises MammoDG, a novel deep-learning framework for generalisable and reliable analysis of cross-domain multi-center mammography data. MammoDG leverages multi-view mammograms and a novel contrastive mechanism to enhance generalisation capabilities. Extensive validation demonstrates MammoDG's superiority, highlighting the critical importance of domain generalisation for trustworthy mammography analysis in imaging protocol variations. △ Less

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2306.14350 [pdf, other]

CDiffMR: Can We Replace the Gaussian Noise with K-Space Undersampling for Fast MRI?

Authors: Jiahao Huang, Angelica Aviles-Rivero, Carola-Bibiane Schönlieb, Guang Yang

Abstract: Deep learning has shown the capability to substantially accelerate MRI reconstruction while acquiring fewer measurements. Recently, diffusion models have gained burgeoning interests as a novel group of deep learning-based generative methods. These methods seek to sample data points that belong to a target distribution from a Gaussian distribution, which has been successfully extended to MRI recons… ▽ More Deep learning has shown the capability to substantially accelerate MRI reconstruction while acquiring fewer measurements. Recently, diffusion models have gained burgeoning interests as a novel group of deep learning-based generative methods. These methods seek to sample data points that belong to a target distribution from a Gaussian distribution, which has been successfully extended to MRI reconstruction. In this work, we proposed a Cold Diffusion-based MRI reconstruction method called CDiffMR. Different from conventional diffusion models, the degradation operation of our CDiffMR is based on \textit{k}-space undersampling instead of adding Gaussian noise, and the restoration network is trained to harness a de-aliaseing function. We also design starting point and data consistency conditioning strategies to guide and accelerate the reverse process. More intriguingly, the pre-trained CDiffMR model can be reused for reconstruction tasks with different undersampling rates. We demonstrated, through extensive numerical and visual experiments, that the proposed CDiffMR can achieve comparable or even superior reconstruction results than state-of-the-art models. Compared to the diffusion model-based counterpart, CDiffMR reaches readily competing results using only $1.6 \sim 3.4\%$ for inference time. The code is publicly available at https://github.com/ayanglab/CDiffMR. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: 10 pages, 4 figures, accepted by MICCAI 2023

arXiv:2304.00996 [pdf, other]

Deep Learning-based Diffusion Tensor Cardiac Magnetic Resonance Reconstruction: A Comparison Study

Authors: Jiahao Huang, Pedro F. Ferreira, Lichao Wang, Yinzhe Wu, Angelica I. Aviles-Rivero, Carola-Bibiane Schonlieb, Andrew D. Scott, Zohya Khalique, Maria Dwornik, Ramyah Rajakulasingam, Ranil De Silva, Dudley J. Pennell, Sonia Nielles-Vallespin, Guang Yang

Abstract: In vivo cardiac diffusion tensor imaging (cDTI) is a promising Magnetic Resonance Imaging (MRI) technique for evaluating the micro-structure of myocardial tissue in the living heart, providing insights into cardiac function and enabling the development of innovative therapeutic strategies. However, the integration of cDTI into routine clinical practice is challenging due to the technical obstacles… ▽ More In vivo cardiac diffusion tensor imaging (cDTI) is a promising Magnetic Resonance Imaging (MRI) technique for evaluating the micro-structure of myocardial tissue in the living heart, providing insights into cardiac function and enabling the development of innovative therapeutic strategies. However, the integration of cDTI into routine clinical practice is challenging due to the technical obstacles involved in the acquisition, such as low signal-to-noise ratio and long scanning times. In this paper, we investigate and implement three different types of deep learning-based MRI reconstruction models for cDTI reconstruction. We evaluate the performance of these models based on reconstruction quality assessment and diffusion tensor parameter assessment. Our results indicate that the models we discussed in this study can be applied for clinical use at an acceleration factor (AF) of $\times 2$ and $\times 4$, with the D5C5 model showing superior fidelity for reconstruction and the SwinMR model providing higher perceptual scores. There is no statistical difference with the reference for all diffusion tensor parameters at AF $\times 2$ or most DT parameters at AF $\times 4$, and the quality of most diffusion tensor parameter maps are visually acceptable. SwinMR is recommended as the optimal approach for reconstruction at AF $\times 2$ and AF $\times 4$. However, we believed the models discussed in this studies are not prepared for clinical use at a higher AF. At AF $\times 8$, the performance of all models discussed remains limited, with only half of the diffusion tensor parameters being recovered to a level with no statistical difference from the reference. Some diffusion tensor parameter maps even provide wrong and misleading information. △ Less

Submitted 4 April, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

Comments: 15 pages, 8 figures

arXiv:2303.10610 [pdf, other]

DiffMIC: Dual-Guidance Diffusion Network for Medical Image Classification

Authors: Yijun Yang, Huazhu Fu, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Lei Zhu

Abstract: Diffusion Probabilistic Models have recently shown remarkable performance in generative image modeling, attracting significant attention in the computer vision community. However, while a substantial amount of diffusion-based research has focused on generative tasks, few studies have applied diffusion models to general medical image classification. In this paper, we propose the first diffusion-bas… ▽ More Diffusion Probabilistic Models have recently shown remarkable performance in generative image modeling, attracting significant attention in the computer vision community. However, while a substantial amount of diffusion-based research has focused on generative tasks, few studies have applied diffusion models to general medical image classification. In this paper, we propose the first diffusion-based model (named DiffMIC) to address general medical image classification by eliminating unexpected noise and perturbations in medical images and robustly capturing semantic representation. To achieve this goal, we devise a dual conditional guidance strategy that conditions each diffusion step with multiple granularities to improve step-wise regional attention. Furthermore, we propose learning the mutual information in each granularity by enforcing Maximum-Mean Discrepancy regularization during the diffusion forward process. We evaluate the effectiveness of our DiffMIC on three medical classification tasks with different image modalities, including placental maturity grading on ultrasound images, skin lesion classification using dermatoscopic images, and diabetic retinopathy grading using fundus images. Our experimental results demonstrate that DiffMIC outperforms state-of-the-art methods by a significant margin, indicating the universality and effectiveness of the proposed model. Our code will be publicly available at https://github.com/scott-yjyang/DiffMIC. △ Less

Submitted 11 July, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

arXiv:2303.10390 [pdf, other]

HGIB: Prognosis for Alzheimer's Disease via Hypergraph Information Bottleneck

Authors: Shujun Wang, Angelica I Aviles-Rivero, Zoe Kourtzi, Carola-Bibiane Schönlieb

Abstract: Alzheimer's disease prognosis is critical for early Mild Cognitive Impairment patients for timely treatment to improve the patient's quality of life. Whilst existing prognosis techniques demonstrate potential results, they are highly limited in terms of using a single modality. Most importantly, they fail in considering a key element for prognosis: not all features extracted at the current moment… ▽ More Alzheimer's disease prognosis is critical for early Mild Cognitive Impairment patients for timely treatment to improve the patient's quality of life. Whilst existing prognosis techniques demonstrate potential results, they are highly limited in terms of using a single modality. Most importantly, they fail in considering a key element for prognosis: not all features extracted at the current moment may contribute to the prognosis prediction several years later. To address the current drawbacks of the literature, we propose a novel hypergraph framework based on an information bottleneck strategy (HGIB). Firstly, our framework seeks to discriminate irrelevant information, and therefore, solely focus on harmonising relevant information for future MCI conversion prediction e.g., two years later). Secondly, our model simultaneously accounts for multi-modal data based on imaging and non-imaging modalities. HGIB uses a hypergraph structure to represent the multi-modality data and accounts for various data modality types. Thirdly, the key of our model is based on a new optimisation scheme. It is based on modelling the principle of information bottleneck into loss functions that can be integrated into our hypergraph neural network. We demonstrate, through extensive experiments on ADNI, that our proposed HGIB framework outperforms existing state-of-the-art hypergraph neural networks for Alzheimer's disease prognosis. We showcase our model even under fewer labels. Finally, we further support the robustness and generalisation capabilities of our framework under both topological and feature perturbations. △ Less

Submitted 18 March, 2023; originally announced March 2023.

arXiv:2303.08113 [pdf, other]

Learning Homeomorphic Image Registration via Conformal-Invariant Hyperelastic Regularisation

Authors: **g Zou, Noémie Debroux, Lihao Liu, **g Qin, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Deformable image registration is a fundamental task in medical image analysis and plays a crucial role in a wide range of clinical applications. Recently, deep learning-based approaches have been widely studied for deformable medical image registration and achieved promising results. However, existing deep learning image registration techniques do not theoretically guarantee topology-preserving tr… ▽ More Deformable image registration is a fundamental task in medical image analysis and plays a crucial role in a wide range of clinical applications. Recently, deep learning-based approaches have been widely studied for deformable medical image registration and achieved promising results. However, existing deep learning image registration techniques do not theoretically guarantee topology-preserving transformations. This is a key property to preserve anatomical structures and achieve plausible transformations that can be used in real clinical settings. We propose a novel framework for deformable image registration. Firstly, we introduce a novel regulariser based on conformal-invariant properties in a nonlinear elasticity setting. Our regulariser enforces the deformation field to be smooth, invertible and orientation-preserving. More importantly, we strictly guarantee topology preservation yielding to a clinical meaningful registration. Secondly, we boost the performance of our regulariser through coordinate MLPs, where one can view the to-be-registered images as continuously differentiable entities. We demonstrate, through numerical and visual experiments, that our framework is able to outperform current techniques for image registration. △ Less

Submitted 30 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: 13 pages, 3 figures

arXiv:2303.06274 [pdf]

CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting

Authors: Simon Graham, Quoc Dang Vu, Mostafa Jahanifar, Martin Weigert, Uwe Schmidt, Wenhua Zhang, Jun Zhang, Sen Yang, **xi Xiang, Xiyue Wang, Josef Lorenz Rumberger, Elias Baumann, Peter Hirsch, Lihao Liu, Chenyang Hong, Angelica I. Aviles-Rivero, Ayushi Jain, Heeyoung Ahn, Yiyu Hong, Hussam Azzuni, Min Xu, Mohammad Yaqub, Marie-Claire Blache, Benoît Piégu, Bertrand Vernay , et al. (64 additional authors not shown)

Abstract: Nuclear detection, segmentation and morphometric profiling are essential in hel** us further understand the relationship between histology and patient outcome. To drive innovation in this area, we setup a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of repro… ▽ More Nuclear detection, segmentation and morphometric profiling are essential in hel** us further understand the relationship between histology and patient outcome. To drive innovation in this area, we setup a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state-of-the-art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microevironment. We release challenge models and WSI-level results to foster the development of further methods for biomarker discovery. △ Less

Submitted 14 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.10273 [pdf, other]

ViGU: Vision GNN U-Net for Fast MRI

Authors: Jiahao Huang, Angelica Aviles-Rivero, Carola-Bibiane Schonlieb, Guang Yang

Abstract: Deep learning models have been widely applied for fast MRI. The majority of existing deep learning models, e.g., convolutional neural networks, work on data with Euclidean or regular grids structures. However, high-dimensional features extracted from MR data could be encapsulated in non-Euclidean manifolds. This disparity between the go-to assumption of existing models and data requirements limits… ▽ More Deep learning models have been widely applied for fast MRI. The majority of existing deep learning models, e.g., convolutional neural networks, work on data with Euclidean or regular grids structures. However, high-dimensional features extracted from MR data could be encapsulated in non-Euclidean manifolds. This disparity between the go-to assumption of existing models and data requirements limits the flexibility to capture irregular anatomical features in MR data. In this work, we introduce a novel Vision GNN type network for fast MRI called Vision GNN U-Net (ViGU). More precisely, the pixel array is first embedded into patches and then converted into a graph. Secondly, a U-shape network is developed using several graph blocks in symmetrical encoder and decoder paths. Moreover, we show that the proposed ViGU can also benefit from Generative Adversarial Networks yielding to its variant ViGU-GAN. We demonstrate, through numerical and visual experiments, that the proposed ViGU and GAN variant outperform existing CNN and GAN-based methods. Moreover, we show that the proposed network readily competes with approaches based on Transformers while requiring a fraction of the computational cost. More importantly, the graph structure of the network reveals how the network extracts features from MR images, providing intuitive explainability. △ Less

Submitted 23 January, 2023; originally announced February 2023.

Comments: ISBI2023

arXiv:2302.00626 [pdf, other]

Continuous U-Net: Faster, Greater and Noiseless

Authors: Chun-Wun Cheng, Christina Runkel, Lihao Liu, Raymond H Chan, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Image segmentation is a fundamental task in image analysis and clinical practice. The current state-of-the-art techniques are based on U-shape type encoder-decoder networks with skip connections, called U-Net. Despite the powerful performance reported by existing U-Net type networks, they suffer from several major limitations. Issues include the hard coding of the receptive field size, compromisin… ▽ More Image segmentation is a fundamental task in image analysis and clinical practice. The current state-of-the-art techniques are based on U-shape type encoder-decoder networks with skip connections, called U-Net. Despite the powerful performance reported by existing U-Net type networks, they suffer from several major limitations. Issues include the hard coding of the receptive field size, compromising the performance and computational cost, as well as the fact that they do not account for inherent noise in the data. They have problems associated with discrete layers, and do not offer any theoretical underpinning. In this work we introduce continuous U-Net, a novel family of networks for image segmentation. Firstly, continuous U-Net is a continuous deep neural network that introduces new dynamic blocks modelled by second order ordinary differential equations. Secondly, we provide theoretical guarantees for our network demonstrating faster convergence, higher robustness and less sensitivity to noise. Thirdly, we derive qualitative measures to tailor-made segmentation tasks. We demonstrate, through extensive numerical and visual results, that our model outperforms existing U-Net blocks for several medical image segmentation benchmarking datasets. △ Less

Submitted 1 February, 2023; originally announced February 2023.

arXiv:2211.09620 [pdf, other]

TrafficCAM: A Versatile Dataset for Traffic Flow Segmentation

Authors: Zhongying Deng, Yanqi Chen, Lihao Liu, Shujun Wang, Rihuan Ke, Carola-Bibiane Schonlieb, Angelica I Aviles-Rivero

Abstract: Traffic flow analysis is revolutionising traffic management. Qualifying traffic flow data, traffic control bureaus could provide drivers with real-time alerts, advising the fastest routes and therefore optimising transportation logistics and reducing congestion. The existing traffic flow datasets have two major limitations. They feature a limited number of classes, usually limited to one type of v… ▽ More Traffic flow analysis is revolutionising traffic management. Qualifying traffic flow data, traffic control bureaus could provide drivers with real-time alerts, advising the fastest routes and therefore optimising transportation logistics and reducing congestion. The existing traffic flow datasets have two major limitations. They feature a limited number of classes, usually limited to one type of vehicle, and the scarcity of unlabelled data. In this paper, we introduce a new benchmark traffic flow image dataset called TrafficCAM. Our dataset distinguishes itself by two major highlights. Firstly, TrafficCAM provides both pixel-level and instance-level semantic labelling along with a large range of types of vehicles and pedestrians. It is composed of a large and diverse set of video sequences recorded in streets from eight Indian cities with stationary cameras. Secondly, TrafficCAM aims to establish a new benchmark for develo** fully-supervised tasks, and importantly, semi-supervised learning techniques. It is the first dataset that provides a vast amount of unlabelled data, hel** to better capture traffic flow qualification under a low cost annotation requirement. More precisely, our dataset has 4,402 image frames with semantic and instance annotations along with 59,944 unlabelled image frames. We validate our new dataset through a large and comprehensive range of experiments on several state-of-the-art approaches under four different settings: fully-supervised semantic and instance segmentation, and semi-supervised semantic and instance segmentation tasks. Our benchmark dataset will be released. △ Less

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.09593 [pdf, other]

NorMatch: Matching Normalizing Flows with Discriminative Classifiers for Semi-Supervised Learning

Authors: Zhongying Deng, Rihuan Ke, Carola-Bibiane Schonlieb, Angelica I Aviles-Rivero

Abstract: Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data. To better exploit the unlabeled data the latest SSL methods use pseudo-labels predicted from a single discriminative classifier. However, the generated pseudo-labels are inevitably linked to inherent confirmation bias and noise which greatly affects the model performance. In this wo… ▽ More Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data. To better exploit the unlabeled data the latest SSL methods use pseudo-labels predicted from a single discriminative classifier. However, the generated pseudo-labels are inevitably linked to inherent confirmation bias and noise which greatly affects the model performance. In this work we introduce a new framework for SSL named NorMatch. Firstly, we introduce a new uncertainty estimation scheme based on normalizing flows, as an auxiliary classifier, to enforce highly certain pseudo-labels yielding a boost of the discriminative classifiers. Secondly, we introduce a threshold-free sample weighting strategy to exploit better both high and low confidence pseudo-labels. Furthermore, we utilize normalizing flows to model, in an unsupervised fashion, the distribution of unlabeled data. This modelling assumption can further improve the performance of generative classifiers via unlabeled data, and thus, implicitly contributing to training a better discriminative classifier. We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets. △ Less

Submitted 16 February, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: Accepted to Transactions on Machine Learning Research

arXiv:2211.06885 [pdf, other]

SCOTCH and SODA: A Transformer Video Shadow Detection Framework

Authors: Lihao Liu, Jean Prost, Lei Zhu, Nicolas Papadakis, Pietro Liò, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Shadows in videos are difficult to detect because of the large shadow deformation between frames. In this work, we argue that accounting for shadow deformation is essential when designing a video shadow detection method. To this end, we introduce the shadow deformation attention trajectory (SODA), a new type of video self-attention module, specially designed to handle the large shadow deformations… ▽ More Shadows in videos are difficult to detect because of the large shadow deformation between frames. In this work, we argue that accounting for shadow deformation is essential when designing a video shadow detection method. To this end, we introduce the shadow deformation attention trajectory (SODA), a new type of video self-attention module, specially designed to handle the large shadow deformations in videos. Moreover, we present a new shadow contrastive learning mechanism (SCOTCH) which aims at guiding the network to learn a unified shadow representation from massive positive shadow pairs across different videos. We demonstrate empirically the effectiveness of our two contributions in an ablation study. Furthermore, we show that SCOTCH and SODA significantly outperforms existing techniques for video shadow detection. Code is available at the project page: https://lihaoliu-cambridge.github.io/scotch_and_soda/ △ Less

Submitted 26 March, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: Accepted to CVPR 2023

arXiv:2209.08647 [pdf, other]

Why Deep Surgical Models Fail?: Revisiting Surgical Action Triplet Recognition through the Lens of Robustness

Authors: Yanqi Cheng, Lihao Liu, Shujun Wang, Yueming **, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

Abstract: Surgical action triplet recognition provides a better understanding of the surgical scene. This task is of high relevance as it provides the surgeon with context-aware support and safety. The current go-to strategy for improving performance is the development of new network mechanisms. However, the performance of current state-of-the-art techniques is substantially lower than other surgical tasks.… ▽ More Surgical action triplet recognition provides a better understanding of the surgical scene. This task is of high relevance as it provides the surgeon with context-aware support and safety. The current go-to strategy for improving performance is the development of new network mechanisms. However, the performance of current state-of-the-art techniques is substantially lower than other surgical tasks. Why is this happening? This is the question that we address in this work. We present the first study to understand the failure of existing deep learning models through the lens of robustness and explainability. Firstly, we study current existing models under weak and strong $δ-$perturbations via an adversarial optimisation scheme. We then analyse the failure modes via feature based explanations. Our study reveals that the key to improving performance and increasing reliability is in the core and spurious attributes. Our work opens the door to more trustworthy and reliable deep learning models in surgical data science. △ Less

Submitted 20 February, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

arXiv:2204.02399 [pdf, other]

Multi-Modal Hypergraph Diffusion Network with Dual Prior for Alzheimer Classification

Authors: Angelica I. Aviles-Rivero, Christina Runkel, Nicolas Papadakis, Zoe Kourtzi, Carola-Bibiane Schönlieb

Abstract: The automatic early diagnosis of prodromal stages of Alzheimer's disease is of great relevance for patient treatment to improve quality of life. We address this problem as a multi-modal classification task. Multi-modal data provides richer and complementary information. However, existing techniques only consider either lower order relations between the data and single/multi-modal imaging data. In… ▽ More The automatic early diagnosis of prodromal stages of Alzheimer's disease is of great relevance for patient treatment to improve quality of life. We address this problem as a multi-modal classification task. Multi-modal data provides richer and complementary information. However, existing techniques only consider either lower order relations between the data and single/multi-modal imaging data. In this work, we introduce a novel semi-supervised hypergraph learning framework for Alzheimer's disease diagnosis. Our framework allows for higher-order relations among multi-modal imaging and non-imaging data whilst requiring a tiny labelled set. Firstly, we introduce a dual embedding strategy for constructing a robust hypergraph that preserves the data semantics. We achieve this by enforcing perturbation invariance at the image and graph levels using a contrastive based mechanism. Secondly, we present a dynamically adjusted hypergraph diffusion model, via a semi-explicit flow, to improve the predictive uncertainty. We demonstrate, through our experiments, that our framework is able to outperform current techniques for Alzheimer's disease diagnosis. △ Less

Submitted 6 September, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

Journal ref: MICCAI 2022

arXiv:2203.05684 [pdf, other]

PC-SwinMorph: Patch Representation for Unsupervised Medical Image Registration and Segmentation

Authors: Lihao Liu, Zhening Huang, Pietro Liò, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

Abstract: Medical image registration and segmentation are critical tasks for several clinical procedures. Manual realisation of those tasks is time-consuming and the quality is highly dependent on the level of expertise of the physician. To mitigate that laborious task, automatic tools have been developed where the majority of solutions are supervised techniques. However, in medical domain, the strong assum… ▽ More Medical image registration and segmentation are critical tasks for several clinical procedures. Manual realisation of those tasks is time-consuming and the quality is highly dependent on the level of expertise of the physician. To mitigate that laborious task, automatic tools have been developed where the majority of solutions are supervised techniques. However, in medical domain, the strong assumption of having a well-representative ground truth is far from being realistic. To overcome this challenge, unsupervised techniques have been investigated. However, they are still limited in performance and they fail to produce plausible results. In this work, we propose a novel unified unsupervised framework for image registration and segmentation that we called PC-SwinMorph. The core of our framework is two patch-based strategies, where we demonstrate that patch representation is key for performance gain. We first introduce a patch-based contrastive strategy that enforces locality conditions and richer feature representation. Secondly, we utilise a 3D window/shifted-window multi-head self-attention module as a patch stitching strategy to eliminate artifacts from the patch splitting. We demonstrate, through a set of numerical and visual results, that our technique outperforms current state-of-the-art unsupervised techniques. △ Less

Submitted 20 July, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: 10 pages, 7 figures, 2 tables

arXiv:2203.00157 [pdf, other]

Simultaneous Semantic and Instance Segmentation for Colon Nuclei Identification and Counting

Authors: Lihao Liu, Chenyang Hong, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb

Abstract: We address the problem of automated nuclear segmentation, classification, and quantification from Haematoxylin and Eosin stained histology images, which is of great relevance for several downstream computational pathology applications. In this work, we present a solution framed as a simultaneous semantic and instance segmentation framework. Our solution is part of the Colon Nuclei Identification a… ▽ More We address the problem of automated nuclear segmentation, classification, and quantification from Haematoxylin and Eosin stained histology images, which is of great relevance for several downstream computational pathology applications. In this work, we present a solution framed as a simultaneous semantic and instance segmentation framework. Our solution is part of the Colon Nuclei Identification and Counting (CoNIC) Challenge. We first train a semantic and instance segmentation model separately. Our framework uses as backbone HoverNet and Cascade Mask-RCNN models. We then ensemble the results with a custom Non-Maximum Suppression embedding (NMS). In our framework, the semantic model computes a class prediction for the cells whilst the instance model provides a refined segmentation. We demonstrate, through our experimental results, that our model outperforms the provided baselines by a large margin. △ Less

Submitted 15 April, 2022; v1 submitted 28 February, 2022; originally announced March 2022.

Comments: 9 pages; 4 figures

arXiv:2107.10014 [pdf, other]

Delving Into Deep Walkers: A Convergence Analysis of Random-Walk-Based Vertex Embeddings

Authors: Dominik Kloepfer, Angelica I. Aviles-Rivero, Daniel Heydecker

Abstract: Graph vertex embeddings based on random walks have become increasingly influential in recent years, showing good performance in several tasks as they efficiently transform a graph into a more computationally digestible format while preserving relevant information. However, the theoretical properties of such algorithms, in particular the influence of hyperparameters and of the graph structure on th… ▽ More Graph vertex embeddings based on random walks have become increasingly influential in recent years, showing good performance in several tasks as they efficiently transform a graph into a more computationally digestible format while preserving relevant information. However, the theoretical properties of such algorithms, in particular the influence of hyperparameters and of the graph structure on their convergence behaviour, have so far not been well-understood. In this work, we provide a theoretical analysis for random-walks based embeddings techniques. Firstly, we prove that, under some weak assumptions, vertex embeddings derived from random walks do indeed converge both in the single limit of the number of random walks $N \to \infty$ and in the double limit of both $N$ and the length of each random walk $L\to\infty$. Secondly, we derive concentration bounds quantifying the converge rate of the corpora for the single and double limits. Thirdly, we use these results to derive a heuristic for choosing the hyperparameters $N$ and $L$. We validate and illustrate the practical importance of our findings with a range of numerical and visual experiments on several graphs drawn from real-world applications. △ Less

Submitted 21 July, 2021; originally announced July 2021.

arXiv:2106.04527 [pdf, other]

doi 10.1109/TNNLS.2022.3203315

LaplaceNet: A Hybrid Graph-Energy Neural Network for Deep Semi-Supervised Classification

Authors: Philip Sellars, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb

Abstract: Semi-supervised learning has received a lot of recent attention as it alleviates the need for large amounts of labelled data which can often be expensive, requires expert knowledge and be time consuming to collect. Recent developments in deep semi-supervised classification have reached unprecedented performance and the gap between supervised and semi-supervised learning is ever-decreasing. This im… ▽ More Semi-supervised learning has received a lot of recent attention as it alleviates the need for large amounts of labelled data which can often be expensive, requires expert knowledge and be time consuming to collect. Recent developments in deep semi-supervised classification have reached unprecedented performance and the gap between supervised and semi-supervised learning is ever-decreasing. This improvement in performance has been based on the inclusion of numerous technical tricks, strong augmentation techniques and costly optimisation schemes with multi-term loss functions. We propose a new framework, LaplaceNet, for deep semi-supervised classification that has a greatly reduced model complexity. We utilise a hybrid approach where pseudolabels are produced by minimising the Laplacian energy on a graph. These pseudo-labels are then used to iteratively train a neural-network backbone. Our model outperforms state-of-the art methods for deep semi-supervised classification, over several benchmark datasets. Furthermore, we consider the application of strong-augmentations to neural networks theoretically and justify the use of a multi-sampling approach for semi-supervised learning. We demonstrate, through rigorous experimentation, that a multi-sampling augmentation approach improves generalisation and reduces the sensitivity of the network to augmentation. △ Less

Submitted 28 September, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: https://ieeexplore.ieee.org/document/9900300

Journal ref: IEEE Transactions on Neural Networks and Learning Systems 2022

arXiv:2106.03755 [pdf, other]

HERS Superpixels: Deep Affinity Learning for Hierarchical Entropy Rate Segmentation

Authors: Hankui Peng, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb

Abstract: Superpixels serve as a powerful preprocessing tool in numerous computer vision tasks. By using superpixel representation, the number of image primitives can be largely reduced by orders of magnitudes. With the rise of deep learning in recent years, a few works have attempted to feed deeply learned features / graphs into existing classical superpixel techniques. However, none of them are able to pr… ▽ More Superpixels serve as a powerful preprocessing tool in numerous computer vision tasks. By using superpixel representation, the number of image primitives can be largely reduced by orders of magnitudes. With the rise of deep learning in recent years, a few works have attempted to feed deeply learned features / graphs into existing classical superpixel techniques. However, none of them are able to produce superpixels in near real-time, which is crucial to the applicability of superpixels in practice. In this work, we propose a two-stage graph-based framework for superpixel segmentation. In the first stage, we introduce an efficient Deep Affinity Learning (DAL) network that learns pairwise pixel affinities by aggregating multi-scale information. In the second stage, we propose a highly efficient superpixel method called Hierarchical Entropy Rate Segmentation (HERS). Using the learned affinities from the first stage, HERS builds a hierarchical tree structure that can produce any number of highly adaptive superpixels instantaneously. We demonstrate, through visual and numerical experiments, the effectiveness and efficiency of our method compared to various state-of-the-art superpixel methods. △ Less

Submitted 18 November, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

ACM Class: I.4; I.5

arXiv:2101.07945 [pdf, other]

Beyond Fine-tuning: Classifying High Resolution Mammograms using Function-Preserving Transformations

Authors: Tao Wei, Angelica I Aviles-Rivero, Shuo Wang, Yuan Huang, Fiona J Gilbert, Carola-Bibiane Schönlieb, Chang Wen Chen

Abstract: The task of classifying mammograms is very challenging because the lesion is usually small in the high resolution image. The current state-of-the-art approaches for medical image classification rely on using the de-facto method for ConvNets - fine-tuning. However, there are fundamental differences between natural images and medical images, which based on existing evidence from the literature, limi… ▽ More The task of classifying mammograms is very challenging because the lesion is usually small in the high resolution image. The current state-of-the-art approaches for medical image classification rely on using the de-facto method for ConvNets - fine-tuning. However, there are fundamental differences between natural images and medical images, which based on existing evidence from the literature, limits the overall performance gain when designed with algorithmic approaches. In this paper, we propose to go beyond fine-tuning by introducing a novel framework called MorphHR, in which we highlight a new transfer learning scheme. The idea behind the proposed framework is to integrate function-preserving transformations, for any continuous non-linear activation neurons, to internally regularise the network for improving mammograms classification. The proposed solution offers two major advantages over the existing techniques. Firstly and unlike fine-tuning, the proposed approach allows for modifying not only the last few layers but also several of the first ones on a deep ConvNet. By doing this, we can design the network front to be suitable for learning domain specific features. Secondly, the proposed scheme is scalable to hardware. Therefore, one can fit high resolution images on standard GPU memory. We show that by using high resolution images, one prevents losing relevant information. We demonstrate, through numerical and visual experiments, that the proposed approach yields to a significant improvement in the classification performance over state-of-the-art techniques, and is indeed on a par with radiology experts. Moreover and for generalisation purposes, we show the effectiveness of the proposed learning scheme on another large dataset, the ChestX-ray14, surpassing current state-of-the-art techniques. △ Less

Submitted 19 January, 2021; originally announced January 2021.

Comments: 10 pages, 5 figures

arXiv:2012.05703 [pdf, other]

TFPnP: Tuning-free Plug-and-Play Proximal Algorithm with Applications to Inverse Imaging Problems

Authors: Kaixuan Wei, Angelica Aviles-Rivero, **gwei Liang, Ying Fu, Hua Huang, Carola-Bibiane Schönlieb

Abstract: Plug-and-Play (PnP) is a non-convex optimization framework that combines proximal algorithms, for example, the alternating direction method of multipliers (ADMM), with advanced denoising priors. Over the past few years, great empirical success has been obtained by PnP algorithms, especially for the ones that integrate deep learning-based denoisers. However, a key challenge of PnP approaches is the… ▽ More Plug-and-Play (PnP) is a non-convex optimization framework that combines proximal algorithms, for example, the alternating direction method of multipliers (ADMM), with advanced denoising priors. Over the past few years, great empirical success has been obtained by PnP algorithms, especially for the ones that integrate deep learning-based denoisers. However, a key challenge of PnP approaches is the need for manual parameter tweaking as it is essential to obtain high-quality results across the high discrepancy in imaging conditions and varying scene content. In this work, we present a class of tuning-free PnP proximal algorithms that can determine parameters such as denoising strength, termination time, and other optimization-specific parameters automatically. A core part of our approach is a policy network for automated parameter search which can be effectively learned via a mixture of model-free and model-based deep reinforcement learning strategies. We demonstrate, through rigorous numerical and visual experiments, that the learned policy can customize parameters to different settings, and is often more efficient and effective than existing handcrafted criteria. Moreover, we discuss several practical considerations of PnP denoisers, which together with our learned policy yield state-of-the-art results. This advanced performance is prevalent on both linear and nonlinear exemplar inverse imaging problems, and in particular shows promising results on compressed sensing MRI, sparse-view CT, single-photon imaging, and phase retrieval. △ Less

Submitted 18 September, 2021; v1 submitted 18 November, 2020; originally announced December 2020.

Comments: The Journal version (47 pages) of arXiv:2002.09611 (ICML'20 Award Paper); Code is released at https://github.com/Vandermode/TFPnP

arXiv:2012.00827 [pdf, other]

A Three-Stage Self-Training Framework for Semi-Supervised Semantic Segmentation

Authors: Rihuan Ke, Angelica Aviles-Rivero, Saurabh Pandey, Saikumar Reddy, Carola-Bibiane Schönlieb

Abstract: Semantic segmentation has been widely investigated in the community, in which the state of the art techniques are based on supervised models. Those models have reported unprecedented performance at the cost of requiring a large set of high quality segmentation masks. To obtain such annotations is highly expensive and time consuming, in particular, in semantic segmentation where pixel-level annotat… ▽ More Semantic segmentation has been widely investigated in the community, in which the state of the art techniques are based on supervised models. Those models have reported unprecedented performance at the cost of requiring a large set of high quality segmentation masks. To obtain such annotations is highly expensive and time consuming, in particular, in semantic segmentation where pixel-level annotations are required. In this work, we address this problem by proposing a holistic solution framed as a three-stage self-training framework for semi-supervised semantic segmentation. The key idea of our technique is the extraction of the pseudo-masks statistical information to decrease uncertainty in the predicted probability whilst enforcing segmentation consistency in a multi-task fashion. We achieve this through a three-stage solution. Firstly, we train a segmentation network to produce rough pseudo-masks which predicted probability is highly uncertain. Secondly, we then decrease the uncertainty of the pseudo-masks using a multi-task model that enforces consistency whilst exploiting the rich statistical information of the data. We compare our approach with existing methods for semi-supervised semantic segmentation and demonstrate its state-of-the-art performance with extensive experiments. △ Less

Submitted 9 January, 2022; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: Accepted for publication by IEEE Transactions on Image Processing

arXiv:2011.08894 [pdf, other]

Contrastive Registration for Unsupervised Medical Image Segmentation

Authors: Lihao Liu, Angelica I Aviles-Rivero, Carola-Bibiane Schönlieb

Abstract: Medical image segmentation is a relevant task as it serves as the first step for several diagnosis processes, thus it is indispensable in clinical usage. Whilst major success has been reported using supervised techniques, they assume a large and well-representative labelled set. This is a strong assumption in the medical domain where annotations are expensive, time-consuming, and inherent to human… ▽ More Medical image segmentation is a relevant task as it serves as the first step for several diagnosis processes, thus it is indispensable in clinical usage. Whilst major success has been reported using supervised techniques, they assume a large and well-representative labelled set. This is a strong assumption in the medical domain where annotations are expensive, time-consuming, and inherent to human bias. To address this problem, unsupervised techniques have been proposed in the literature yet it is still an open problem due to the difficulty of learning any transformation pattern. In this work, we present a novel optimisation model framed into a new CNN-based contrastive registration architecture for unsupervised medical image segmentation. The core of our approach is to exploit image-level registration and feature-level from a contrastive learning mechanism, to perform registration-based segmentation. Firstly, we propose an architecture to capture the image-to-image transformation pattern via registration for unsupervised medical image segmentation. Secondly, we embed a contrastive learning mechanism into the registration architecture to enhance the discriminating capacity of the network in the feature-level. We show that our proposed technique mitigates the major drawbacks of existing unsupervised techniques. We demonstrate, through numerical and visual experiments, that our technique substantially outperforms the current state-of-the-art unsupervised segmentation methods on two major medical image datasets. △ Less

Submitted 20 July, 2022; v1 submitted 17 November, 2020; originally announced November 2020.

Comments: 12 pages, 8 figures, 4 tables

arXiv:2010.00378 [pdf, other]

GraphXCOVID: Explainable Deep Graph Diffusion Pseudo-Labelling for Identifying COVID-19 on Chest X-rays

Authors: Angelica I Aviles-Rivero, Philip Sellars, Carola-Bibiane Schönlieb, Nicolas Papadakis

Abstract: Can one learn to diagnose COVID-19 under extreme minimal supervision? Since the outbreak of the novel COVID-19 there has been a rush for develo** Artificial Intelligence techniques for expert-level disease identification on Chest X-ray data. In particular, the use of deep supervised learning has become the go-to paradigm. However, the performance of such models is heavily dependent on the availa… ▽ More Can one learn to diagnose COVID-19 under extreme minimal supervision? Since the outbreak of the novel COVID-19 there has been a rush for develo** Artificial Intelligence techniques for expert-level disease identification on Chest X-ray data. In particular, the use of deep supervised learning has become the go-to paradigm. However, the performance of such models is heavily dependent on the availability of a large and representative labelled dataset. The creation of which is a heavily expensive and time consuming task, and especially imposes a great challenge for a novel disease. Semi-supervised learning has shown the ability to match the incredible performance of supervised models whilst requiring a small fraction of the labelled examples. This makes the semi-supervised paradigm an attractive option for identifying COVID-19. In this work, we introduce a graph based deep semi-supervised framework for classifying COVID-19 from chest X-rays. Our framework introduces an optimisation model for graph diffusion that reinforces the natural relation among the tiny labelled set and the vast unlabelled data. We then connect the diffusion prediction output as pseudo-labels that are used in an iterative scheme in a deep net. We demonstrate, through our experiments, that our model is able to outperform the current leading supervised model with a tiny fraction of the labelled examples. Finally, we provide attention maps to accommodate the radiologist's mental model, better fitting their perceptual and cognitive abilities. These visualisation aims to assist the radiologist in judging whether the diagnostic is correct or not, and in consequence to accelerate the decision. △ Less

Submitted 4 July, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

arXiv:2008.06388 [pdf]

doi 10.1038/s42256-021-00307-0

Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans

Authors: Michael Roberts, Derek Driggs, Matthew Thorpe, Julian Gilbey, Michael Yeung, Stephan Ursprung, Angelica I. Aviles-Rivero, Christian Etmann, Cathal McCague, Lucian Beer, Jonathan R. Weir-McCall, Zhongzhao Teng, Effrossyni Gkrania-Klotsas, James H. F. Rudd, Evis Sala, Carola-Bibiane Schönlieb

Abstract: Machine learning methods offer great promise for fast and accurate detection and prognostication of COVID-19 from standard-of-care chest radiographs (CXR) and computed tomography (CT) images. Many articles have been published in 2020 describing new machine learning-based models for both of these tasks, but it is unclear which are of potential clinical utility. In this systematic review, we search… ▽ More Machine learning methods offer great promise for fast and accurate detection and prognostication of COVID-19 from standard-of-care chest radiographs (CXR) and computed tomography (CT) images. Many articles have been published in 2020 describing new machine learning-based models for both of these tasks, but it is unclear which are of potential clinical utility. In this systematic review, we search EMBASE via OVID, MEDLINE via PubMed, bioRxiv, medRxiv and arXiv for published papers and preprints uploaded from January 1, 2020 to October 3, 2020 which describe new machine learning models for the diagnosis or prognosis of COVID-19 from CXR or CT images. Our search identified 2,212 studies, of which 415 were included after initial screening and, after quality screening, 61 studies were included in this systematic review. Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases. This is a major weakness, given the urgency with which validated COVID-19 models are needed. To address this, we give many recommendations which, if followed, will solve these issues and lead to higher quality model development and well documented manuscripts. △ Less

Submitted 5 January, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

Comments: 35 pages, 3 figures, 2 tables, updated to the period 1 January 2020 - 3 October 2020

Journal ref: Nature Machine Intelligence 3, 199-217 (2021)

arXiv:2003.06451 [pdf, other]

The GraphNet Zoo: An All-in-One Graph Based Deep Semi-Supervised Framework for Medical Image Classification

Authors: Marianne de Vriendt, Philip Sellars, Angelica I Aviles-Rivero

Abstract: We consider the problem of classifying a medical image dataset when we have a limited amounts of labels. This is very common yet challenging setting as labelled data is expensive, time consuming to collect and may require expert knowledge. The current classification go-to of deep supervised learning is unable to cope with such a problem setup. However, using semi-supervised learning, one can produ… ▽ More We consider the problem of classifying a medical image dataset when we have a limited amounts of labels. This is very common yet challenging setting as labelled data is expensive, time consuming to collect and may require expert knowledge. The current classification go-to of deep supervised learning is unable to cope with such a problem setup. However, using semi-supervised learning, one can produce accurate classifications using a significantly reduced amount of labelled data. Therefore, semi-supervised learning is perfectly suited for medical image classification. However, there has almost been no uptake of semi-supervised methods in the medical domain. In this work, we propose an all-in-one framework for deep semi-supervised classification focusing on graph based approaches, which up to our knowledge it is the first time that an approach with minimal labels has been shown to such an unprecedented scale with medical data. We introduce the concept of hybrid models by defining a classifier as a combination between an energy-based model and a deep net. Our energy functional is built on the Dirichlet energy based on the graph p-Laplacian. Our framework includes energies based on the $\ell_1$ and $\ell_2$ norms. We then connected this energy model to a deep net to generate a much richer feature space to construct a stronger graph. Our framework can be set to be adapted to any complex dataset. We demonstrate, through extensive numerical comparisons, that our approach readily compete with fully-supervised state-of-the-art techniques for the applications of Malaria Cells, Mammograms and Chest X-ray classification whilst using only 20% of labels. △ Less

Submitted 26 June, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

arXiv:2002.09611 [pdf, other]

Tuning-free Plug-and-Play Proximal Algorithm for Inverse Imaging Problems

Authors: Kaixuan Wei, Angelica Aviles-Rivero, **gwei Liang, Ying Fu, Carola-Bibiane Schönlieb, Hua Huang

Abstract: Plug-and-play (PnP) is a non-convex framework that combines ADMM or other proximal algorithms with advanced denoiser priors. Recently, PnP has achieved great empirical success, especially with the integration of deep learning-based denoisers. However, a key problem of PnP based approaches is that they require manual parameter tweaking. It is necessary to obtain high-quality results across the high… ▽ More Plug-and-play (PnP) is a non-convex framework that combines ADMM or other proximal algorithms with advanced denoiser priors. Recently, PnP has achieved great empirical success, especially with the integration of deep learning-based denoisers. However, a key problem of PnP based approaches is that they require manual parameter tweaking. It is necessary to obtain high-quality results across the high discrepancy in terms of imaging conditions and varying scene content. In this work, we present a tuning-free PnP proximal algorithm, which can automatically determine the internal parameters including the penalty parameter, the denoising strength and the terminal time. A key part of our approach is to develop a policy network for automatic search of parameters, which can be effectively learned via mixed model-free and model-based deep reinforcement learning. We demonstrate, through numerical and visual experiments, that the learned policy can customize different parameters for different states, and often more efficient and effective than existing handcrafted criteria. Moreover, we discuss the practical considerations of the plugged denoisers, which together with our learned policy yield state-of-the-art results. This is prevalent on both linear and nonlinear exemplary inverse imaging problems, and in particular, we show promising results on Compressed Sensing MRI and phase retrieval. △ Less

Submitted 18 November, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

arXiv:2001.05317 [pdf, other]

CycleCluster: Modernising Clustering Regularisation for Deep Semi-Supervised Classification

Authors: Philip Sellars, Angelica Aviles-Rivero, Carola Bibiane Schönlieb

Abstract: Given the potential difficulties in obtaining large quantities of labelled data, many works have explored the use of deep semi-supervised learning, which uses both labelled and unlabelled data to train a neural network architecture. The vast majority of SSL approaches focus on implementing the low-density separation assumption or consistency assumption, the idea that decision boundaries should lie… ▽ More Given the potential difficulties in obtaining large quantities of labelled data, many works have explored the use of deep semi-supervised learning, which uses both labelled and unlabelled data to train a neural network architecture. The vast majority of SSL approaches focus on implementing the low-density separation assumption or consistency assumption, the idea that decision boundaries should lie in low density regions. However, they have implemented this assumption by making local changes to the decision boundary at each data point, ignoring the global structure of the data. In this work, we explore an alternative approach using the global information present in the clustered data to update our decision boundaries. We propose a novel framework, CycleCluster, for deep semi-supervised classification. Our core optimisation is driven by a new clustering based regularisation along with a graph based pseudo-labels and a shared deep network. Demonstrating that direct implementation of the cluster assumption is a viable alternative to the popular consistency based regularisation. We demonstrate the predictive capability of our technique through a careful set of numerical results. △ Less

Submitted 1 September, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

arXiv:1912.07764 [pdf, other]

Dim the Lights! -- Low-Rank Prior Temporal Data for Specular-Free Video Recovery

Authors: Samar M. Alsaleh, Angelica I. Aviles-Rivero, Noemie Debroux, James K. Hahn

Abstract: The appearance of an object is significantly affected by the illumination conditions in the environment. This is more evident with strong reflective objects as they suffer from more dominant specular reflections, causing information loss and discontinuity in the image domain. In this paper, we present a novel framework for specular-free video recovery with special emphasis on dealing with complex… ▽ More The appearance of an object is significantly affected by the illumination conditions in the environment. This is more evident with strong reflective objects as they suffer from more dominant specular reflections, causing information loss and discontinuity in the image domain. In this paper, we present a novel framework for specular-free video recovery with special emphasis on dealing with complex motions coming from objects or camera. Our solution is a twostep approach that allows for both detection and restoration of the damaged regions on video data. We first propose a spatially adaptive detection term that searches for the damage areas. We then introduce a variational solution for specular-free video recovery that allows exploiting spatio-temporal correlations by representing prior data in a low-rank form. We demonstrate that our solution prevents major drawbacks of existing approaches while improving the performance in both detection accuracy and inpainting quality. Finally, we show that our approach can be applied to other problems such as object removal. △ Less

Submitted 16 December, 2019; originally announced December 2019.

Comments: 22 pages, 6 figures

arXiv:1912.07648 [pdf, other]

Rethinking Medical Image Reconstruction via Shape Prior, Going Deeper and Faster: Deep Joint Indirect Registration and Reconstruction

Authors: Jiulong Liu, Angelica I. Aviles-Rivero, Hui Ji, Carola-Bibiane Schönlieb

Abstract: Indirect image registration is a promising technique to improve image reconstruction quality by providing a shape prior for the reconstruction task. In this paper, we propose a novel hybrid method that seeks to reconstruct high quality images from few measurements whilst requiring low computational cost. With this purpose, our framework intertwines indirect registration and reconstruction tasks is… ▽ More Indirect image registration is a promising technique to improve image reconstruction quality by providing a shape prior for the reconstruction task. In this paper, we propose a novel hybrid method that seeks to reconstruct high quality images from few measurements whilst requiring low computational cost. With this purpose, our framework intertwines indirect registration and reconstruction tasks is a single functional. It is based on two major novelties. Firstly, we introduce a model based on deep nets to solve the indirect registration problem, in which the inversion and registration map**s are recurrently connected through a fixed-point interaction based sparse optimisation. Secondly, we introduce specific inversion blocks, that use the explicit physical forward operator, to map the acquired measurements to the image reconstruction. We also introduce registration blocks based deep nets to predict the registration parameters and warp transformation accurately and efficiently. We demonstrate, through extensive numerical and visual experiments, that our framework outperforms significantly classic reconstruction schemes and other bi-task method; this in terms of both image quality and computational time. Finally, we show generalisation capabilities of our approach by demonstrating their performance on fast Magnetic Resonance Imaging (MRI), sparse view computed tomography (CT) and low dose CT with measurements much below the Nyquist limit. △ Less

Submitted 16 December, 2019; originally announced December 2019.

arXiv:1912.03623 [pdf, other]

Single image reflection removal via learning with multi-image constraints

Authors: Yingda Yin, Qingnan Fan, Dongdong Chen, Yujie Wang, Angelica Aviles-Rivero, Ruoteng Li, Carola-Bibiane Schnlieb, Baoquan Chen

Abstract: Reflections are very common phenomena in our daily photography, which distract people's attention from the scene behind the glass. The problem of removing reflection artifacts is important but challenging due to its ill-posed nature. The traditional approaches solve an optimization problem over the constraints induced from multiple images, at the expense of large computation costs. Recent learning… ▽ More Reflections are very common phenomena in our daily photography, which distract people's attention from the scene behind the glass. The problem of removing reflection artifacts is important but challenging due to its ill-posed nature. The traditional approaches solve an optimization problem over the constraints induced from multiple images, at the expense of large computation costs. Recent learning-based approaches have demonstrated a significant improvement in both performance and running time for single image reflection removal, but are limited as they require a large number of synthetic reflection/clean image pairs for direct supervision to approximate the ground truth, at the risk of overfitting in the synthetic image domain and degrading in the real image domain. In this paper, we propose a novel learning-based solution that combines the advantages of the aforementioned approaches and overcomes their drawbacks. Our algorithm works by learning a deep neural network to optimize the target with joint constraints enhanced among multiple input images during the training phase, but is able to eliminate reflections only from a single input for evaluation. Our algorithm runs in real-time and achieves state-of-the-art reflection removal performance on real images. We further propose a strong network backbone that disentangles the background and reflection information into separate latent codes, which are embedded into a shared one-branch deep neural network for both background and reflection predictions. The proposed backbone experimentally performs better than the other common network implementations, and provides insightful knowledge to understand the reflection removal task. △ Less

Submitted 27 August, 2023; v1 submitted 8 December, 2019; originally announced December 2019.

arXiv:1910.04794 [pdf, other]

doi 10.1016/j.patcog.2020.107705

Dynamic Spectral Residual Superpixels

Authors: Jianchao Zhang, Angelica I. Aviles-Rivero, Daniel Heydecker, Xiaosheng Zhuang, Raymond Chan, Carola-Bibiane Schönlieb

Abstract: We consider the problem of segmenting an image into superpixels in the context of $k$-means clustering, in which we wish to decompose an image into local, homogeneous regions corresponding to the underlying objects. Our novel approach builds upon the widely used Simple Linear Iterative Clustering (SLIC), and incorporate a measure of objects' structure based on the spectral residual of an image. Ba… ▽ More We consider the problem of segmenting an image into superpixels in the context of $k$-means clustering, in which we wish to decompose an image into local, homogeneous regions corresponding to the underlying objects. Our novel approach builds upon the widely used Simple Linear Iterative Clustering (SLIC), and incorporate a measure of objects' structure based on the spectral residual of an image. Based on this combination, we propose a modified initialisation scheme and search metric, which helps keeps fine-details. This combination leads to better adherence to object boundaries, while preventing unnecessary segmentation of large, uniform areas, while remaining computationally tractable in comparison to other methods. We demonstrate through numerical and visual experiments that our approach outperforms the state-of-the-art techniques. △ Less

Submitted 2 April, 2021; v1 submitted 10 October, 2019; originally announced October 2019.

Journal ref: Zhang, J., Aviles-Rivero, A. I., Heydecker, D., Zhuang, X., Chan, R., & Schonlieb, C. B. (2021). Dynamic spectral residual superpixels. Pattern Recognition, 112, 107705

arXiv:1907.10085 [pdf, other]

GraphX$^{NET}-$ Chest X-Ray Classification Under Extreme Minimal Supervision

Authors: Angelica I. Aviles-Rivero, Nicolas Papadakis, Ruoteng Li, Philip Sellars, Qingnan Fan, Robby T. Tan, Carola-Bibiane Schönlieb

Abstract: The task of classifying X-ray data is a problem of both theoretical and clinical interest. Whilst supervised deep learning methods rely upon huge amounts of labelled data, the critical problem of achieving a good classification accuracy when an extremely small amount of labelled data is available has yet to be tackled. In this work, we introduce a novel semi-supervised framework for X-ray classifi… ▽ More The task of classifying X-ray data is a problem of both theoretical and clinical interest. Whilst supervised deep learning methods rely upon huge amounts of labelled data, the critical problem of achieving a good classification accuracy when an extremely small amount of labelled data is available has yet to be tackled. In this work, we introduce a novel semi-supervised framework for X-ray classification which is based on a graph-based optimisation model. To the best of our knowledge, this is the first method that exploits graph-based semi-supervised learning for X-ray data classification. Furthermore, we introduce a new multi-class classification functional with carefully selected class priors which allows for a smooth solution that strengthens the synergy between the limited number of labels and the huge amount of unlabelled data. We demonstrate, through a set of numerical and visual experiments, that our method produces highly competitive results on the ChestX-ray14 data set whilst drastically reducing the need for annotated data. △ Less

Submitted 3 July, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: MICCAI 2019

arXiv:1906.08635 [pdf, other]

Energy Models for Better Pseudo-Labels: Improving Semi-Supervised Classification with the 1-Laplacian Graph Energy

Authors: Angelica I. Aviles-Rivero, Nicolas Papadakis, Ruoteng Li, Philip Sellars, Samar M Alsaleh, Robby T Tan, Carola-Bibiane Schönlieb

Abstract: Semi-supervised classification is a great focus of interest, as in real-world scenarios obtaining labels is expensive, time-consuming and might require expert knowledge. This has motivated the fast development of semi-supervised techniques, whose performance is on a par with or better than supervised approaches. A current major challenge for semi-supervised techniques is how to better handle the n… ▽ More Semi-supervised classification is a great focus of interest, as in real-world scenarios obtaining labels is expensive, time-consuming and might require expert knowledge. This has motivated the fast development of semi-supervised techniques, whose performance is on a par with or better than supervised approaches. A current major challenge for semi-supervised techniques is how to better handle the network calibration and confirmation bias problems for improving performance. In this work, we argue that energy models are an effective alternative to such problems. With this motivation in mind, we propose a hybrid framework for semi-supervised classification called CREPE model (1-Lapla$\mathbf{C}$ian g$\mathbf{R}$aph $\mathbf{E}$nergy for $\mathbf{P}$seudo-lab$\mathbf{E}$ls). Firstly, we introduce a new energy model based on the non-smooth $\ell_1$ norm of the normalised graph 1-Laplacian. Our functional enforces a sufficiently smooth solution and strengthens the intrinsic relation between the labelled and unlabelled data. Secondly, we provide a theoretical analysis for our proposed scheme and show that the solution trajectory does converge to a non-constant steady point. Thirdly, we derive the connection of our energy model for pseudo-labelling. We show that our energy model produces more meaningful pseudo-labels than the ones generated directly by a deep network. We extensively evaluate our framework, through numerical and visual experiments, using six benchmarking datasets for natural and medical images. We demonstrate that our technique reports state-of-the-art results for semi-supervised classification. △ Less

Submitted 25 October, 2021; v1 submitted 20 June, 2019; originally announced June 2019.

arXiv:1903.06548 [pdf, other]

doi 10.1109/TGRS.2019.2961599

Superpixel Contracted Graph-Based Learning for Hyperspectral Image Classification

Authors: Philip Sellars, Angelica Aviles-Rivero, Carola-Bibiane Schönlieb

Abstract: A central problem in hyperspectral image classification is obtaining high classification accuracy when using a limited amount of labelled data. In this paper we present a novel graph-based framework, which aims to tackle this problem in the presence of large scale data input. Our approach utilises a novel superpixel method, specifically designed for hyperspectral data, to define meaningful local r… ▽ More A central problem in hyperspectral image classification is obtaining high classification accuracy when using a limited amount of labelled data. In this paper we present a novel graph-based framework, which aims to tackle this problem in the presence of large scale data input. Our approach utilises a novel superpixel method, specifically designed for hyperspectral data, to define meaningful local regions in an image, which with high probability share the same classification label. We then extract spectral and spatial features from these regions and use these to produce a contracted weighted graph-representation, where each node represents a region rather than a pixel. Our graph is then fed into a graph-based semi-supervised classifier which gives the final classification. We show that using superpixels in a graph representation is an effective tool for speeding up graphical classifiers applied to hyperspectral images. We demonstrate through exhaustive quantitative and qualitative results that our proposed method produces accurate classifications when an incredibly small amount of labelled data is used. We show that our approach mitigates the major drawbacks of existing approaches, resulting in our approach outperforming several comparative state-of-the-art techniques. △ Less

Submitted 19 March, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

Comments: 11 pages

Showing 1–50 of 54 results for author: Aviles-Rivero, A