-
OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images
Authors:
Yang Li,
Qiuyi Huang,
Chong Zhong,
Danjuan Yang,
Meiyan Li,
A. H. Welsh,
Aiyi Liu,
Bo Fu,
Catherine C. Liu,
Xingtao Zhou
Abstract:
Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex relationships between OU and the high correlation between the (continuous) outcome labels (Spherical Equivalent and Axial Length), we propose a framework of copula-enhanced adapter convolutional neural network (CNN) learning with OU UWF fundus images (OUCopula) for joint prediction of multiple clinical scores. We design a novel bi-channel multi-label CNN that can (1) take bi-channel image inputs subject to both high correlation and heterogeneity (by sharing the same backbone network and employing adapters to parameterize the channel-wise discrepancy), and (2) incorporate correlation information between continuous output labels (using a copula). Solid experiments show that OUCopula achieves satisfactory performance in myopia score prediction compared to backbone models. Moreover, OUCopula can far exceed the performance of models constructed for single-eye inputs. Importantly, our study also hints at the potential extension of the bi-channel model to a multi-channel paradigm and the generalizability of OUCopula across various backbone CNNs.
Submitted 18 March, 2024;
originally announced March 2024.
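The OUCopula abstract above says the correlation between the two continuous labels (Spherical Equivalent and Axial Length) is incorporated through a copula. A minimal sketch of what such a loss term could look like, assuming Gaussian marginals joined by a Gaussian copula (the abstract does not state the marginals or the copula family, so both are assumptions, as are the function and parameter names):

```python
import math


def gaussian_copula_nll(y1, y2, mu1, mu2, sigma1, sigma2, rho):
    """Negative log-likelihood of one label pair (y1, y2).

    Assumption for illustration: each label has a Gaussian marginal
    N(mu_k, sigma_k^2), and the dependence between the two labels is
    modelled by a Gaussian copula with correlation rho in (-1, 1).
    """
    # Standardised residuals of the two predicted scores.
    z1 = (y1 - mu1) / sigma1
    z2 = (y2 - mu2) / sigma2

    # Sum of the two marginal Gaussian negative log-likelihoods.
    marginal_nll = (
        0.5 * (z1 ** 2 + z2 ** 2)
        + math.log(sigma1)
        + math.log(sigma2)
        + math.log(2.0 * math.pi)
    )

    # Log-density of the Gaussian copula evaluated at the residuals.
    one_minus_rho2 = 1.0 - rho ** 2
    log_copula = -0.5 * math.log(one_minus_rho2) - (
        rho ** 2 * (z1 ** 2 + z2 ** 2) - 2.0 * rho * z1 * z2
    ) / (2.0 * one_minus_rho2)

    # Joint log-density = marginal log-densities + copula log-density.
    return marginal_nll - log_copula
```

With rho = 0 the copula term vanishes and the loss reduces to two independent Gaussian regression losses; a nonzero rho rewards residuals that move together in the direction the correlation predicts.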
-
AdvMIL: Adversarial Multiple Instance Learning for the Survival Analysis on Whole-Slide Images
Authors:
Pei Liu,
Luping Ji,
Feng Ye,
Bo Fu
Abstract:
The survival analysis on histological whole-slide images (WSIs) is one of the most important means to estimate patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and fully-supervised learning requirements. As a result, these models provide patients only with a completely certain point estimate of time-to-event, and they can learn only from labeled WSI data, which is currently available at a small scale. To tackle these problems, we propose a novel adversarial multiple instance learning (AdvMIL) framework. This framework is based on adversarial time-to-event modeling, and integrates the multiple instance learning (MIL) that is essential for WSI representation learning. It is plug-and-play, so most existing MIL-based end-to-end methods can be easily upgraded by applying this framework, gaining improved abilities of survival distribution estimation and semi-supervised learning. Our extensive experiments show that AdvMIL not only often brings performance improvements to mainstream WSI survival analysis methods at a relatively low computational cost, but also enables these methods to effectively utilize unlabeled data via semi-supervised learning. Moreover, we observe that AdvMIL can help improve the robustness of models against patch occlusion and two representative types of image noise. The proposed AdvMIL framework could promote the research of survival analysis in computational pathology with its novel adversarial MIL paradigm.
Submitted 5 April, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report
Authors:
Andrey Ignatov,
Grigory Malivenko,
Radu Timofte,
Lukasz Treszczotko,
Xin Chang,
Piotr Ksiazek,
Michal Lopuszynski,
Maciej Pioro,
Rafal Rudnicki,
Maciej Smyl,
Yujie Ma,
Zhenyu Li,
Zehui Chen,
Jialei Xu,
Xianming Liu,
Junjun Jiang,
XueChao Shi,
Difan Xu,
Yanan Li,
Xiaotao Wang,
Lei Lei,
Ziyu Zhang,
Yicheng Wang,
Zilong Huang,
Guozhong Luo
, et al. (14 additional authors not shown)
Abstract:
Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth estimation solutions that can show real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset that was collected with the ZED stereo camera, capable of generating depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA resolution depth maps at up to 27 FPS while achieving high fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile devices; their detailed description is provided in this paper.
Submitted 7 November, 2022;
originally announced November 2022.
-
Multi-task Learning for Monocular Depth and Defocus Estimations with Real Images
Authors:
Renzhi He,
Hualin Hong,
Boya Fu,
Fei Liu
Abstract:
Monocular depth estimation and defocus estimation are two fundamental tasks in computer vision. Most existing methods treat depth estimation and defocus estimation as two separate tasks, ignoring the strong connection between them. In this work, we propose a multi-task learning network consisting of an encoder with two decoders to estimate the depth and defocus map from a single focused image. Through the multi-task network, the depth estimation helps the defocus estimation achieve better results in weak-texture regions, and the defocus estimation helps the depth estimation through the strong physical connection between the two maps. We set up a dataset (named ALL-in-3D dataset), which is the first all-real image dataset consisting of 100K sets of all-in-focus images, focused images with focus depth, depth maps, and defocus maps. It enables the network to learn features and solid physical connections between the depth and real defocus images. Experiments demonstrate that the network learns more solid features from the real focused images than from the synthetic focused images. Benefiting from this multi-task structure where different tasks facilitate each other, our depth and defocus estimations achieve significantly better performance than other state-of-the-art algorithms. The code and dataset will be publicly available at https://github.com/cubhe/MDDNet.
Submitted 21 August, 2022;
originally announced August 2022.
-
DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis
Authors:
Pei Liu,
Bo Fu,
Feng Ye,
Rui Yang,
Bin Xu,
Luping Ji
Abstract:
The cancer prognosis on gigapixel Whole-Slide Images (WSIs) has always been a challenging task. To further enhance WSI visual representations, existing methods have explored image pyramids, instead of single-resolution images, in WSIs. In spite of this, they still face two major problems: high computational cost and the unnoticed semantic gap in multi-resolution feature fusion. To tackle these problems, this paper proposes to efficiently exploit WSI pyramids from a new perspective, the dual-stream network with cross-attention (DSCA). Our key idea is to utilize two sub-streams to process the WSI patches with two resolutions, where a square pooling is devised in a high-resolution stream to significantly reduce computational costs, and a cross-attention-based method is proposed to properly handle the fusion of dual-stream features. We validate our DSCA on three publicly-available datasets with a total number of 3,101 WSIs from 1,911 patients. Our experiments and ablation studies verify that (i) the proposed DSCA could outperform existing state-of-the-art methods in cancer prognosis, by an average C-Index improvement of around 4.6%; (ii) our DSCA network is more efficient in computation: it has more learnable parameters (6.31M vs. 860.18K) but a lower computational cost (2.51G vs. 4.94G), compared to a typical existing multi-resolution network; (iii) the key components of DSCA, dual-stream and cross-attention, indeed contribute to our model's performance, gaining an average C-Index rise of around 2.0% while maintaining a relatively small computational load. Our DSCA could serve as an alternative and effective tool for WSI-based cancer prognosis.
Submitted 28 March, 2023; v1 submitted 12 June, 2022;
originally announced June 2022.
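The DSCA abstract above reports its gains in terms of the C-Index. As a reminder of what that metric measures, here is a minimal sketch of Harrell's concordance index in plain Python. The function name and argument layout are illustrative assumptions; the abstract does not say which implementation or tie-handling convention the authors used.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed time for each patient (event or censoring time)
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk scores (higher = expected to fail sooner)

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's observed time. The pair is concordant
    when the model assigns i the higher risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of patients by risk, so an improvement of a few percentage points is a change in the fraction of correctly ordered comparable pairs.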
-
GMSS: Graph-Based Multi-Task Self-Supervised Learning for EEG Emotion Recognition
Authors:
Yang Li,
Ji Chen,
Fu Li,
Boxun Fu,
Hao Wu,
Youshuo Ji,
Yijin Zhou,
Yi Niu,
Guangming Shi,
Wenming Zheng
Abstract:
Previous electroencephalogram (EEG) emotion recognition relies on single-task learning, which may lead to overfitting and learned emotion features lacking generalization. In this paper, a graph-based multi-task self-supervised learning model (GMSS) for EEG emotion recognition is proposed. GMSS has the ability to learn more general representations by integrating multiple self-supervised tasks, including spatial and frequency jigsaw puzzle tasks, and contrastive learning tasks. By learning from multiple tasks simultaneously, GMSS can find a representation that captures all of the tasks, thereby decreasing the chance of overfitting on the original task, i.e., the emotion recognition task. In particular, the spatial jigsaw puzzle task aims to capture the intrinsic spatial relationships of different brain regions. Considering the importance of frequency information in EEG emotional signals, the goal of the frequency jigsaw puzzle task is to explore the crucial frequency bands for EEG emotion recognition. To further regularize the learned features and encourage the network to learn inherent representations, a contrastive learning task is adopted in this work by mapping the transformed data into a common feature space. The performance of the proposed GMSS is compared with several popular unsupervised and supervised methods. Experiments on SEED, SEED-IV, and MPED datasets show that the proposed model has remarkable advantages in learning more discriminative and general features for EEG emotional signals.
Submitted 11 April, 2022;
originally announced May 2022.
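The GMSS abstract above describes a spatial jigsaw puzzle pretext task over brain regions. One common way to set up such a task is to shuffle region-level blocks of channels and ask the network to predict which permutation was applied. The sketch below generates one such training pair; the data layout, the grouping of channels into regions, and the use of the full permutation set are all assumptions for illustration, since the abstract does not give these details.

```python
import itertools
import random


def make_spatial_jigsaw(sample, regions, rng):
    """Build one (shuffled_sample, permutation_label) pretext pair.

    sample  : list of per-channel feature vectors, indexed by channel
    regions : list of channel-index lists, one list per brain region
              (hypothetical grouping, chosen by the caller)
    rng     : a random.Random instance, passed in for reproducibility

    The label is the index of the applied permutation in the
    lexicographically ordered list of all region permutations.
    """
    perms = list(itertools.permutations(range(len(regions))))
    label = rng.randrange(len(perms))
    order = perms[label]

    # Re-assemble the sample with whole regions moved to new positions;
    # the channel order inside each region is left unchanged.
    shuffled = []
    for region_id in order:
        shuffled.extend(sample[ch] for ch in regions[region_id])
    return shuffled, label
```

Enumerating every permutation only works for a small number of regions, because the label space grows factorially; with more regions, a fixed subset of permutations would be the usual workaround.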
-
Point Spread Function Estimation of Defocus
Authors:
Renzhi He,
Yan Zhuang,
Boya Fu,
Fei Liu
Abstract:
The point spread function (PSF) plays a crucial role in many computational imaging applications, such as shape from focus/defocus, depth estimation, and fluorescence microscopy. However, the mathematical model of the defocus process is still unclear. In this work, we develop an alternative method to estimate the precise mathematical model of the point spread function to describe the defocus process. We first derive the mathematical algorithm for the PSF, which is used to generate the simulated focused images for different focus depths. Then we compute a loss function measuring the similarity between the simulated focused images and real focused images, for which we design a novel and efficient metric based on the defocus histogram to evaluate the difference between the focused images. Minimizing the loss function yields the optimal parameters for the PSF. We also construct a hardware system consisting of a focusing system and a structured light system to acquire the all-in-focus image, the focused image with corresponding focus depth, and the depth map in the same view. The three types of images, as a dataset, are used to obtain the precise PSF. Our experiments on standard planes and actual objects show that the proposed algorithm can accurately describe the defocus process. The accuracy of our algorithm is further demonstrated by evaluating the difference among the actual focused images, the focused images generated by our algorithm, and the focused images generated by other methods. The results show that the loss of our algorithm is 40% less than that of the others on average.
Submitted 19 September, 2022; v1 submitted 6 March, 2022;
originally announced March 2022.
-
The Effect of Wearing a Face Mask on Face Image Quality
Authors:
Biying Fu,
Florian Kirchbuchner,
Naser Damer
Abstract:
Due to the COVID-19 situation, face masks have become a main part of our daily life. Wearing mouth-and-nose protection has been made mandatory in many public places to prevent the spread of the COVID-19 virus. However, face masks affect the performance of face recognition, since a large area of the face is covered. The effect of wearing a face mask on the different components of the face recognition system in a collaborative environment is a problem that is still to be fully studied. This work studies, for the first time, the effect of wearing a face mask on face image quality by utilising state-of-the-art face image quality assessment methods of different natures. This aims at providing a better understanding of the effect of face masks on the operation of face recognition as a whole system. In addition, we further study the effect of simulated masks on face image utility in comparison to real face masks. We discuss the correlation between the mask effect on face image quality and that on the face verification performance by automatic systems and human experts, indicating a consistent trend between both factors. The evaluation is conducted on a database containing (1) non-masked faces, (2) real face masks, and (3) simulated face masks, created by synthetically generating digital facial masks on non-masked faces. Finally, a visual interpretation of the face areas contributing to the quality score of a selected set of quality assessment methods is provided to give a deeper insight into the difference of network decisions in masked and non-masked faces, among other variations.
Submitted 2 November, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Grigory Malivenko,
David Plowman,
Samarth Shukla,
Radu Timofte,
Ziyu Zhang,
Yicheng Wang,
Zilong Huang,
Guozhong Luo,
Gang Yu,
Bin Fu,
Yiran Wang,
Xingyi Li,
Min Shi,
Ke Xian,
Zhiguo Cao,
Jin-Hua Du,
Pei-Lin Wu,
Chao Ge,
Jiaoyang Yao,
Fangwen Tu,
Bo Li,
Jung Eun Yoo,
Kwanggyoon Seo,
Jialei Xu
, et al. (13 additional authors not shown)
Abstract:
Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based depth estimation solutions that can demonstrate nearly real-time performance on smartphones and IoT platforms. For this, the participants were provided with a new large-scale dataset containing RGB-depth image pairs obtained with a dedicated stereo ZED camera producing high-resolution depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the popular Raspberry Pi 4 platform with a mobile ARM-based Broadcom chipset. The proposed solutions can generate VGA resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high fidelity results, and are compatible with any Android or Linux-based mobile devices. A detailed description of all models developed in the challenge is provided in this paper.
Submitted 17 May, 2021;
originally announced May 2021.
-
Weak Texture Information Map Guided Image Super-resolution with Deep Residual Networks
Authors:
Bo Fu,
Liyan Wang,
Yuechu Wu,
Yufeng Wu,
Shilin Fu,
Yonggong Ren
Abstract:
Single image super-resolution (SISR) is an image processing task that obtains a high-resolution (HR) image from a low-resolution (LR) image. Recently, owing to their capability in feature extraction, a series of deep learning methods have brought important improvements to SISR. However, we observe that no matter how deep the networks are designed, they usually do not have good generalization ability, with the result that almost all existing SR methods perform poorly on the restoration of weak texture details. To solve these problems, we propose a weak texture information map guided image super-resolution method with deep residual networks. It contains three sub-networks: one main network, which extracts the main features and fuses weak texture details, and two auxiliary networks, which extract the weak texture details lost in the main network. The two parts work cooperatively: the auxiliary networks predict weak texture information and integrate it into the main network, which helps the main network learn more inconspicuous details. Experimental results demonstrate that our method achieves state-of-the-art quantitative performance. Specifically, the super-resolution results of our method contain more weak texture details.
Submitted 18 March, 2020; v1 submitted 1 March, 2020;
originally announced March 2020.