-
UpStory: the Uppsala Storytelling dataset
Authors:
Marc Fraile,
Natalia Calvo-Barajas,
Anastasia Sophia Apeiron,
Giovanna Varni,
Joakim Lindblad,
Nataša Sladoje,
Ginevra Castellano
Abstract:
Friendship and rapport play an important role in the formation of constructive social interactions, and have been widely studied in educational settings due to their impact on student outcomes. Given the growing interest in automating the analysis of such phenomena through Machine Learning (ML), access to annotated interaction datasets is highly valuable. However, no dataset on dyadic child-child…
▽ More
Friendship and rapport play an important role in the formation of constructive social interactions, and have been widely studied in educational settings due to their impact on student outcomes. Given the growing interest in automating the analysis of such phenomena through Machine Learning (ML), access to annotated interaction datasets is highly valuable. However, no dataset on dyadic child-child interactions explicitly capturing rapport currently exists. Moreover, despite advances in the automatic analysis of human behaviour, no previous work has addressed the prediction of rapport in child-child dyadic interactions in educational settings. We present UpStory -- the Uppsala Storytelling dataset: a novel dataset of naturalistic dyadic interactions between primary school aged children, with an experimental manipulation of rapport. Pairs of children aged 8-10 participate in a task-oriented activity: designing a story together, while being allowed free movement within the play area. We promote balanced collection of different levels of rapport by using a within-subjects design: self-reported friendships are used to pair each child twice, either minimizing or maximizing pair separation in the friendship network. The dataset contains data for 35 pairs, totalling 3h 40m of audio and video recordings. It includes two video sources covering the play area, as well as separate voice recordings for each child. An anonymized version of the dataset is made publicly available, containing per-frame head pose, body pose, and face features; as well as per-pair information, including the level of rapport. Finally, we provide ML baselines for the prediction of rapport.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
Let it shine: Autofluorescence of Papanicolaou-stain improves AI-based cytological oral cancer detection
Authors:
Wenyi Lian,
Joakim Lindblad,
Christina Runow Stark,
Jan-Michaél Hirsch,
Nataša Sladoje
Abstract:
Oral cancer is a global health challenge. It is treatable if detected early, but it is often fatal in late stages. There is a shift from the invasive and time-consuming tissue sampling and histological examination, toward non-invasive brush biopsies and cytological examination. Reliable computer-assisted methods are essential for cost-effective and accurate cytological analysis, but the lack of de…
▽ More
Oral cancer is a global health challenge. It is treatable if detected early, but it is often fatal in late stages. There is a shift from the invasive and time-consuming tissue sampling and histological examination, toward non-invasive brush biopsies and cytological examination. Reliable computer-assisted methods are essential for cost-effective and accurate cytological analysis, but the lack of detailed cell-level annotations impairs model effectiveness. This study aims to improve AI-based oral cancer detection using multimodal imaging and deep fusion. We combine brightfield and fluorescence whole slide microscopy imaging to analyze Papanicolaou-stained liquid-based cytology slides of brush biopsies collected from both healthy and cancer patients. Due to limited cytological annotations, we utilize a weakly supervised deep learning approach using only patient-level labels. We evaluate various multimodal fusion strategies, including early, late, and three recent intermediate fusion methods. Our results show: (i) fluorescence imaging of Papanicolaou-stained samples provides substantial diagnostic information; (ii) multimodal fusion enhances classification and cancer detection accuracy over single-modality methods. Intermediate fusion is the leading method among the studied approaches. Specifically, the Co-Attention Fusion Network (CAFNet) model excels with an F1 score of 83.34% and accuracy of 91.79%, surpassing human performance on the task. Additional tests highlight the need for precise image registration to optimize multimodal analysis benefits. This study advances cytopathology by combining deep learning and multimodal imaging to enhance early, non-invasive detection of oral cancer, improving diagnostic accuracy and streamlining clinical workflows. The developed pipeline is also applicable in other cytological settings. Our codes and dataset are available online for further research.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Can representation learning for multimodal image registration be improved by supervision of intermediate layers?
Authors:
Elisabeth Wetzer,
Joakim Lindblad,
Nataša Sladoje
Abstract:
Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previously, additional supervision on intermediate layers in contrastive learning has improved biomedical image classification. We evaluate if a similar approach impr…
▽ More
Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previously, additional supervision on intermediate layers in contrastive learning has improved biomedical image classification. We evaluate if a similar approach improves representations learned for registration to boost registration performance. We explore three approaches to add contrastive supervision to the latent features of the bottleneck layer in the U-Nets encoding the multimodal images and evaluate three different critic functions. Our results show that representations learned without additional supervision on latent features perform best in the downstream task of registration on two public biomedical datasets. We investigate the performance drop by exploiting recent insights in contrastive learning in classification and self-supervised learning. We visualize the spatial relations of the learned representations by means of multidimensional scaling, and show that additional supervision on the bottleneck layer can lead to partial dimensional collapse of the intermediate embedding space.
△ Less
Submitted 1 March, 2023;
originally announced March 2023.
-
End-to-end Multiple Instance Learning with Gradient Accumulation
Authors:
Axel Andersson,
Nadezhda Koriakina,
Nataša Sladoje,
Joakim Lindblad
Abstract:
Being able to learn on weakly labeled data, and provide interpretability, are two of the main reasons why attention-based deep multiple instance learning (ABMIL) methods have become particularly popular for classification of histopathological images. Such image data usually come in the form of gigapixel-sized whole-slide-images (WSI) that are cropped into smaller patches (instances). However, the…
▽ More
Being able to learn on weakly labeled data, and provide interpretability, are two of the main reasons why attention-based deep multiple instance learning (ABMIL) methods have become particularly popular for classification of histopathological images. Such image data usually come in the form of gigapixel-sized whole-slide-images (WSI) that are cropped into smaller patches (instances). However, the sheer size of the data makes training of ABMIL models challenging. All the instances from one WSI cannot be processed at once by conventional GPUs. Existing solutions compromise training by relying on pre-trained models, strategic sampling or selection of instances, or self-supervised learning. We propose a training strategy based on gradient accumulation that enables direct end-to-end training of ABMIL models without being limited by GPU memory. We conduct experiments on both QMNIST and Imagenette to investigate the performance and training time, and compare with the conventional memory-expensive baseline and a recent sampled-based approach. This memory-efficient approach, although slower, reaches performance indistinguishable from the memory-expensive baseline.
△ Less
Submitted 8 March, 2022;
originally announced March 2022.
-
Oral cancer detection and interpretation: Deep multiple instance learning versus conventional deep single instance learning
Authors:
Nadezhda Koriakina,
Nataša Sladoje,
Vladimir Bašić,
Joakim Lindblad
Abstract:
The current medical standard for setting an oral cancer (OC) diagnosis is histological examination of a tissue sample from the oral cavity. This process is time consuming and more invasive than an alternative approach of acquiring a brush sample followed by cytological analysis. Skilled cytotechnologists are able to detect changes due to malignancy, however, to introduce this approach into clinica…
▽ More
The current medical standard for setting an oral cancer (OC) diagnosis is histological examination of a tissue sample from the oral cavity. This process is time consuming and more invasive than an alternative approach of acquiring a brush sample followed by cytological analysis. Skilled cytotechnologists are able to detect changes due to malignancy, however, to introduce this approach into clinical routine is associated with challenges such as a lack of experts and labour-intensive work. To design a trustworthy OC detection system that would assist cytotechnologists, we are interested in AI-based methods that reliably can detect cancer given only per-patient labels (minimizing annotation bias), and also provide information on which cells are most relevant for the diagnosis (enabling supervision and understanding). We, therefore, perform a comparison of a conventional single instance learning (SIL) approach and a modern multiple instance learning (MIL) method suitable for OC detection and interpretation, utilizing three different neural network architectures. To facilitate systematic evaluation of the considered approaches, we introduce a synthetic PAP-QMNIST dataset, that serves as a model of OC data, while offering access to per-instance ground truth. Our study indicates that on PAP-QMNIST, the SIL performs better, on average, than the MIL approach. Performance at the bag level on real-world cytological data is similar for both methods, yet the single instance approach performs better on average. Visual examination by cytotechnologist indicates that the methods manage to identify cells which deviate from normality, including malignant cells as well as those suspicious for dysplasia. We share the code as open source at https://github.com/MIDA-group/OralCancerMILvsSIL
△ Less
Submitted 3 February, 2022;
originally announced February 2022.
-
Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations
Authors:
Eva Breznik,
Elisabeth Wetzer,
Joakim Lindblad,
Nataša Sladoje
Abstract:
In tissue characterization and cancer diagnostics, multimodal imaging has emerged as a powerful technique. Thanks to computational advances, large datasets can be exploited to discover patterns in pathologies and improve diagnosis. However, this requires efficient and scalable image retrieval methods. Cross-modality image retrieval is particularly challenging, since images of similar (or even the…
▽ More
In tissue characterization and cancer diagnostics, multimodal imaging has emerged as a powerful technique. Thanks to computational advances, large datasets can be exploited to discover patterns in pathologies and improve diagnosis. However, this requires efficient and scalable image retrieval methods. Cross-modality image retrieval is particularly challenging, since images of similar (or even the same) content captured by different modalities might share few common structures. We propose a new application-independent content-based image retrieval (CBIR) system for reverse (sub-)image search across modalities, which combines deep learning to generate representations (embedding the different modalities in a common space) with classical feature extraction and bag-of-words models for efficient and reliable retrieval. We illustrate its advantages through a replacement study, exploring a number of feature extractors and learned representations, as well as through comparison to recent (cross-modality) CBIR methods. For the task of (sub-)image retrieval on a (publicly available) dataset of brightfield and second harmonic generation microscopy images, the results show that our approach is superior to all tested alternatives. We discuss the shortcomings of the compared methods and observe the importance of equivariance and invariance properties of the learned representations and feature extractors in the CBIR pipeline. Code is available at: \url{https://github.com/MIDA-group/CrossModal_ImgRetrieval}.
△ Less
Submitted 20 March, 2023; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Cross-Sim-NGF: FFT-Based Global Rigid Multimodal Alignment of Image Volumes using Normalized Gradient Fields
Authors:
Johan Öfverstedt,
Joakim Lindblad,
Nataša Sladoje
Abstract:
Multimodal image alignment involves finding spatial correspondences between volumes varying in appearance and structure. Automated alignment methods are often based on local optimization that can be highly sensitive to their initialization. We propose a global optimization method for rigid multimodal 3D image alignment, based on a novel efficient algorithm for computing similarity of normalized gr…
▽ More
Multimodal image alignment involves finding spatial correspondences between volumes varying in appearance and structure. Automated alignment methods are often based on local optimization that can be highly sensitive to their initialization. We propose a global optimization method for rigid multimodal 3D image alignment, based on a novel efficient algorithm for computing similarity of normalized gradient fields (NGF) in the frequency domain. We validate the method experimentally on a dataset comprised of 20 brain volumes acquired in four modalities (T1w, Flair, CT, [18F] FDG PET), synthetically displaced with known transformations. The proposed method exhibits excellent performance on all six possible modality combinations, and outperforms all four reference methods by a large margin. The method is fast; a 3.4Mvoxel global rigid alignment requires approximately 40 seconds of computation, and the proposed algorithm outperforms a direct algorithm for the same task by more than three orders of magnitude. Open-source implementation is provided.
△ Less
Submitted 19 October, 2021;
originally announced October 2021.
-
Fast computation of mutual information in the frequency domain with applications to global multimodal image alignment
Authors:
Johan Öfverstedt,
Joakim Lindblad,
Nataša Sladoje
Abstract:
Multimodal image alignment is the process of finding spatial correspondences between images formed by different imaging techniques or under different conditions, to facilitate heterogeneous data fusion and correlative analysis. The information-theoretic concept of mutual information (MI) is widely used as a similarity measure to guide multimodal alignment processes, where most works have focused o…
▽ More
Multimodal image alignment is the process of finding spatial correspondences between images formed by different imaging techniques or under different conditions, to facilitate heterogeneous data fusion and correlative analysis. The information-theoretic concept of mutual information (MI) is widely used as a similarity measure to guide multimodal alignment processes, where most works have focused on local maximization of MI that typically works well only for small displacements; this points to a need for global maximization of MI, which has previously been computationally infeasible due to the high run-time complexity of existing algorithms. We propose an efficient algorithm for computing MI for all discrete displacements (formalized as the cross-mutual information function (CMIF)), which is based on cross-correlation computed in the frequency domain. We show that the algorithm is equivalent to a direct method while asymptotically superior in terms of run-time. Furthermore, we propose a method for multimodal image alignment for transformation models with few degrees of freedom (e.g. rigid) based on the proposed CMIF-algorithm. We evaluate the efficacy of the proposed method on three distinct benchmark datasets, of aerial images, cytological images, and histological images, and we observe excellent success-rates (in recovering known rigid transformations), overall outperforming alternative methods, including local optimization of MI as well as several recent deep learning-based approaches. We also evaluate the run-times of a GPU implementation of the proposed algorithm and observe speed-ups from 100 to more than 10,000 times for realistic image sizes compared to a GPU implementation of a direct method. Code is shared as open-source at \url{github.com/MIDA-group/globalign}.
△ Less
Submitted 28 June, 2021;
originally announced June 2021.
-
Is Image-to-Image Translation the Panacea for Multimodal Image Registration? A Comparative Study
Authors:
Jiahao Lu,
Johan Öfverstedt,
Joakim Lindblad,
Nataša Sladoje
Abstract:
Despite current advancement in the field of biomedical image processing, propelled by the deep learning revolution, multimodal image registration, due to its several challenges, is still often performed manually by specialists. The recent success of image-to-image (I2I) translation in computer vision applications and its growing use in biomedical areas provide a tempting possibility of transformin…
▽ More
Despite current advancement in the field of biomedical image processing, propelled by the deep learning revolution, multimodal image registration, due to its several challenges, is still often performed manually by specialists. The recent success of image-to-image (I2I) translation in computer vision applications and its growing use in biomedical areas provide a tempting possibility of transforming the multimodal registration problem into a, potentially easier, monomodal one. We conduct an empirical study of the applicability of modern I2I translation methods for the task of rigid registration of multimodal biomedical and medical 2D and 3D images. We compare the performance of four Generative Adversarial Network (GAN)-based I2I translation methods and one contrastive representation learning method, subsequently combined with two representative monomodal registration methods, to judge the effectiveness of modality translation for multimodal image registration. We evaluate these method combinations on four publicly available multimodal (2D and 3D) datasets and compare them with the performance of registration achieved by several well-known approaches acting directly on multimodal image data. Our results suggest that, although I2I translation may be helpful when the modalities to register are clearly correlated, registration of modalities which express distinctly different properties of the sample is not well handled by the I2I translation approach. The evaluated representation learning method, which aims to find abstract image-like representations of the information shared between the modalities, manages better, and so does the Mutual Information maximisation approach, acting directly on the original multimodal images. We share our complete experimental setup as open-source (https://github.com/MIDA-group/MultiRegEval).
△ Less
Submitted 31 October, 2022; v1 submitted 30 March, 2021;
originally announced March 2021.
-
INSPIRE: Intensity and spatial information-based deformable image registration
Authors:
Johan Öfverstedt,
Joakim Lindblad,
Nataša Sladoje
Abstract:
We present INSPIRE, a top-performing general-purpose method for deformable image registration. INSPIRE brings distance measures which combine intensity and spatial information into an elastic B-splines-based transformation model and incorporates an inverse inconsistency penalization supporting symmetric registration performance. We introduce several theoretical and algorithmic solutions which prov…
▽ More
We present INSPIRE, a top-performing general-purpose method for deformable image registration. INSPIRE brings distance measures which combine intensity and spatial information into an elastic B-splines-based transformation model and incorporates an inverse inconsistency penalization supporting symmetric registration performance. We introduce several theoretical and algorithmic solutions which provide high computational efficiency and thereby applicability of the proposed framework in a wide range of real scenarios. We show that INSPIRE delivers highly accurate, as well as stable and robust registration results. We evaluate the method on a 2D dataset created from retinal images, characterized by presence of networks of thin structures. Here INSPIRE exhibits excellent performance, substantially outperforming the widely used reference methods. {We also evaluate INSPIRE on the Fundus Image Registration Dataset (FIRE), which consists of 134 pairs of separately acquired retinal images. INSPIRE exhibits excellent performance on the FIRE dataset, substantially outperforming several domain-specific methods.} We also evaluate the method on four benchmark datasets of 3D magnetic resonance images of brains, for a total of 2088 pairwise registrations. A comparison with 17 other state-of-the-art methods reveals that INSPIRE provides the best overall performance. Code is available at http://github.com/MIDA-group/inspire
△ Less
Submitted 2 February, 2023; v1 submitted 13 December, 2020;
originally announced December 2020.
-
CoMIR: Contrastive Multimodal Image Representation for Registration
Authors:
Nicolas Pielawski,
Elisabeth Wetzer,
Johan Öfverstedt,
Jiahao Lu,
Carolina Wählby,
Joakim Lindblad,
Nataša Sladoje
Abstract:
We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations). CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures. CoMIRs reduce the multimodal registration problem to a monomodal one, in which general intensity-ba…
▽ More
We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations). CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures. CoMIRs reduce the multimodal registration problem to a monomodal one, in which general intensity-based, as well as feature-based, registration algorithms can be applied. The method involves training one neural network per modality on aligned images, using a contrastive loss based on noise-contrastive estimation (InfoNCE). Unlike other contrastive coding methods, used for, e.g., classification, our approach generates image-like representations that contain the information shared between modalities. We introduce a novel, hyperparameter-free modification to InfoNCE, to enforce rotational equivariance of the learnt representations, a property essential to the registration task. We assess the extent of achieved rotational equivariance and the stability of the representations with respect to weight initialization, training set, and hyperparameter settings, on a remote sensing dataset of RGB and near-infrared images. We evaluate the learnt representations through registration of a biomedical dataset of bright-field and second-harmonic generation microscopy images; two modalities with very little apparent correlation. The proposed approach based on CoMIRs significantly outperforms registration of representations created by GAN-based image-to-image translation, as well as a state-of-the-art, application-specific method which takes additional knowledge about the data into account. Code is available at: https://github.com/MIDA-group/CoMIR.
△ Less
Submitted 23 October, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
A Deep Learning based Pipeline for Efficient Oral Cancer Screening on Whole Slide Images
Authors:
Jiahao Lu,
Nataša Sladoje,
Christina Runow Stark,
Eva Darai Ramqvist,
Jan-Michaél Hirsch,
Joakim Lindblad
Abstract:
Oral cancer incidence is rapidly increasing worldwide. The most important determinant factor in cancer survival is early diagnosis. To facilitate large scale screening, we propose a fully automated pipeline for oral cancer detection on whole slide cytology images. The pipeline consists of fully convolutional regression-based nucleus detection, followed by per-cell focus selection, and CNN based cl…
▽ More
Oral cancer incidence is rapidly increasing worldwide. The most important determinant factor in cancer survival is early diagnosis. To facilitate large scale screening, we propose a fully automated pipeline for oral cancer detection on whole slide cytology images. The pipeline consists of fully convolutional regression-based nucleus detection, followed by per-cell focus selection, and CNN based classification. Our novel focus selection step provides fast per-cell focus decisions at human-level accuracy. We demonstrate that the pipeline provides efficient cancer classification of whole slide cytology images, improving over previous results both in terms of accuracy and feasibility. The complete source code is available at https://github.com/MIDA-group/OralScreen.
△ Less
Submitted 15 April, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
-
Stochastic Distance Transform
Authors:
Johan Öfverstedt,
Joakim Lindblad,
Nataša Sladoje
Abstract:
The distance transform (DT) and its many variations are ubiquitous tools for image processing and analysis. In many imaging scenarios, the images of interest are corrupted by noise. This has a strong negative impact on the accuracy of the DT, which is highly sensitive to spurious noise points. In this study, we consider images represented as discrete random sets and observe statistics of DT comput…
▽ More
The distance transform (DT) and its many variations are ubiquitous tools for image processing and analysis. In many imaging scenarios, the images of interest are corrupted by noise. This has a strong negative impact on the accuracy of the DT, which is highly sensitive to spurious noise points. In this study, we consider images represented as discrete random sets and observe statistics of DT computed on such representations. We, thus, define a stochastic distance transform (SDT), which has an adjustable robustness to noise. Both a stochastic Monte Carlo method and a deterministic method for computing the SDT are proposed and compared. Through a series of empirical tests, we demonstrate that the SDT is effective not only in improving the accuracy of the computed distances in the presence of noise, but also in improving the performance of template matching and watershed segmentation of partially overlap** objects, which are examples of typical applications where DTs are utilized.
△ Less
Submitted 18 October, 2018;
originally announced October 2018.
-
Ensemble of Convolutional Neural Networks for Dermoscopic Images Classification
Authors:
Tomáš Majtner,
Buda Bajić,
Sule Yildirim,
Jon Yngve Hardeberg,
Joakim Lindblad,
Nataša Sladoje
Abstract:
In this report, we are presenting our automated prediction system for disease classification within dermoscopic images. The proposed solution is based on deep learning, where we employed transfer learning strategy on VGG16 and GoogLeNet architectures. The key feature of our solution is preprocessing based primarily on image augmentation and colour normalization. The solution was evaluated on Task…
▽ More
In this report, we are presenting our automated prediction system for disease classification within dermoscopic images. The proposed solution is based on deep learning, where we employed transfer learning strategy on VGG16 and GoogLeNet architectures. The key feature of our solution is preprocessing based primarily on image augmentation and colour normalization. The solution was evaluated on Task 3: Lesion Diagnosis of the ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection.
△ Less
Submitted 15 August, 2018;
originally announced August 2018.
-
Fast and Robust Symmetric Image Registration Based on Distances Combining Intensity and Spatial Information
Authors:
Johan Öfverstedt,
Joakim Lindblad,
Nataša Sladoje
Abstract:
Intensity-based image registration approaches rely on similarity measures to guide the search for geometric correspondences with high affinity between images. The properties of the used measure are vital for the robustness and accuracy of the registration. In this study a symmetric, intensity interpolation-free, affine registration framework based on a combination of intensity and spatial informat…
▽ More
Intensity-based image registration approaches rely on similarity measures to guide the search for geometric correspondences with high affinity between images. The properties of the used measure are vital for the robustness and accuracy of the registration. In this study a symmetric, intensity interpolation-free, affine registration framework based on a combination of intensity and spatial information is proposed. The excellent performance of the framework is demonstrated on a combination of synthetic tests, recovering known transformations in the presence of noise, and real applications in biomedical and medical image registration, for both 2D and 3D images. The method exhibits greater robustness and higher accuracy than similarity measures in common use, when inserted into a standard gradient-based registration framework available as part of the open source Insight Segmentation and Registration Toolkit (ITK). The method is also empirically shown to have a low computational cost, making it practical for real applications. Source code is available.
△ Less
Submitted 21 February, 2019; v1 submitted 30 July, 2018;
originally announced July 2018.