Search | arXiv e-print repository

doi 10.1007/s13218-023-00820-x

Building an AI Support Tool for Real-time Ulcerative Colitis Diagnosis

Authors: Bjørn Leth Møller, Bobby Zhao Sheng Lo, Johan Burisch, Flemming Bendtsen, Ida Vind, Bulat Ibragimov, Christian Igel

Abstract: Ulcerative Colitis (UC) is a chronic inflammatory bowel disease decreasing life quality through symptoms such as bloody diarrhoea and abdominal pain. Endoscopy is a cornerstone of diagnosis and monitoring of UC. The Mayo endoscopic subscore (MES) index is the standard for measuring UC severity during endoscopic evaluation. However, the MES is subject to high inter-observer variability leading to m… ▽ More Ulcerative Colitis (UC) is a chronic inflammatory bowel disease decreasing life quality through symptoms such as bloody diarrhoea and abdominal pain. Endoscopy is a cornerstone of diagnosis and monitoring of UC. The Mayo endoscopic subscore (MES) index is the standard for measuring UC severity during endoscopic evaluation. However, the MES is subject to high inter-observer variability leading to misdiagnosis and suboptimal treatment. We propose using a machine-learning based MES classification system to support the endoscopic process and to mitigate the observer-variability. The system runs real-time in the clinic and augments doctors' decision-making during the endoscopy. This project report outlines the process of designing, creating and evaluating our system. We describe our initial evaluation, which is a combination of a standard non-clinical model test and a first clinical test of the system on a real patient. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2301.06190 [pdf, other]

BuildSeg: A General Framework for the Segmentation of Buildings

Authors: Lei Li, Tianfang Zhang, Stefan Oehmcke, Fabian Gieseke, Christian Igel

Abstract: Building segmentation from aerial images and 3D laser scanning (LiDAR) is a challenging task due to the diversity of backgrounds, building textures, and image quality. While current research using different types of convolutional and transformer networks has considerably improved the performance on this task, even more accurate segmentation methods for buildings are desirable for applications such… ▽ More Building segmentation from aerial images and 3D laser scanning (LiDAR) is a challenging task due to the diversity of backgrounds, building textures, and image quality. While current research using different types of convolutional and transformer networks has considerably improved the performance on this task, even more accurate segmentation methods for buildings are desirable for applications such as automatic map**. In this study, we propose a general framework termed \emph{BuildSeg} employing a generic approach that can be quickly applied to segment buildings. Different data sources were combined to increase generalization performance. The approach yields good results for different data sources as shown by experiments on high-resolution multi-spectral and LiDAR imagery of cities in Norway, Denmark and France. We applied ConvNeXt and SegFormer based models on the high resolution aerial image dataset from the MapAI-competition. The methods achieved an IOU of 0.7902 and a boundary IOU of 0.6185. We used post-processing to account for the rectangular shape of the objects. This increased the boundary IOU from 0.6185 to 0.6189. △ Less

Submitted 15 January, 2023; originally announced January 2023.

arXiv:2212.09088 [pdf, other]

LR-CSNet: Low-Rank Deep Unfolding Network for Image Compressive Sensing

Authors: Tianfang Zhang, Lei Li, Christian Igel, Stefan Oehmcke, Fabian Gieseke, Zhenming Peng

Abstract: Deep unfolding networks (DUNs) have proven to be a viable approach to compressive sensing (CS). In this work, we propose a DUN called low-rank CS network (LR-CSNet) for natural image CS. Real-world image patches are often well-represented by low-rank approximations. LR-CSNet exploits this property by adding a low-rank prior to the CS optimization task. We derive a corresponding iterative optimizat… ▽ More Deep unfolding networks (DUNs) have proven to be a viable approach to compressive sensing (CS). In this work, we propose a DUN called low-rank CS network (LR-CSNet) for natural image CS. Real-world image patches are often well-represented by low-rank approximations. LR-CSNet exploits this property by adding a low-rank prior to the CS optimization task. We derive a corresponding iterative optimization procedure using variable splitting, which is then translated to a new DUN architecture. The architecture uses low-rank generation modules (LRGMs), which learn low-rank matrix factorizations, as well as gradient descent and proximal map**s (GDPMs), which are proposed to extract high-frequency features to refine image details. In addition, the deep features generated at each reconstruction stage in the DUN are transferred between stages to boost the performance. Our extensive experiments on three widely considered datasets demonstrate the promising performance of LR-CSNet compared to state-of-the-art methods in natural image CS. △ Less

Submitted 18 December, 2022; originally announced December 2022.

arXiv:2205.10643 [pdf, other]

doi 10.1109/JSTSP.2022.3207050

Self-Supervised Speech Representation Learning: A Review

Authors: Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Abstract: Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a… ▽ More Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision domains, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years. This review presents approaches for self-supervised speech representation learning and their connection to other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we review recent efforts on benchmarking learned representations to extend the application beyond speech recognition. △ Less

Submitted 27 October, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

arXiv:2203.01829 [pdf, other]

A Brief Overview of Unsupervised Neural Speech Representation Learning

Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel

Abstract: Unsupervised representation learning for speech processing has matured greatly in the last few years. Work in computer vision and natural language processing has paved the way, but speech data offers unique challenges. As a result, methods from other domains rarely translate directly. We review the development of unsupervised representation learning for speech over the last decade. We identify two… ▽ More Unsupervised representation learning for speech processing has matured greatly in the last few years. Work in computer vision and natural language processing has paved the way, but speech data offers unique challenges. As a result, methods from other domains rarely translate directly. We review the development of unsupervised representation learning for speech over the last decade. We identify two primary model categories: self-supervised methods and probabilistic latent variable models. We describe the models and develop a comprehensive taxonomy. Finally, we discuss and compare models from the two categories. △ Less

Submitted 1 March, 2022; originally announced March 2022.

Comments: The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing (SAS) at AAAI

arXiv:2111.14842 [pdf, other]

Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?

Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Mostafa Abdou, Joakim Edin, Lars Maaløe, Anders Søgaard, Christian Igel

Abstract: Spoken language understanding (SLU) tasks are usually solved by first transcribing an utterance with automatic speech recognition (ASR) and then feeding the output to a text-based model. Recent advances in self-supervised representation learning for speech data have focused on improving the ASR component. We investigate whether representation learning for speech has matured enough to replace ASR i… ▽ More Spoken language understanding (SLU) tasks are usually solved by first transcribing an utterance with automatic speech recognition (ASR) and then feeding the output to a text-based model. Recent advances in self-supervised representation learning for speech data have focused on improving the ASR component. We investigate whether representation learning for speech has matured enough to replace ASR in SLU. We compare learned speech features from wav2vec 2.0, state-of-the-art ASR transcripts, and the ground truth text as input for a novel speech-based named entity recognition task, a cardiac arrest detection task on real-world emergency calls and two existing SLU benchmarks. We show that learned speech features are superior to ASR transcripts on three classification tasks. For machine translation, ASR transcripts are still the better choice. We highlight the intrinsic robustness of wav2vec 2.0 representations to out-of-vocabulary words as key to better performance. △ Less

Submitted 29 November, 2021; originally announced November 2021.

Comments: Under review as a conference paper at ICASSP 2022

arXiv:2102.09928 [pdf, other]

doi 10.21437/Interspeech.2020-1750

Do End-to-End Speech Recognition Models Care About Context?

Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel

Abstract: The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual… ▽ More The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual information in the audio input. We find that the AED model is indeed more context sensitive, but that the gap can be closed by adding self-attention to the CTC model. Furthermore, the two models perform similarly when contextual information is constrained. Finally, in contrast to previous research, our results show that the CTC model is highly competitive on WSJ and LibriSpeech without the help of an external language model. △ Less

Submitted 17 February, 2021; originally announced February 2021.

Comments: Published in the proceedings of INTERSPEECH 2020, pp. 4352-4356

arXiv:2102.00850 [pdf, other]

On Scaling Contrastive Representations for Low-Resource Speech Recognition

Authors: Lasse Borgholt, Tycho Max Sylvester Tax, Jakob Drachmann Havtorn, Lars Maaløe, Christian Igel

Abstract: Recent advances in self-supervised learning through contrastive training have shown that it is possible to learn a competitive speech recognition system with as little as 10 minutes of labeled data. However, these systems are computationally expensive since they require pre-training followed by fine-tuning in a large parameter space. We explore the performance of such systems without fine-tuning b… ▽ More Recent advances in self-supervised learning through contrastive training have shown that it is possible to learn a competitive speech recognition system with as little as 10 minutes of labeled data. However, these systems are computationally expensive since they require pre-training followed by fine-tuning in a large parameter space. We explore the performance of such systems without fine-tuning by training a state-of-the-art speech recognizer on the fixed representations from the computationally demanding wav2vec 2.0 framework. We find performance to decrease without fine-tuning and, in the extreme low-resource setting, wav2vec 2.0 is inferior to its predecessor. In addition, we find that wav2vec 2.0 representations live in a low dimensional subspace and that decorrelating the features of the representations can stabilize training of the automatic speech recognizer. Finally, we propose a bidirectional extension to the original wav2vec framework that consistently improves performance. △ Less

Submitted 1 February, 2021; originally announced February 2021.

Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2006.15057 [pdf, other]

A Loss Function for Generative Neural Networks Based on Watson's Perceptual Model

Authors: Steffen Czolbe, Oswin Krause, Ingemar Cox, Christian Igel

Abstract: To train Variational Autoencoders (VAEs) to generate realistic imagery requires a loss function that reflects human perception of image similarity. We propose such a loss function based on Watson's perceptual model, which computes a weighted distance in frequency space and accounts for luminance and contrast masking. We extend the model to color images, increase its robustness to translation by us… ▽ More To train Variational Autoencoders (VAEs) to generate realistic imagery requires a loss function that reflects human perception of image similarity. We propose such a loss function based on Watson's perceptual model, which computes a weighted distance in frequency space and accounts for luminance and contrast masking. We extend the model to color images, increase its robustness to translation by using the Fourier Transform, remove artifacts due to splitting the image into blocks, and make it differentiable. In experiments, VAEs trained with the new loss function generated realistic, high-quality image samples. Compared to using the Euclidean distance and the Structural Similarity Index, the images were less blurry; compared to deep neural network based losses, the new approach required less computational resources and generated images with less artifacts. △ Less

Submitted 6 January, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: Published at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

arXiv:2004.14003 [pdf, other]

The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge: A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset

Authors: Arjun D. Desai, Francesco Caliva, Claudia Iriondo, Naji Khosravan, Aliasghar Mortazi, Sachin Jambawalikar, Drew Torigian, Jutta Ellermann, Mehmet Akcakaya, Ulas Bagci, Radhika Tibrewala, Io Flament, Matthew O`Brien, Sharmila Majumdar, Mathias Perslev, Akshay Pai, Christian Igel, Erik B. Dam, Sibaji Gaj, Mingrui Yang, Kunio Nakamura, Xiaojuan Li, Cem M. Deniz, Vladimir Juras, Ravinder Regatte , et al. (4 additional authors not shown)

Abstract: Purpose: To organize a knee MRI segmentation challenge for characterizing the semantic and clinical efficacy of automatic segmentation methods relevant for monitoring osteoarthritis progression. Methods: A dataset partition consisting of 3D knee MRI from 88 subjects at two timepoints with ground-truth articular (femoral, tibial, patellar) cartilage and meniscus segmentations was standardized. Ch… ▽ More Purpose: To organize a knee MRI segmentation challenge for characterizing the semantic and clinical efficacy of automatic segmentation methods relevant for monitoring osteoarthritis progression. Methods: A dataset partition consisting of 3D knee MRI from 88 subjects at two timepoints with ground-truth articular (femoral, tibial, patellar) cartilage and meniscus segmentations was standardized. Challenge submissions and a majority-vote ensemble were evaluated using Dice score, average symmetric surface distance, volumetric overlap error, and coefficient of variation on a hold-out test set. Similarities in network segmentations were evaluated using pairwise Dice correlations. Articular cartilage thickness was computed per-scan and longitudinally. Correlation between thickness error and segmentation metrics was measured using Pearson's coefficient. Two empirical upper bounds for ensemble performance were computed using combinations of model outputs that consolidated true positives and true negatives. Results: Six teams (T1-T6) submitted entries for the challenge. No significant differences were observed across all segmentation metrics for all tissues (p=1.0) among the four top-performing networks (T2, T3, T4, T6). Dice correlations between network pairs were high (>0.85). Per-scan thickness errors were negligible among T1-T4 (p=0.99) and longitudinal changes showed minimal bias (<0.03mm). Low correlations (<0.41) were observed between segmentation metrics and thickness error. The majority-vote ensemble was comparable to top performing networks (p=1.0). Empirical upper bound performances were similar for both combinations (p=1.0). Conclusion: Diverse networks learned to segment the knee similarly where high segmentation accuracy did not correlate to cartilage thickness accuracy. Voting ensembles did not outperform individual networks but may help regularize individual models. △ Less

Submitted 26 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: Submitted to Radiology: Artificial Intelligence; Fixed typos

arXiv:1911.01764 [pdf, other]

doi 10.1007/978-3-030-32245-8_4

One Network to Segment Them All: A General, Lightweight System for Accurate 3D Medical Image Segmentation

Authors: Mathias Perslev, Erik Bjørnager Dam, Akshay Pai, Christian Igel

Abstract: Many recent medical segmentation systems rely on powerful deep learning models to solve highly specific tasks. To maximize performance, it is standard practice to evaluate numerous pipelines with varying model topologies, optimization parameters, pre- & postprocessing steps, and even model cascades. It is often not clear how the resulting pipeline transfers to different tasks. We propose a simple… ▽ More Many recent medical segmentation systems rely on powerful deep learning models to solve highly specific tasks. To maximize performance, it is standard practice to evaluate numerous pipelines with varying model topologies, optimization parameters, pre- & postprocessing steps, and even model cascades. It is often not clear how the resulting pipeline transfers to different tasks. We propose a simple and thoroughly evaluated deep learning framework for segmentation of arbitrary medical image volumes. The system requires no task-specific information, no human interaction and is based on a fixed model topology and a fixed hyperparameter set, eliminating the process of model selection and its inherent tendency to cause method-level over-fitting. The system is available in open source and does not require deep learning expertise to use. Without task-specific modifications, the system performed better than or similar to highly specialized deep learning methods across 3 separate segmentation tasks. In addition, it ranked 5-th and 6-th in the first and second round of the 2018 Medical Segmentation Decathlon comprising another 10 tasks. The system relies on multi-planar data augmentation which facilitates the application of a single 2D architecture based on the familiar U-Net. Multi-planar training combines the parameter efficiency of a 2D fully convolutional neural network with a systematic train- and test-time augmentation scheme, which allows the 2D model to learn a representation of the 3D image volume that fosters generalization. △ Less

Submitted 5 November, 2019; originally announced November 2019.

Journal ref: Medical Image Computing and Computer Assisted Intervention (MICCAI), LNCS 11765, pp. 30-38, Springer, 2019

arXiv:1910.11162 [pdf, other]

U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging

Authors: Mathias Perslev, Michael Hejselbak Jensen, Sune Darkner, Poul Jørgen Jennum, Christian Igel

Abstract: Neural networks are becoming more and more popular for the analysis of physiological time-series. The most successful deep learning systems in this domain combine convolutional and recurrent layers to extract useful features to model temporal relations. Unfortunately, these recurrent models are difficult to tune and optimize. In our experience, they often require task-specific modifications, which… ▽ More Neural networks are becoming more and more popular for the analysis of physiological time-series. The most successful deep learning systems in this domain combine convolutional and recurrent layers to extract useful features to model temporal relations. Unfortunately, these recurrent models are difficult to tune and optimize. In our experience, they often require task-specific modifications, which makes them challenging to use for non-experts. We propose U-Time, a fully feed-forward deep learning approach to physiological time series segmentation developed for the analysis of sleep data. U-Time is a temporal fully convolutional network based on the U-Net architecture that was originally proposed for image segmentation. U-Time maps sequential inputs of arbitrary length to sequences of class labels on a freely chosen temporal scale. This is done by implicitly classifying every individual time-point of the input signal and aggregating these classifications over fixed intervals to form the final predictions. We evaluated U-Time for sleep stage classification on a large collection of sleep electroencephalography (EEG) datasets. In all cases, we found that U-Time reaches or outperforms current state-of-the-art deep learning models while being much more robust in the training process and without requiring architecture or hyperparameter adaptation across tasks. △ Less

Submitted 24 October, 2019; originally announced October 2019.

Comments: To appear in Advances in Neural Information Processing Systems (NeurIPS), 2019

arXiv:1908.07355 [pdf, other]

Knowledge distillation for semi-supervised domain adaptation

Authors: Mauricio Orbes-Arteaga, Jorge Cardoso, Lauge Sørensen, Christian Igel, Sebastien Ourselin, Marc Modat, Mads Nielsen, Akshay Pai

Abstract: In the absence of sufficient data variation (e.g., scanner and protocol variability) in annotated data, deep neural networks (DNNs) tend to overfit during training. As a result, their performance is significantly lower on data from unseen sources compared to the performance on data from the same source as the training data. Semi-supervised domain adaptation methods can alleviate this problem by tu… ▽ More In the absence of sufficient data variation (e.g., scanner and protocol variability) in annotated data, deep neural networks (DNNs) tend to overfit during training. As a result, their performance is significantly lower on data from unseen sources compared to the performance on data from the same source as the training data. Semi-supervised domain adaptation methods can alleviate this problem by tuning networks to new target domains without the need for annotated data from these domains. Adversarial domain adaptation (ADA) methods are a popular choice that aim to train networks in such a way that the features generated are domain agnostic. However, these methods require careful dataset-specific selection of hyperparameters such as the complexity of the discriminator in order to achieve a reasonable performance. We propose to use knowledge distillation (KD) -- an efficient way of transferring knowledge between different DNNs -- for semi-supervised domain adaption of DNNs. It does not require dataset-specific hyperparameter tuning, making it generally applicable. The proposed method is compared to ADA for segmentation of white matter hyperintensities (WMH) in magnetic resonance imaging (MRI) scans generated by scanners that are not a part of the training set. Compared with both the baseline DNN (trained on source domain only and without any adaption to target domain) and with using ADA for semi-supervised domain adaptation, the proposed method achieves significantly higher WMH dice scores. △ Less

Submitted 16 August, 2019; originally announced August 2019.

Comments: MLCN MICCAI workshop

Showing 1–13 of 13 results for author: Igel, C