Search | arXiv e-print repository

doi 10.1038/s41698-024-00539-4

An interpretable machine learning system for colorectal cancer diagnosis from pathology slides

Authors: Pedro C. Neto, Diana Montezuma, Sara P. Oliveira, Domingos Oliveira, João Fraga, Ana Monteiro, João Monteiro, Liliana Ribeiro, Sofia Gonçalves, Stefan Reinhard, Inti Zlobec, Isabel M. Pinto, Jaime S. Cardoso

Abstract: Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). For this, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Noting some problems in the literature, this study is conducted with one of the largest colorectal WSI datasets, with approximately 10,500 WSIs. Of these samples, 900 are testing samples. Furthermore, the robustness of the proposed method is assessed with two additional external datasets (TCGA and PAIP) and a dataset of samples collected directly from the proposed prototype. Our proposed method predicts, for the patch-based tiles, a class based on the severity of the dysplasia and uses that information to classify the whole slide. It is trained with an interpretable mixed-supervision scheme to leverage the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme allowed for an intelligent sampling strategy effectively evaluated in several different scenarios without compromising the performance. On the internal dataset, the method shows an accuracy of 93.44% and a sensitivity between positive (low-grade and high-grade dysplasia) and non-neoplastic samples of 0.996. On the external test sets, performance varied, with TCGA being the most challenging dataset, with an overall accuracy of 84.91% and a sensitivity of 0.996.

Submitted 30 April, 2024; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: Accepted at npj Precision Oncology. Available at: https://www.nature.com/articles/s41698-024-00539-4

Journal ref: npj Precis. Onc. 8, 56 (2024)

arXiv:2001.09239 [pdf, other]

Multi-task self-supervised learning for Robust Speech Recognition

Authors: Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

Abstract: Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE) that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation. Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE and common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.

Submitted 17 April, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: In Proc. of ICASSP 2020

arXiv:1911.03604 [pdf, other]

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

Authors: Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

Abstract: While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices. That being said, in this paper, we work on simplifying and compressing Transformer-based encoder-decoder architectures for the end-to-end ASR task. We empirically introduce a more compact Speech-Transformer by investigating the impact of discarding particular modules on the performance of the model. Moreover, we evaluate reducing the numerical precision of our network's weights and activations while maintaining the performance of the full-precision model. Our experiments show that we can reduce the number of parameters of the full-precision model and then further compress the model 4x by fully quantizing to 8-bit fixed point precision.

Submitted 24 March, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: Submitted to IEEE Signal Processing Letters. Minor changes in Section 3.

arXiv:1906.08823 [pdf, other]

Cross-Subject Statistical Shift Estimation for Generalized Electroencephalography-based Mental Workload Assessment

Authors: Isabela Albuquerque, João Monteiro, Olivier Rosanne, Abhishek Tiwari, Jean-François Gagnon, Tiago H. Falk

Abstract: Assessment of mental workload in real-world conditions is key to ensure the performance of workers executing tasks that demand sustained attention. Previous literature has employed electroencephalography (EEG) to this end despite having observed that EEG correlates of mental workload vary across subjects and physical strain, thus making it difficult to devise models capable of simultaneously presenting reliable performance across users. Domain adaptation comprises a set of strategies that aim to improve the performance of machine learning systems on data unseen at training time. Such methods, however, might rely on assumptions over the considered data distributions, which typically do not hold for applications of EEG data. Motivated by this observation, in this work we propose a strategy to estimate two types of discrepancies between multiple data distributions, namely marginal and conditional shifts, observed on data collected from different subjects. Besides shedding light on the assumptions that hold for a particular dataset, the estimates of statistical shifts obtained with the proposed approach can be used for investigating other aspects of a machine learning pipeline, such as quantitatively assessing the effectiveness of domain adaptation strategies. In particular, we consider EEG data collected from individuals performing mental tasks while running on a treadmill and pedaling on a stationary bike and explore the effects of different normalization strategies commonly used to mitigate cross-subject variability. We show the effects that different normalization schemes have on statistical shifts and their relationship with the accuracy of mental workload prediction as assessed on participants unseen at training time.

Submitted 22 September, 2021; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1811.03063 [pdf, other]

Generative Adversarial Speaker Embedding Networks for Domain Robust End-to-End Speaker Verification

Authors: Gautam Bhattacharya, Joao Monteiro, Jahangir Alam, Patrick Kenny

Abstract: This article presents a novel approach for learning domain-invariant speaker embeddings using Generative Adversarial Networks. The main idea is to confuse a domain discriminator so that it can't tell if embeddings are from the source or target domains. We train several GAN variants using our proposed framework and apply them to the speaker verification task. On the challenging NIST-SRE 2016 dataset, we are able to match the performance of a strong baseline x-vector system. In contrast to the baseline systems, which are dependent on dimensionality reduction (LDA) and an external classifier (PLDA), our proposed speaker embeddings can be scored using simple cosine distance. This is achieved by optimizing our models end-to-end, using an angular margin loss function. Furthermore, we are able to significantly boost verification performance by averaging our different GAN models at the score level, achieving a relative improvement of 7.2% over the baseline.

Submitted 7 November, 2018; originally announced November 2018.

Comments: Submitted to ICASSP 2019

arXiv:1204.5677 [pdf]

Automatic Generation of C-code or PLD Circuits under SFC Graphical Environment

Authors: C. Ferreira, S. Monteiro, J. Monteiro

Abstract: This paper proposes a framework for automatic development of control systems from a high-level specification based on the Grafcet formalism. Grafcet, or Sequential Function Charts (SFC), is a special class of Petri Nets and is becoming the standard representation for sequential control systems. The proposed framework accepts a graphical (through ISaGRAPH) or textual behavioural specification of the control system to be implemented. It follows the usual procedure in software specification: the first step is to formally validate the initial specification. Then the initial specification is translated through automated processes into an implementation. At the moment there are two possible output languages: C and Palasm [1]. The targets for the C code are microcontroller-based systems that require extended time constraints and access to external peripherals. The goal of including PLDs is to make it possible to automatically design mixed hardware and software systems.

Submitted 15 November, 2011; originally announced April 2012.

Comments: IEEE International Symposium on Industrial Electronics (ISIE'97)

Showing 1–6 of 6 results for author: Monteiro, J