
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation

Authors: Hamideh Kerdegari, Kyle Higgins, Dennis Veselkov, Ivan Laponogov, Inese Polaka, Miguel Coimbra, Junior Andrea Pescino, Marcis Leja, Mario Dinis-Ribeiro, Tania Fleitas Kanonnikoff, Kirill Veselkov

Abstract: The integration of artificial intelligence (AI) in medical diagnostics represents a significant advancement in managing upper gastrointestinal (GI) cancer, a major cause of global cancer mortality. Specifically for gastric cancer (GC), chronic inflammation causes changes in the mucosa such as atrophy, intestinal metaplasia (IM), dysplasia and ultimately cancer. Early detection through regular endoscopic surveillance is essential for better outcomes. Foundation models (FM), which are machine or deep learning models trained on diverse data and applicable to broad use cases, offer a promising solution to enhance the accuracy of endoscopy and its subsequent pathology image analysis. This review explores the recent advancements, applications, and challenges associated with FM in endoscopy and pathology imaging. We started by elucidating the core principles and architectures underlying these models, including their training methodologies and the pivotal role of large-scale data in developing their predictive capabilities. Moreover, this work discusses emerging trends and future research directions, emphasizing the integration of multimodal data, the development of more robust and equitable models, and the potential for real-time diagnostic support. This review aims to provide a roadmap for researchers and practitioners in navigating the complexities of incorporating FM into clinical practice for prevention/management of GC cases, thereby improving patient outcomes.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2310.12658 [pdf, other]

phyloDB: A framework for large-scale phylogenetic analysis

Authors: Bruno Lourenço, Cátia Vaz, Miguel E. Coimbra, Alexandre P. Francisco

Abstract: phyloDB is a modular and extensible framework for large-scale phylogenetic analyses, which are essential for understanding epidemics evolution. It relies on the Neo4j graph database for data storage and processing, providing a schema and an API for representing and querying phylogenetic data. Custom algorithms are also supported, allowing to perform heavy computations directly over the data, and to store results in the database. Multiple computation results are stored as multilayer networks, promoting and facilitating comparative analyses, as well as avoiding unnecessary ab initio computations. The experimental evaluation results showcase that phyloDB is efficient and scalable with respect to both API operations and algorithms execution.

Submitted 19 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:2012.13363

arXiv:2209.13385 [pdf, ps, other]

doi 10.1109/JBHI.2023.3275039

Beyond Heart Murmur Detection: Automatic Murmur Grading from Phonocardiogram

Authors: Andoni Elola, Elisabete Aramendi, Jorge Oliveira, Francesco Renna, Miguel T. Coimbra, Matthew A. Reyna, Reza Sameni, Gari D. Clifford, Ali Bahrami Rad

Abstract: Objective: Murmurs are abnormal heart sounds, identified by experts through cardiac auscultation. The murmur grade, a quantitative measure of the murmur intensity, is strongly correlated with the patient's clinical condition. This work aims to estimate each patient's murmur grade (i.e., absent, soft, loud) from multiple auscultation location phonocardiograms (PCGs) of a large population of pediatric patients from a low-resource rural area. Methods: The Mel spectrogram representation of each PCG recording is given to an ensemble of 15 convolutional residual neural networks with channel-wise attention mechanisms to classify each PCG recording. The final murmur grade for each patient is derived based on the proposed decision rule and considering all estimated labels for available recordings. The proposed method is cross-validated on a dataset consisting of 3456 PCG recordings from 1007 patients using a stratified ten-fold cross-validation. Additionally, the method was tested on a hidden test set comprised of 1538 PCG recordings from 442 patients. Results: The overall cross-validation performances for patient-level murmur gradings are 86.3% and 81.6% in terms of the unweighted average of sensitivities and F1-scores, respectively. The sensitivities (and F1-scores) for absent, soft, and loud murmurs are 90.7% (93.6%), 75.8% (66.8%), and 92.3% (84.2%), respectively. On the test set, the algorithm achieves an unweighted average of sensitivities of 80.4% and an F1-score of 75.8%. Conclusions: This study provides a potential approach for algorithmic pre-screening in low-resource settings with relatively high expert screening costs. Significance: The proposed method represents a significant step beyond detection of murmurs, providing characterization of intensity which may provide an enhanced classification of clinical outcomes.
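
Editorial note: the "unweighted average of sensitivities" quoted in the abstract above is a plain mean over the three murmur classes, so each class counts equally regardless of how many patients it contains. The short Python sketch below only re-does that arithmetic from the per-class sensitivities given in the abstract; the function and variable names are illustrative assumptions and are not taken from the paper.

```python
def unweighted_average(values):
    """Return the plain (macro) mean: every class counts equally."""
    return sum(values) / len(values)


# Per-class sensitivities (%) quoted in the abstract: absent, soft, loud.
sensitivities = [90.7, 75.8, 92.3]

macro_sensitivity = unweighted_average(sensitivities)
print(round(macro_sensitivity, 1))  # prints 86.3
```

The result agrees with the 86.3% cross-validation figure reported in the abstract.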

Submitted 13 April, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

arXiv:2108.00813 [pdf, other]

doi 10.1109/JBHI.2021.3137048

The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification

Authors: Jorge Oliveira, Francesco Renna, Paulo Dias Costa, Marcelo Nogueira, Cristina Oliveira, Carlos Ferreira, Alipio Jorge, Sandra Mattos, Thamine Hatem, Thiago Tavares, Andoni Elola, Ali Bahrami Rad, Reza Sameni, Gari D Clifford, Miguel T. Coimbra

Abstract: Cardiac auscultation is one of the most cost-effective techniques used to detect and identify many heart conditions. Computer-assisted decision systems based on auscultation can support physicians in their decisions. Unfortunately, the application of such systems in clinical trials is still minimal since most of them only aim to detect the presence of extra or abnormal waves in the phonocardiogram signal, i.e., only a binary ground truth variable (normal vs abnormal) is provided. This is mainly due to the lack of large publicly available datasets, where a more detailed description of such abnormal waves (e.g., cardiac murmurs) exists. To pave the way to more effective research on healthcare recommendation systems based on auscultation, our team has prepared the currently largest pediatric heart sound dataset. A total of 5282 recordings have been collected from the four main auscultation locations of 1568 patients; in the process, 215780 heart sounds have been manually annotated. Furthermore, and for the first time, each cardiac murmur has been manually annotated by an expert annotator according to its timing, shape, pitch, grading, and quality. In addition, the auscultation locations where the murmur is present were identified as well as the auscultation location where the murmur is detected more intensively. Such detailed description for a relatively large number of heart sounds may pave the way for new machine learning algorithms with a real-world application for the detection and analysis of murmur waves for diagnostic purposes.

Submitted 24 December, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: 12 pages, 6 tables, 8 figures, in IEEE Journal of Biomedical and Health Informatics

arXiv:2003.07981 [pdf, other]

doi 10.1111/itor.13138

Segmentation and Optimal Region Selection of Physiological Signals using Deep Neural Networks and Combinatorial Optimization

Authors: Jorge Oliveira, Margarida Carvalho, Diogo Marcelo Nogueira, Miguel Coimbra

Abstract: Physiological signals, such as the electrocardiogram and the phonocardiogram, are very often corrupted by noisy sources. Usually, artificial intelligence algorithms analyze the signal regardless of its quality. On the other hand, physicians use a completely orthogonal strategy. They do not assess the entire recording; instead they search for a segment where the fundamental and abnormal waves are easily detected, and only then is a prognosis attempted. Inspired by this fact, a new algorithm that automatically selects an optimal segment for a post-processing stage, according to a criterion defined by the user, is proposed. In the process, a Neural Network is used to compute the output state probability distribution for each sample. Using the aforementioned quantities, a graph is designed, whereas state transition constraints are physically imposed into the graph and a set of constraints is used to retrieve a subset of the recording that maximizes the likelihood function, proposed by the user. The developed framework is tested and validated in two applications. In both cases, the system performance is boosted significantly, e.g., in heart sound segmentation, sensitivity increases 2.4% when compared to the standard approaches in the literature.

Submitted 17 March, 2020; originally announced March 2020.

Journal ref: Intl. Trans. in Op. Res., 30: 601-618 (2023)

arXiv:1911.11624 [pdf, other]

doi 10.1186/s40537-021-00443-9

An analysis of the graph processing landscape

Authors: Miguel E. Coimbra, Alexandre P. Francisco, Luís Veiga

Abstract: The value of graph-based big data can be unlocked by exploring the topology and metrics of the networks they represent, and the computational approaches to this exploration take on many forms. In the use case of performing global computations over a graph, the graph is first ingested into a graph processing system from one of many digital representations. Extracting information from graphs involves processing all their elements globally, and can be done with single-machine systems (with varying approaches to hardware usage), distributed systems (either homogeneous or heterogeneous groups of machines) and systems dedicated to high-performance computing (HPC). We provide an overview of different aspects of the graph processing landscape and describe classes of systems based on a set of dimensions we detail. The dimensions we detail encompass paradigms to express graph processing, different types of systems to use, coordination and communication models in distributed graph processing, partitioning techniques and different definitions related to the potential for a graph to be updated. This survey is aimed at both the experienced software engineer or researcher as well as the newcomer looking for an understanding of the landscape of solutions (and their limitations) for graph processing.

Submitted 16 February, 2021; v1 submitted 26 November, 2019; originally announced November 2019.

Comments: 42 pages, 5 figures, 2 tables

Journal ref: Journal of Big Data, 8 (2021), 1-41

arXiv:1911.03195 [pdf, other]

On dynamic succinct graph representations

Authors: Miguel E. Coimbra, Alexandre P. Francisco, Luís M. S. Russo, Guillermo de Bernardo, Susana Ladra, Gonzalo Navarro

Abstract: We address the problem of representing dynamic graphs using $k^2$-trees. The $k^2$-tree data structure is one of the succinct data structures proposed for representing static graphs, and binary relations in general. It relies on compact representations of bit vectors. Hence, by relying on compact representations of dynamic bit vectors, we can also represent dynamic graphs. In this paper we follow instead the ideas by Munro et al., and we present an alternative implementation for representing dynamic graphs using $k^2$-trees. Our experimental results show that this new implementation is competitive in practice.

Submitted 6 December, 2019; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941

arXiv:1810.02781 [pdf, other]

VeilGraph: Streaming Graph Approximations

Authors: Miguel E. Coimbra, Sérgio Esteves, Alexandre P. Francisco, Luís Veiga

Abstract: Graphs are found in a plethora of domains, including online social networks, the World Wide Web and the study of epidemics, to name a few. With the advent of greater volumes of information and the need for continuously updated results under temporal constraints, it is necessary to explore novel approaches that further enable performance improvements. In the scope of stream processing over graphs, we research the trade-offs between result accuracy and the speedup of approximate computation techniques. We see this as a natural path towards these performance improvements. Herein we present VeilGraph, through which we conducted our research. We showcase an innovative model for approximate graph processing, implemented in Apache Flink. We analyze our model and evaluate it with the case study of the PageRank algorithm, perhaps the most famous measure of vertex centrality used to rank websites in search engine results. In light of our model, we discuss the challenges driven by relations between result accuracy and potential performance gains. Our experiments, even when set up for favoring Flink for comparability, show that VeilGraph can improve performance with speedups of up to 3x, while achieving result quality above 95% when compared to results of the traditional version of PageRank without any summarization or approximation techniques.

Submitted 17 December, 2019; v1 submitted 5 October, 2018; originally announced October 2018.

Comments: 10 pages, 3 algorithms, 7 figures, 1 table, 5 equations

arXiv:1703.10628 [pdf, other]

Study on Resource Efficiency of Distributed Graph Processing

Authors: Miguel E. Coimbra, Alexandre P. Francisco, Luis Veiga

Abstract: Graphs may be used to represent many different problem domains -- a concrete example is that of detecting communities in social networks, which are represented as graphs. With big data and more sophisticated applications becoming widespread in recent years, graph processing has seen an emergence of requirements pertaining to data volume and volatility. This multidisciplinary study presents a review of relevant distributed graph processing systems. Herein they are presented in groups defined by common traits (distributed processing paradigm, type of graph operations, among others), with an overview of each system's strengths and weaknesses. The set of systems is then narrowed down to a set of two, upon which quantitative analysis was performed. For this quantitative comparison of systems, focus was cast on evaluating the performance of algorithms for the problem of detecting communities. To help further understand the evaluations performed, a background is provided on graph clustering.

Submitted 30 March, 2017; originally announced March 2017.

Report number: INESC-ID Lisboa Technical Report 17/2016, Dec. 2016

arXiv:1703.10446 [pdf, other]

Gelly-Scheduling: Distributed Graph Processing for Network Service Placement in Community Networks

Authors: Miguel E. Coimbra, Mennan Selimi, Alexandre P. Francisco, Felix Freitag, Luís Veiga

Abstract: Community networks (CNs) have seen an increase in the last fifteen years. Their members contact nodes which operate Internet proxies, web servers, user file storage and video streaming services, to name a few. Detecting communities of nodes with properties (such as co-location) and assessing node eligibility for service placement is thus a key factor in optimizing the experience of users. We present a novel solution for the problem of service placement as a two-phase approach, based on: 1) community finding using a scalable graph label propagation technique and 2) a decentralized election procedure to address the multi-objective challenge of optimizing service placement in CNs. Herein we: i) highlight the applicability of leader election heuristics which are important for service placement in community networks and scheduler-dependent scenarios; ii) present a parallel and distributed solution designed as a scalable alternative for the problem of service placement, which has mostly seen computational approaches based on centralization and sequential execution.

Submitted 18 January, 2018; v1 submitted 30 March, 2017; originally announced March 2017.

Report number: INESC-ID Lisboa Technical Report 4/2017, Feb 2017

Showing 1–10 of 10 results for author: Coimbra, M