-
Bayesian ECG reconstruction using denoising diffusion generative models
Authors:
Gabriel V. Cardoso,
Lisa Bedin,
Josselin Duchateau,
Rémi Dubois,
Eric Moulines
Abstract:
In this work, we propose a denoising diffusion generative model (DDGM) trained with healthy electrocardiogram (ECG) data that focuses on ECG morphology and inter-lead dependence. Our results show that this innovative generative model can successfully generate realistic ECG signals. Furthermore, we explore the application of recent breakthroughs in solving linear inverse Bayesian problems using DDGM. This approach enables the development of several important clinical tools. These include the calculation of corrected QT intervals (QTc), effective noise suppression of ECG signals, recovery of missing ECG leads, and identification of anomalous readings, enabling significant advances in cardiac health monitoring and diagnosis.
Submitted 18 December, 2023;
originally announced January 2024.
-
Integrating the PanDA Workload Management System with the Vera C. Rubin Observatory
Authors:
Edward Karavakis,
Wen Guan,
Zhaoyu Yang,
Tadashi Maeno,
Torre Wenaus,
Jennifer Adelman-McCarthy,
Fernando Barreiro Megino,
Kaushik De,
Richard Dubois,
Michelle Gower,
Tim Jenness,
Alexei Klimentov,
Tatiana Korchuganova,
Mikolaj Kowalik,
Fa-Hui Lin,
Paul Nilsson,
Sergey Padolski,
Wei Yang,
Shuwei Ye
Abstract:
The Vera C. Rubin Observatory will produce an unprecedented astronomical data set for studies of the deep and dynamic universe. Its Legacy Survey of Space and Time (LSST) will image the entire southern sky every three to four days and produce tens of petabytes of raw image data and associated calibration data over the course of the experiment's run. More than 20 terabytes of data must be stored every night, and annual campaigns to reprocess the entire dataset since the beginning of the survey will be conducted over ten years. The Production and Distributed Analysis (PanDA) system was evaluated by the Rubin Observatory Data Management team and selected to serve the Observatory's needs due to its demonstrated scalability and flexibility over the years, for its Directed Acyclic Graph (DAG) support, its support for multi-site processing, and its highly scalable complex workflows via the intelligent Data Delivery Service (iDDS). PanDA is also being evaluated for prompt processing where data must be processed within 60 seconds after image capture. This paper will briefly describe the Rubin Data Management system and its Data Facilities (DFs). Finally, it will describe in depth the work performed in order to integrate the PanDA system with the Rubin Observatory to be able to run the Rubin Science Pipelines using PanDA.
Submitted 8 December, 2023;
originally announced December 2023.
-
A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models
Authors:
Bilel Guetarni,
Feryal Windal,
Halim Benhabiles,
Marianne Petit,
Romain Dubois,
Emmanuelle Leteurtre,
Dominique Collard
Abstract:
Determining lymphoma subtypes is a crucial step for better patient treatment targeting to potentially increase their survival chances. In this context, the existing gold standard diagnosis method, which relies on gene expression technology, is highly expensive and time-consuming, making it less accessible. Although alternative diagnosis methods based on IHC (immunohistochemistry) technologies exist (recommended by the WHO), they still suffer from similar limitations and are less accurate. Whole Slide Image (WSI) analysis using deep learning models has shown promising potential for cancer diagnosis and could offer cost-effective and faster alternatives to existing methods. In this work, we propose a vision transformer-based framework for distinguishing DLBCL (Diffuse Large B-Cell Lymphoma) cancer subtypes from high-resolution WSIs. To this end, we introduce a multi-modal architecture to train a classifier model from various WSI modalities. We then leverage this model through a knowledge distillation process to efficiently guide the learning of a mono-modal classifier. Our experimental study conducted on a lymphoma dataset of 157 patients shows the promising performance of our mono-modal classification model, outperforming six recent state-of-the-art methods. In addition, the power-law curve, estimated on our experimental data, suggests that with more training data from a reasonable number of additional patients, our model could achieve competitive diagnosis accuracy with IHC technologies. Furthermore, the efficiency of our framework is confirmed through an additional experimental study on an external breast cancer dataset (BCI dataset).
Submitted 29 May, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data
Authors:
Luuk H. Boulogne,
Julian Lorenz,
Daniel Kienzle,
Robin Schon,
Katja Ludwig,
Rainer Lienhart,
Simon Jegou,
Guang Li,
Cong Chen,
Qi Wang,
Derik Shi,
Mayug Maniparambil,
Dominik Muller,
Silvan Mertes,
Niklas Schroter,
Fabio Hellmann,
Miriam Elia,
Ine Dirks,
Matias Nicolas Bossa,
Abel Diaz Berenguer,
Tanmoy Mukherjee,
Jef Vandemeulebroucke,
Hichem Sahli,
Nikos Deligiannis,
Panagiotis Gonidakis
, et al. (13 additional authors not shown)
Abstract:
Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.
Submitted 25 June, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
-
SONYC: A System for the Monitoring, Analysis and Mitigation of Urban Noise Pollution
Authors:
Juan Pablo Bello,
Claudio Silva,
Oded Nov,
R. Luke DuBois,
Anish Arora,
Justin Salamon,
Charles Mydlarz,
Harish Doraiswamy
Abstract:
We present the Sounds of New York City (SONYC) project, a smart cities initiative focused on developing a cyber-physical system for the monitoring, analysis and mitigation of urban noise pollution. Noise pollution is one of the topmost quality of life issues for urban residents in the U.S. with proven effects on health, education, the economy, and the environment. Yet, most cities lack the resources to continuously monitor noise and understand the contribution of individual sources, the tools to analyze patterns of noise pollution at city-scale, and the means to empower city agencies to take effective, data-driven action for noise mitigation. The SONYC project advances novel technological and socio-technical solutions that help address these needs.
SONYC includes a distributed network of both sensors and people for large-scale noise monitoring. The sensors use low-cost, low-power technology, and cutting-edge machine listening techniques, to produce calibrated acoustic measurements and recognize individual sound sources in real time. Citizen science methods are used to help urban residents connect to city agencies and each other, understand their noise footprint, and facilitate reporting and self-regulation. Crucially, SONYC utilizes big data solutions to analyze, retrieve and visualize information from sensors and citizens, creating a comprehensive acoustic model of the city that can be used to identify significant patterns of noise pollution. These data can be used to drive the strategic application of noise code enforcement by city agencies to optimize the reduction of noise pollution. The entire system, integrating cyber, physical and social infrastructure, forms a closed loop of continuous sensing, analysis and actuation on the environment.
SONYC provides a blueprint for the mitigation of noise pollution that can potentially be applied to other cities in the US and abroad.
Submitted 18 May, 2018; v1 submitted 2 May, 2018;
originally announced May 2018.
-
Towards the selection of patients requiring ICD implantation by automatic classification from Holter monitoring indices
Authors:
Charles-Henri Cappelaere,
R. Dubois,
P. Roussel,
G. Dreyfus
Abstract:
The purpose of this study is to optimize the selection of prophylactic cardioverter defibrillator implantation candidates. Currently, the main criterion for implantation is a low Left Ventricular Ejection Fraction (LVEF) whose specificity is relatively poor. We designed two classifiers aimed at predicting, from long-term ECG recordings (Holter), whether or not a low-LVEF patient is likely to undergo ventricular arrhythmia in the next six months. One classifier is a single hidden layer neural network whose variables are the most relevant features extracted from Holter recordings, and the other classifier has a structure that capitalizes on the physiological decomposition of the arrhythmogenic factors into three disjoint groups: the myocardial substrate, the triggers and the autonomic nervous system (ANS). In this ad hoc network, the features were assigned to each group; one neural network classifier per group was designed and its complexity was optimized. The outputs of the classifiers were fed to a single neuron that provided the required probability estimate. The latter was thresholded for final discrimination. A dataset composed of 186 pre-implantation 30-min Holter recordings of patients equipped with an implantable cardioverter defibrillator (ICD) in primary prevention was used in order to design and test this classifier. 44 out of 186 patients underwent at least one treated ventricular arrhythmia during the six-month follow-up period. The performance of the designed classifier was evaluated using a cross-test strategy that consists of splitting the database into several combinations of a training set and a test set. The average arrhythmia prediction performance of the ad hoc classifier is NPV = 77% $\pm$ 13% and PPV = 31% $\pm$ 19% (Negative Predictive Value $\pm$ std, Positive Predictive Value $\pm$ std).
According to our study, improving prophylactic ICD-implantation candidate selection by automatic classification from ECG features may be possible, but the availability of a sizable dataset appears to be essential to decrease the number of False Negatives.
Submitted 16 January, 2014;
originally announced January 2014.