-
EDEN : An Event DEtection Network for the annotation of Breast Cancer recurrences in administrative claims data
Authors:
Elise Dumas,
Anne-Sophie Hamy,
Sophie Houzard,
Eva Hernandez,
Aullène Toussaint,
Julien Guerin,
Laetitia Chanas,
Victoire de Castelbajac,
Mathilde Saint-Ghislain,
Beatriz Grandal,
Eric Daoud,
Fabien Reyal,
Chloé-Agathe Azencott
Abstract:
While the emergence of large administrative claims data provides opportunities for research, their use remains limited by the lack of clinical annotations relevant to disease outcomes, such as recurrence in breast cancer (BC). Several challenges arise from the annotation of such endpoints in administrative claims, including the need to infer both the occurrence and the date of the recurrence, the right-censoring of data, and the importance of time intervals between medical visits. Deep learning approaches have been successfully used to label temporal medical sequences, but no method is currently able to handle right-censoring and visit temporality simultaneously to detect survival events in medical sequences. We propose EDEN (Event DEtection Network), a time-aware Long Short-Term Memory network for survival analyses, and its custom loss function. Our method outperforms several state-of-the-art approaches on real-world BC datasets. EDEN constitutes a powerful tool to annotate disease recurrence from administrative claims, thus paving the way for the massive use of such data in BC research.
Submitted 15 November, 2022;
originally announced November 2022.
-
Machine learning and genomics: precision medicine vs. patient privacy
Authors:
Chloé-Agathe Azencott
Abstract:
Machine learning can have major societal impact in computational biology applications. In particular, it plays a central role in the development of precision medicine, whereby treatment is tailored to the clinical or genetic features of the patient. However, these advances require collecting large amounts of genomic data and sharing them among researchers, which raises considerable privacy concerns. Researchers, study participants and governing bodies should be aware of the ways in which the privacy of participants might be compromised, as well as of the large body of research on technical solutions to these issues. We review how breaches in patient privacy can occur, present recent developments in computational data protection, and discuss how they can be combined with legal and ethical perspectives to provide secure frameworks for genomic data sharing.
Submitted 23 May, 2018; v1 submitted 28 February, 2018;
originally announced February 2018.
-
Network-Guided Biomarker Discovery
Authors:
Chloé-Agathe Azencott
Abstract:
Identifying measurable genetic indicators (or biomarkers) of a specific condition of a biological system is a key element of precision medicine. Indeed, it allows diagnosis, prognosis and treatment choice to be tailored to the individual characteristics of a patient. In machine learning terms, biomarker discovery can be framed as a feature selection problem on whole-genome data sets. However, classical feature selection methods are usually underpowered to process these data sets, which contain orders of magnitude more features than samples. This can be addressed by assuming that genetic features that are linked on a biological network are more likely to work jointly towards explaining the phenotype of interest. We review here three families of methods for feature selection that integrate prior knowledge in the form of networks.
Submitted 15 December, 2016; v1 submitted 27 July, 2016;
originally announced July 2016.