arXiv:2207.07458 [pdf]

Joint Application of the Target Trial Causal Framework and Machine Learning Modeling to Optimize Antibiotic Therapy: Use Case on Acute Bacterial Skin and Skin Structure Infections due to Methicillin-resistant Staphylococcus aureus

Authors: Inyoung Jun, Simone Marini, Christina A. Boucher, J. Glenn Morris, Jiang Bian, Mattia Prosperi

Abstract: Bacterial infections are responsible for high mortality worldwide. Antimicrobial resistance underlying the infection and the patient's multifaceted clinical status can hamper the correct choice of antibiotic treatment. Randomized clinical trials provide average treatment effect estimates but are not ideal for risk stratification and optimization of therapeutic choice, i.e., individualized treatment effects (ITE). Here, we leverage large-scale electronic health record data, collected from Southern US academic clinics, to emulate a clinical trial, i.e., a 'target trial', and develop a machine learning model of mortality prediction and ITE estimation for patients diagnosed with acute bacterial skin and skin structure infection (ABSSSI) due to methicillin-resistant Staphylococcus aureus (MRSA). ABSSSI-MRSA is a challenging condition with reduced treatment options: vancomycin is the preferred choice, but it has non-negligible side effects. First, we use propensity score matching to emulate the trial and create a treatment-randomized (vancomycin vs. other antibiotics) dataset. Next, we use this data to train various machine learning methods (including boosted/LASSO logistic regression, support vector machines, and random forest) and choose the best model in terms of area under the receiver operating characteristic curve (AUC) through bootstrap validation. Lastly, we use the models to calculate ITE and identify possible averted deaths by therapy change.
The out-of-bag tests indicate that SVM and RF are the most accurate, with AUC of 81% and 78%, respectively, but BLR/LASSO is not far behind (76%). Counterfactuals calculated with BLR/LASSO indicate that vancomycin increases the risk of death, but with large variation (odds ratio 1.2, 95% range 0.4-3.8), and its contribution to outcome probability is modest. In contrast, the RF exhibits stronger changes in ITE, suggesting more complex treatment heterogeneity.

Submitted 15 July, 2022; originally announced July 2022.

Comments: This is the Proceedings of the KDD Workshop on Applied Data Science for Healthcare (DSHealth 2022), which was held in Washington, D.C., on August 14, 2022
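
As a rough illustration of the ITE calculation described in the abstract above, the sketch below contrasts a logistic model's predicted risk with the treatment indicator set to 1 and to 0, holding all other covariates fixed. The intercept, the covariate names (`age_decades`, `sepsis`), their coefficients, and the helper functions are hypothetical. Only the vancomycin odds ratio of 1.2 echoes the abstract. This is not the authors' implementation.

```python
import math


def predict_risk(coefs, intercept, features):
    """Logistic-model probability of death for one patient."""
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def individual_treatment_effect(coefs, intercept, features, treatment="vancomycin"):
    """Predicted risk with treatment minus predicted risk without it."""
    treated = dict(features, **{treatment: 1})
    untreated = dict(features, **{treatment: 0})
    return (predict_risk(coefs, intercept, treated)
            - predict_risk(coefs, intercept, untreated))


# Hypothetical coefficients, for illustration only (not taken from the paper).
# The treatment coefficient is log(1.2), i.e. an odds ratio of 1.2.
COEFS = {"vancomycin": math.log(1.2), "age_decades": 0.3, "sepsis": 1.0}
INTERCEPT = -4.0

patient = {"age_decades": 7, "sepsis": 1, "vancomycin": 0}
ite = individual_treatment_effect(COEFS, INTERCEPT, patient)
print(f"Estimated change in mortality risk under vancomycin: {ite:+.3f}")
```

In a plain logistic model the odds ratio is the same for every patient, so the ITE on the probability scale varies only through each patient's baseline risk. Models with interactions or non-linearities, such as the random forest mentioned in the abstract, can yield more heterogeneous effects.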

arXiv:2107.03383 [pdf, other]

Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions

Authors: Mattia Prosperi, Simone Marini, Christina Boucher, Jiang Bian

Abstract: Whole genome sequencing (WGS) is quickly becoming the customary means for identification of antimicrobial resistance (AMR) due to its ability to obtain high-resolution information about the genes and mechanisms that are causing resistance and driving pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR prediction tools from genotype-phenotype data can be biased, since sampling is non-randomized. Sample provenience, period of collection, and species representation can confound the association of genetic traits with AMR. Thus, prediction models can perform poorly on new data with sampling distribution shifts. In this work, under an explicit set of causal assumptions, we evaluate the effectiveness of propensity-based rebalancing and confounding adjustment on AMR prediction using genotype-phenotype AMR data from the Pathosystems Resource Integration Center (PATRIC). We select bacterial genotypes (encoded as k-mer signatures, i.e., DNA fragments of length k), country, year, species, and AMR phenotypes for the tetracycline drug class, preparing test data with recent genomes coming from a single country. We test boosted logistic regression (BLR) and random forests (RF) with and without bias-handling. On 10,936 instances, we find evidence of species, location, and year imbalance with respect to the AMR phenotype. The crude versus bias-adjusted change in effect of genetic signatures on AMR varies, but only moderately (selecting the top 20,000 out of 40+ million k-mers).
The area under the receiver operating characteristic (AUROC) of the RF (0.95) is comparable to that of BLR (0.94) on both out-of-bag samples from bootstrap and the external test (n=1,085), where AUROCs do not decrease. We observe a 1%-5% gain in AUROC with bias-handling compared to the sole use of genetic signatures. ...

Submitted 23 July, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: In DSHealth '21: Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare, Aug 14-18, 2021, Virtual, 5 pages
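
The abstract above describes genotypes encoded as k-mer signatures, i.e., DNA fragments of length k. A minimal sketch of that encoding follows, assuming overlapping k-mers and a binary presence/absence feature vector. The function names, the toy sequence, and the vocabulary are hypothetical, and the paper's actual pipeline may differ.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def presence_vector(sequence, k, vocabulary):
    """Binary features: 1 if the k-mer occurs in the sequence, else 0."""
    counts = kmer_counts(sequence, k)
    return [1 if kmer in counts else 0 for kmer in vocabulary]


# Toy sequence and vocabulary, for illustration only.
counts = kmer_counts("ACGTACG", 3)
features = presence_vector("ACGTACG", 3, ["ACG", "GGG", "TAC"])
print(dict(counts))
print(features)
```

A vector like `features` is the kind of input a classifier such as BLR or RF could consume alongside country, year, and species covariates.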

arXiv:1905.10708 [pdf, other]

doi 10.1109/IJCNN.2019.8851907

Underwater Fish Detection with Weak Multi-Domain Supervision

Authors: Dmitry A. Konovalov, Alzayat Saleh, Michael Bradley, Mangalam Sankupellay, Simone Marini, Marcus Sheaves

Abstract: Given a sufficiently large training dataset, it is relatively easy to train a modern convolutional neural network (CNN) as a required image classifier. However, for the task of fish classification and/or fish detection, if a CNN was trained to detect or classify particular fish species in particular background habitats, the same CNN exhibits much lower accuracy when applied to new/unseen fish species and/or fish habitats. Therefore, in practice, the CNN needs to be continuously fine-tuned to improve its classification accuracy to handle new project-specific fish species or habitats. In this work we present a labelling-efficient method of training a CNN-based fish-detector (the Xception CNN was used as the base) on relatively small numbers (4,000) of project-domain underwater fish/no-fish images from 20 different habitats. Additionally, 17,000 known negative (that is, without fish) general-domain (VOC2012) above-water images were used. Two publicly available fish-domain datasets supplied an additional 27,000 above-water and underwater positive/fish images. By using this multi-domain collection of images, the trained Xception-based binary (fish/not-fish) classifier achieved 0.17% false-positives and 0.61% false-negatives on the project's 20,000 negative and 16,000 positive holdout test images, respectively. The area under the ROC curve (AUC) was 99.94%.

Submitted 1 November, 2019; v1 submitted 25 May, 2019; originally announced May 2019.

Comments: Published in the 2019 International Joint Conference on Neural Networks (IJCNN-2019), Budapest, Hungary, July 14-19, 2019, https://www.ijcnn.org/ , https://ieeexplore.ieee.org/document/8851907

Journal ref: 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019, pp. 1-8
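
To put the error rates reported in the abstract above into absolute terms, the sketch below converts them into approximate image counts on the holdout set. The helper `implied_count` is hypothetical, and the counts are only approximate because the reported percentages are rounded.

```python
def implied_count(rate_percent, total):
    """Approximate number of images implied by a percentage of a test set."""
    return round(rate_percent / 100 * total)


NEGATIVES = 20_000  # holdout images without fish
POSITIVES = 16_000  # holdout images with fish

false_positives = implied_count(0.17, NEGATIVES)  # negatives flagged as fish
false_negatives = implied_count(0.61, POSITIVES)  # fish images missed
errors = false_positives + false_negatives
accuracy = 1 - errors / (NEGATIVES + POSITIVES)

print(false_positives, false_negatives, f"{accuracy:.4f}")
```

Under these assumptions, the rates correspond to roughly 34 false positives and 98 false negatives out of 36,000 holdout images.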

arXiv:1110.2400 [pdf]

The CHRONIOUS Ontology-Driven Search Tool: Enabling Access to Focused and Up-to-Date Healthcare Literature

Authors: Stephan Kiefer, Jochen Rauch, Riccardo Albertoni, Marco Attene, Franca Giannini, Simone Marini, Luc Schneider, Carlos Mesquita, Xin Xing, Michael Lawo

Abstract: This paper presents an advanced search engine prototype for bibliography retrieval developed within the CHRONIOUS European IP project of the Seventh Framework Program (FP7). This search engine is specifically targeted to clinicians and healthcare practitioners searching for documents related to Chronic Obstructive Pulmonary Disease (COPD) and Chronic Kidney Disease (CKD). To this aim, the presented tool exploits two pathology-specific ontologies that allow focused document indexing and retrieval. These ontologies have been developed on top of the Middle Layer Ontology for Clinical Care (MLOCC), which provides a link with the Basic Formal Ontology, a foundational ontology used in the Open Biological and Biomedical Ontologies (OBO) Foundry. In addition, a link with the terms of the MeSH (Medical Subject Headings) thesaurus has been provided to guarantee coverage of the general certified medical terms and multilingual capabilities.

Submitted 11 October, 2011; originally announced October 2011.

Comments: Published in eChallenges e-2011 Conference Proceedings, Paul Cunningham and Miriam Cunningham (Eds), IIMC International Information Management Corporation, 2011, ISBN: 978-1-905824-27-4

ACM Class: I.2.4; H.3.4

Showing 1–4 of 4 results for author: Marini, S