Search | arXiv e-print repository

Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

Authors: Elliott Gordon-Rodriguez, Thomas P. Quinn, John P. Cunningham

Abstract: Data augmentation plays a key role in modern machine learning pipelines. While numerous augmentation strategies have been studied in the context of computer vision and natural language processing, less is known for other data modalities. Our work extends the success of data augmentation to compositional data, i.e., simplex-valued data, which is of particular interest in the context of the human mi… ▽ More Data augmentation plays a key role in modern machine learning pipelines. While numerous augmentation strategies have been studied in the context of computer vision and natural language processing, less is known for other data modalities. Our work extends the success of data augmentation to compositional data, i.e., simplex-valued data, which is of particular interest in the context of the human microbiome. Drawing on key principles from compositional data analysis, such as the Aitchison geometry of the simplex and subcompositions, we define novel augmentation strategies for this data modality. Incorporating our data augmentations into standard supervised learning pipelines results in consistent performance gains across a wide range of standard benchmark datasets. In particular, we set a new state-of-the-art for key disease prediction tasks including colorectal cancer, type 2 diabetes, and Crohn's disease. In addition, our data augmentations enable us to define a novel contrastive learning model, which improves on previous representation learning approaches for microbiome compositional data. Our code is available at https://github.com/cunningham-lab/AugCoDa. △ Less

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2110.08253 [pdf, other]

A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research

Authors: Thomas P Quinn, Sunil Gupta, Svetha Venkatesh, Vuong Le

Abstract: Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, many scientific models are developed to facilitate scientific discovery, by which we mean to abstract a human-understandable representation of the natur… ▽ More Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, many scientific models are developed to facilitate scientific discovery, by which we mean to abstract a human-understandable representation of the natural world. Unfortunately, the opacity of deep neural networks limit their role in scientific discovery, creating a new demand for models that are transparently interpretable. This article is a field guide to transparent model design. It provides a taxonomy of transparent model design concepts, a practical workflow for putting design concepts into practice, and a general template for reporting design choices. We hope this field guide will help researchers more effectively design transparently interpretable models, and thus enable them to use deep learning for scientific discovery. △ Less

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2109.02866 [pdf, other]

Readying Medical Students for Medical AI: The Need to Embed AI Ethics Education

Authors: Thomas P Quinn, Simon Coghlan

Abstract: Medical students will almost inevitably encounter powerful medical AI systems early in their careers. Yet, contemporary medical education does not adequately equip students with the basic clinical proficiency in medical AI needed to use these tools safely and effectively. Education reform is urgently needed, but not easily implemented, largely due to an already jam-packed medical curricula. In thi… ▽ More Medical students will almost inevitably encounter powerful medical AI systems early in their careers. Yet, contemporary medical education does not adequately equip students with the basic clinical proficiency in medical AI needed to use these tools safely and effectively. Education reform is urgently needed, but not easily implemented, largely due to an already jam-packed medical curricula. In this article, we propose an education reform framework as an effective and efficient solution, which we call the Embedded AI Ethics Education Framework. Unlike other calls for education reform to accommodate AI teaching that are more radical in scope, our framework is modest and incremental. It leverages existing bioethics or medical ethics curricula to develop and deliver content on the ethical issues associated with medical AI, especially the harms of technology misuse, disuse, and abuse that affect the risk-benefit analyses at the heart of healthcare. In doing so, the framework provides a simple tool for going beyond the "What?" and the "Why?" of medical AI ethics education, to answer the "How?", giving universities, course directors, and/or professors a broad road-map for equip** their students with the necessary clinical proficiency in medical AI. △ Less

Submitted 7 September, 2021; originally announced September 2021.

arXiv:2103.12983 [pdf, other]

Counterfactual Explanation with Multi-Agent Reinforcement Learning for Drug Target Prediction

Authors: Tri Minh Nguyen, Thomas P Quinn, Thin Nguyen, Truyen Tran

Abstract: Motivation: Many high-performance DTA models have been proposed, but they are mostly black-box and thus lack human interpretability. Explainable AI (XAI) can make DTA models more trustworthy, and can also enable scientists to distill biological knowledge from the models. Counterfactual explanation is one popular approach to explaining the behaviour of a deep neural network, which works by systemat… ▽ More Motivation: Many high-performance DTA models have been proposed, but they are mostly black-box and thus lack human interpretability. Explainable AI (XAI) can make DTA models more trustworthy, and can also enable scientists to distill biological knowledge from the models. Counterfactual explanation is one popular approach to explaining the behaviour of a deep neural network, which works by systematically answering the question "How would the model output change if the inputs were changed in this way?". Most counterfactual explanation methods only operate on single input data. It remains an open problem how to extend counterfactual-based XAI methods to DTA models, which have two inputs, one for drug and one for target, that also happen to be discrete in nature. Methods: We propose a multi-agent reinforcement learning framework, Multi-Agent Counterfactual Drug target binding Affinity (MACDA), to generate counterfactual explanations for the drug-protein complex. Our proposed framework provides human-interpretable counterfactual instances while optimizing both the input drug and target for counterfactual generation at the same time. Results: We benchmark the proposed MACDA framework using the Davis dataset and find that our framework produces more parsimonious explanations with no loss in explanation validity, as measured by encoding similarity and QED. We then present a case study involving ABL1 and Nilotinib to demonstrate how MACDA can explain the behaviour of a DTA model in the underlying substructure interaction between inputs in its prediction, revealing mechanisms that align with prior domain knowledge. △ Less

Submitted 1 June, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

arXiv:2012.06000 [pdf, other]

The Three Ghosts of Medical AI: Can the Black-Box Present Deliver?

Authors: Thomas P. Quinn, Stephan Jacobs, Manisha Senadeera, Vuong Le, Simon Coghlan

Abstract: Our title alludes to the three Christmas ghosts encountered by Ebenezer Scrooge in \textit{A Christmas Carol}, who guide Ebenezer through the past, present, and future of Christmas holiday events. Similarly, our article will take readers through a journey of the past, present, and future of medical AI. In doing so, we focus on the crux of modern machine learning: the reliance on powerful but intri… ▽ More Our title alludes to the three Christmas ghosts encountered by Ebenezer Scrooge in \textit{A Christmas Carol}, who guide Ebenezer through the past, present, and future of Christmas holiday events. Similarly, our article will take readers through a journey of the past, present, and future of medical AI. In doing so, we focus on the crux of modern machine learning: the reliance on powerful but intrinsically opaque models. When applied to the healthcare domain, these models fail to meet the needs for transparency that their clinician and patient end-users require. We review the implications of this failure, and argue that opaque models (1) lack quality assurance, (2) fail to elicit trust, and (3) restrict physician-patient dialogue. We then discuss how upholding transparency in all aspects of model design and model validation can help ensure the reliability of medical AI. △ Less

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2008.07734 [pdf, other]

Trust and Medical AI: The challenges we face and the expertise needed to overcome them

Authors: Thomas P. Quinn, Manisha Senadeera, Stephan Jacobs, Simon Coghlan, Vuong Le

Abstract: Artificial intelligence (AI) is increasingly of tremendous interest in the medical field. However, failures of medical AI could have serious consequences for both clinical outcomes and the patient experience. These consequences could erode public trust in AI, which could in turn undermine trust in our healthcare institutions. This article makes two contributions. First, it describes the major conc… ▽ More Artificial intelligence (AI) is increasingly of tremendous interest in the medical field. However, failures of medical AI could have serious consequences for both clinical outcomes and the patient experience. These consequences could erode public trust in AI, which could in turn undermine trust in our healthcare institutions. This article makes two contributions. First, it describes the major conceptual, technical, and humanistic challenges in medical AI. Second, it proposes a solution that hinges on the education and accreditation of new expert groups who specialize in the development, verification, and operation of medical AI technologies. These groups will be required to maintain trust in our healthcare institutions. △ Less

Submitted 18 August, 2020; originally announced August 2020.

Comments: 6 pages, 2 figures

arXiv:2006.01392 [pdf, other]

DeepCoDA: personalized interpretability for compositional health data

Authors: Thomas P. Quinn, Dang Nguyen, Santu Rana, Sunil Gupta, Svetha Venkatesh

Abstract: Interpretability allows the domain-expert to directly evaluate the model's relevance and reliability, a practice that offers assurance and builds trust. In the healthcare setting, interpretable models should implicate relevant biological mechanisms independent of technical factors like data pre-processing. We define personalized interpretability as a measure of sample-specific feature attribution,… ▽ More Interpretability allows the domain-expert to directly evaluate the model's relevance and reliability, a practice that offers assurance and builds trust. In the healthcare setting, interpretable models should implicate relevant biological mechanisms independent of technical factors like data pre-processing. We define personalized interpretability as a measure of sample-specific feature attribution, and view it as a minimum requirement for a precision health model to justify its conclusions. Some health data, especially those generated by high-throughput sequencing experiments, have nuances that compromise precision health models and their interpretation. These data are compositional, meaning that each feature is conditionally dependent on all other features. We propose the Deep Compositional Data Analysis (DeepCoDA) framework to extend precision health modelling to high-dimensional compositional data, and to provide personalized interpretability through patient-specific weights. Our architecture maintains state-of-the-art performance across 25 real-world data sets, all while producing interpretations that are both personalized and fully coherent for compositional data. △ Less

Submitted 16 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: To appear at ICML 2020

Showing 1–7 of 7 results for author: Quinn, T P