-
Bugs as Features (Part II): A Perspective on Enriching Microbiome-Gut-Brain Axis Analyses
Authors:
Thomaz F. S. Bastiaanssen,
Thomas P. Quinn,
Amy Loughman
Abstract:
The microbiome-gut-brain-axis field is multidisciplinary, benefiting from the expertise of microbiology, ecology, psychiatry, computational biology, and epidemiology amongst other disciplines. As the field matures and moves beyond a basic demonstration of its relevance, it is critical that study design and analysis are robust and foster reproducibility.
In this companion piece to Bugs as Features (Part I), we present techniques from adjacent and disparate fields to enrich and inform the analysis of microbiome-gut-brain-axis data. Emerging techniques built specifically for the microbiome-gut-brain axis are also demonstrated. All of these methods are contextualised to inform several common challenges: how do we establish causality? How can we integrate data from multiple 'omics techniques? How might we account for the dynamism of host-microbiome interactions?
This perspective is offered to experienced and emerging microbiome scientists alike, to assist with these questions and others, at the study conception, design, analysis and interpretation stages of research.
Submitted 25 July, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Knowledge-based Integration of Multi-Omic Datasets with Anansi: Annotation-based Analysis of Specific Interactions
Authors:
Thomaz F. S. Bastiaanssen,
Thomas P. Quinn,
John F. Cryan
Abstract:
Motivation: Studies including more than one type of 'omics data sets are becoming more prevalent. Integrating these data sets can be a way to solidify findings and even to make new discoveries. However, integrating multi-omics data sets is challenging. Typically, data sets are integrated by performing an all-vs-all correlation analysis, where each feature of the first data set is correlated to each feature of the second data set. However, all-vs-all association testing produces unstructured results that are hard to interpret, and involves potentially unnecessary hypothesis testing that reduces statistical power due to false discovery rate (FDR) adjustment.
Implementation: Here, we present the anansi framework, and accompanying R package, as a way to improve upon all-vs-all association analysis. We take a knowledge-based approach where external databases like KEGG are used to constrain the all-vs-all association hypothesis space, only considering pairwise associations that are a priori known to occur. This produces structured results that are easier to interpret, and increases statistical power by skipping unnecessary hypothesis tests. In this paper, we present the anansi framework and demonstrate its application to learn metabolite-function interactions in the context of host-microbe interactions. We further extend our framework beyond pairwise association testing to differential association testing, and show how anansi can be used to identify associations that differ in strength or degree based on sample covariates such as case/control status.
Availability: https://github.com/thomazbastiaanssen/anansi
Submitted 18 May, 2023;
originally announced May 2023.
-
Bugs as Features (Part I): Concepts and Foundations for the Compositional Data Analysis of the Microbiome-Gut-Brain Axis
Authors:
Thomaz F. S. Bastiaanssen,
Thomas P. Quinn,
Amy Loughman
Abstract:
There has been a growing acknowledgement of the involvement of the gut microbiome - the collection of microbes that reside in our gut - in regulating our mood and behaviour. This phenomenon is referred to as the microbiome-gut-brain axis. While our techniques to measure the presence and abundance of these microbes have been steadily improving, the analysis of microbiome data is non-trivial.
Here, we present a perspective on the concepts and foundations of data analysis and interpretation of microbiome experiments with a focus on the microbiome-gut-brain axis domain. We give an overview of foundational considerations prior to commencing analysis alongside the core microbiome analysis approaches of alpha diversity, beta diversity, differential feature abundance and functional inference. We emphasize the compositional data analysis (CoDA) paradigm.
Further, this perspective features an extensive and heavily annotated microbiome analysis in R in the supplementary materials, as a resource for new and experienced bioinformaticians alike.
Submitted 25 July, 2023; v1 submitted 25 July, 2022;
originally announced July 2022.
-
Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome
Authors:
Elliott Gordon-Rodriguez,
Thomas P. Quinn,
John P. Cunningham
Abstract:
Data augmentation plays a key role in modern machine learning pipelines. While numerous augmentation strategies have been studied in the context of computer vision and natural language processing, less is known for other data modalities. Our work extends the success of data augmentation to compositional data, i.e., simplex-valued data, which is of particular interest in the context of the human microbiome. Drawing on key principles from compositional data analysis, such as the Aitchison geometry of the simplex and subcompositions, we define novel augmentation strategies for this data modality. Incorporating our data augmentations into standard supervised learning pipelines results in consistent performance gains across a wide range of standard benchmark datasets. In particular, we set a new state-of-the-art for key disease prediction tasks including colorectal cancer, type 2 diabetes, and Crohn's disease. In addition, our data augmentations enable us to define a novel contrastive learning model, which improves on previous representation learning approaches for microbiome compositional data. Our code is available at https://github.com/cunningham-lab/AugCoDa.
Submitted 19 May, 2022;
originally announced May 2022.
-
A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research
Authors:
Thomas P Quinn,
Sunil Gupta,
Svetha Venkatesh,
Vuong Le
Abstract:
Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, many scientific models are developed to facilitate scientific discovery, by which we mean to abstract a human-understandable representation of the natural world. Unfortunately, the opacity of deep neural networks limits their role in scientific discovery, creating a new demand for models that are transparently interpretable. This article is a field guide to transparent model design. It provides a taxonomy of transparent model design concepts, a practical workflow for putting design concepts into practice, and a general template for reporting design choices. We hope this field guide will help researchers more effectively design transparently interpretable models, and thus enable them to use deep learning for scientific discovery.
Submitted 13 October, 2021;
originally announced October 2021.
-
Readying Medical Students for Medical AI: The Need to Embed AI Ethics Education
Authors:
Thomas P Quinn,
Simon Coghlan
Abstract:
Medical students will almost inevitably encounter powerful medical AI systems early in their careers. Yet, contemporary medical education does not adequately equip students with the basic clinical proficiency in medical AI needed to use these tools safely and effectively. Education reform is urgently needed, but not easily implemented, largely due to an already jam-packed medical curriculum. In this article, we propose an education reform framework as an effective and efficient solution, which we call the Embedded AI Ethics Education Framework. Unlike other calls for education reform to accommodate AI teaching that are more radical in scope, our framework is modest and incremental. It leverages existing bioethics or medical ethics curricula to develop and deliver content on the ethical issues associated with medical AI, especially the harms of technology misuse, disuse, and abuse that affect the risk-benefit analyses at the heart of healthcare. In doing so, the framework provides a simple tool for going beyond the "What?" and the "Why?" of medical AI ethics education, to answer the "How?", giving universities, course directors, and/or professors a broad road-map for equipping their students with the necessary clinical proficiency in medical AI.
Submitted 7 September, 2021;
originally announced September 2021.
-
Stool Studies Don't Pass the Sniff Test: A Systematic Review of Human Gut Microbiome Research Suggests Widespread Misuse of Machine Learning
Authors:
Thomas P. Quinn
Abstract:
In the machine learning culture, an independent test set is required for proper model verification. Failures in model verification, including test set omission and test set leakage, make it impossible to know whether or not a trained model is fit for purpose. In this article, we present a systematic review and quantitative analysis of human gut microbiome classification studies, conducted to measure the frequency and impact of test set omission and test set leakage on area under the receiver operating curve (AUC) reporting. Among 102 articles included for analysis, we find that only 12% of studies report a bona fide test set AUC, meaning that the published AUCs for 88% of studies cannot be trusted at face value. Our findings cast serious doubt on the general validity of research claiming that the gut microbiome has high diagnostic or prognostic potential in human disease.
Submitted 8 July, 2021;
originally announced July 2021.
-
A Critique of Differential Abundance Analysis, and Advocacy for an Alternative
Authors:
Thomas P Quinn,
Elliott Gordon-Rodriguez,
Ionas Erb
Abstract:
It is largely taken for granted that differential abundance analysis (DAA) is, by default, the best first step when analyzing genomic data. We argue that this is not necessarily the case. In this article, we identify key limitations that are intrinsic to differential abundance analysis: it is (a) dependent on unverifiable assumptions, (b) an unreliable construct, and (c) overly reductionist. We formulate an alternative framework called ratio-based biomarker analysis which does not suffer from the identified limitations. Moreover, ratio-based biomarkers are highly flexible. Beyond replacing DAA, they can also be used for many other bespoke analyses, including dimension reduction and multi-omics data integration.
Submitted 7 June, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Counterfactual Explanation with Multi-Agent Reinforcement Learning for Drug Target Prediction
Authors:
Tri Minh Nguyen,
Thomas P Quinn,
Thin Nguyen,
Truyen Tran
Abstract:
Motivation: Many high-performance drug-target affinity (DTA) models have been proposed, but they are mostly black-box and thus lack human interpretability. Explainable AI (XAI) can make DTA models more trustworthy, and can also enable scientists to distill biological knowledge from the models. Counterfactual explanation is one popular approach to explaining the behaviour of a deep neural network, which works by systematically answering the question "How would the model output change if the inputs were changed in this way?". Most counterfactual explanation methods only operate on single input data. It remains an open problem how to extend counterfactual-based XAI methods to DTA models, which have two inputs, one for drug and one for target, that also happen to be discrete in nature.
Methods: We propose a multi-agent reinforcement learning framework, Multi-Agent Counterfactual Drug target binding Affinity (MACDA), to generate counterfactual explanations for the drug-protein complex. Our proposed framework provides human-interpretable counterfactual instances while optimizing both the input drug and target for counterfactual generation at the same time.
Results: We benchmark the proposed MACDA framework using the Davis dataset and find that our framework produces more parsimonious explanations with no loss in explanation validity, as measured by encoding similarity and QED. We then present a case study involving ABL1 and Nilotinib to demonstrate how MACDA can explain the behaviour of a DTA model in the underlying substructure interaction between inputs in its prediction, revealing mechanisms that align with prior domain knowledge.
Submitted 1 June, 2021; v1 submitted 24 March, 2021;
originally announced March 2021.
-
The Three Ghosts of Medical AI: Can the Black-Box Present Deliver?
Authors:
Thomas P. Quinn,
Stephan Jacobs,
Manisha Senadeera,
Vuong Le,
Simon Coghlan
Abstract:
Our title alludes to the three Christmas ghosts encountered by Ebenezer Scrooge in A Christmas Carol, who guide Ebenezer through the past, present, and future of Christmas holiday events. Similarly, our article will take readers through a journey of the past, present, and future of medical AI. In doing so, we focus on the crux of modern machine learning: the reliance on powerful but intrinsically opaque models. When applied to the healthcare domain, these models fail to meet the needs for transparency that their clinician and patient end-users require. We review the implications of this failure, and argue that opaque models (1) lack quality assurance, (2) fail to elicit trust, and (3) restrict physician-patient dialogue. We then discuss how upholding transparency in all aspects of model design and model validation can help ensure the reliability of medical AI.
Submitted 10 December, 2020;
originally announced December 2020.
-
Trust and Medical AI: The challenges we face and the expertise needed to overcome them
Authors:
Thomas P. Quinn,
Manisha Senadeera,
Stephan Jacobs,
Simon Coghlan,
Vuong Le
Abstract:
Artificial intelligence (AI) is increasingly of tremendous interest in the medical field. However, failures of medical AI could have serious consequences for both clinical outcomes and the patient experience. These consequences could erode public trust in AI, which could in turn undermine trust in our healthcare institutions. This article makes two contributions. First, it describes the major conceptual, technical, and humanistic challenges in medical AI. Second, it proposes a solution that hinges on the education and accreditation of new expert groups who specialize in the development, verification, and operation of medical AI technologies. These groups will be required to maintain trust in our healthcare institutions.
Submitted 18 August, 2020;
originally announced August 2020.
-
DeepCoDA: personalized interpretability for compositional health data
Authors:
Thomas P. Quinn,
Dang Nguyen,
Santu Rana,
Sunil Gupta,
Svetha Venkatesh
Abstract:
Interpretability allows the domain-expert to directly evaluate the model's relevance and reliability, a practice that offers assurance and builds trust. In the healthcare setting, interpretable models should implicate relevant biological mechanisms independent of technical factors like data pre-processing. We define personalized interpretability as a measure of sample-specific feature attribution, and view it as a minimum requirement for a precision health model to justify its conclusions. Some health data, especially those generated by high-throughput sequencing experiments, have nuances that compromise precision health models and their interpretation. These data are compositional, meaning that each feature is conditionally dependent on all other features. We propose the Deep Compositional Data Analysis (DeepCoDA) framework to extend precision health modelling to high-dimensional compositional data, and to provide personalized interpretability through patient-specific weights. Our architecture maintains state-of-the-art performance across 25 real-world data sets, all while producing interpretations that are both personalized and fully coherent for compositional data.
Submitted 16 June, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.