Beyond attention: deriving biologically interpretable insights from weakly-supervised multiple-instance learning models
Authors:
Willem Bonnaffé,
CRUK ICGC Prostate Group,
Freddie Hamdy,
Yang Hu,
Ian Mills,
Jens Rittscher,
Clare Verrill,
Dan J. Woodcock
Abstract:
Recent advances in attention-based multiple instance learning (MIL) have improved our insights into the tissue regions that models rely on to make predictions in digital pathology. However, the interpretability of these approaches is still limited. In particular, they do not report whether high-attention regions are positively or negatively associated with the class labels or how well these regions correspond to previously established clinical and biological knowledge. We address this by introducing a post-training methodology to analyse MIL models. Firstly, we introduce prediction-attention-weighted (PAW) maps by combining tile-level attention and prediction scores produced by a refined encoder, allowing us to quantify the predictive contribution of high-attention regions. Secondly, we introduce a biological feature instantiation technique by integrating PAW maps with nuclei segmentation masks. This further improves interpretability by providing biologically meaningful features related to the cellular organisation of the tissue and facilitates comparisons with known clinical features. We illustrate the utility of our approach by comparing PAW maps obtained for prostate cancer diagnosis (i.e. samples containing malignant tissue, 381/516 tissue samples) and prognosis (i.e. samples from patients with biochemical recurrence following surgery, 98/663 tissue samples) in a cohort of patients from the International Cancer Genome Consortium (ICGC UK Prostate Group). Our approach reveals that regions that are predictive of adverse prognosis do not tend to co-locate with the tumour regions, indicating that non-cancer cells should also be studied when evaluating prognosis.
Submitted 7 September, 2023;
originally announced September 2023.
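The idea of combining tile-level attention with tile-level prediction scores, as described in the abstract above, can be illustrated with a minimal sketch. The abstract does not give the exact combination rule, so everything here is an assumption made for illustration: the function name `paw_map`, the normalisation of attention to sum to one, the centring of probabilities at 0.5, and the use of a simple product. It is not the authors' definition of a PAW map.

```python
def paw_map(attention, tile_probs):
    """Return one signed score per tile (hypothetical PAW-style combination).

    attention  : non-negative attention weights, one per tile
    tile_probs : tile-level predicted probabilities of the positive class
    """
    if len(attention) != len(tile_probs):
        raise ValueError("attention and tile_probs must have the same length")
    total = sum(attention)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")

    # Assumption: normalise attention so the weights sum to 1 across the slide.
    weights = [a / total for a in attention]

    # Assumption: centre probabilities at 0.5 and rescale to [-1, 1], so the
    # sign says whether a tile pushes towards or away from the positive class.
    signed = [2.0 * (p - 0.5) for p in tile_probs]

    # Magnitude reflects attention, sign reflects direction of the prediction.
    return [w * s for w, s in zip(weights, signed)]


scores = paw_map([1.0, 1.0, 2.0], [1.0, 0.0, 0.5])
```

Under these assumptions, a tile with high attention but a prediction near 0.5 receives a score near zero, which is the kind of distinction attention weights alone cannot express.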
Quantifying intrinsic and extrinsic noise in gene transcription using the linear noise approximation: An application to single cell data
Authors:
Bärbel Finkenstädt,
Dan J. Woodcock,
Michal Komorowski,
Claire V. Harper,
Julian R. E. Davis,
Mike R. H. White,
David A. Rand
Abstract:
A central challenge in computational modeling of dynamic biological systems is parameter inference from experimental time course measurements. However, one would not only like to infer kinetic parameters but also study their variability from cell to cell. Here we focus on the case where single-cell fluorescent protein imaging time series data are available for a population of cells. Based on van Kampen's linear noise approximation, we derive a dynamic state space model for molecular populations which is then extended to a hierarchical model. This model has the potential to address the sources of variability relevant to single-cell data, namely, intrinsic noise due to the stochastic nature of the birth and death processes involved in reactions and extrinsic noise arising from the cell-to-cell variation of kinetic parameters. In order to infer such a model from experimental data, one must also quantify the measurement process where one has to allow for nonmeasurable molecular species as well as measurement noise of unknown level and variance. The availability of multiple single-cell time series data here provides a unique testbed to fit such a model and quantify these different sources of variation from experimental data.
Submitted 8 January, 2014;
originally announced January 2014.
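To make the linear noise approximation mentioned in the abstract above more concrete, here is a minimal sketch for the simplest possible case: a single species produced at constant rate `k` and degraded at per-molecule rate `gamma`. For this birth-death process the approximation gives two coupled ODEs, one for the mean and one for the variance of the molecule count. The function name, the parameter values, and the forward-Euler integration are assumptions chosen for illustration; the paper's actual state space model and its hierarchical extension are not reproduced here.

```python
def lna_birth_death(k, gamma, t_end, dt=1e-3, mean0=0.0, var0=0.0):
    """Integrate the LNA mean and variance for a birth-death process.

    dm/dt = k - gamma * m                   (macroscopic rate equation)
    dv/dt = -2 * gamma * v + k + gamma * m  (intrinsic-noise variance)
    """
    m, v = mean0, var0
    for _ in range(int(round(t_end / dt))):
        dm = k - gamma * m
        dv = -2.0 * gamma * v + k + gamma * m
        # Forward Euler step; adequate for this illustration, not for stiff systems.
        m += dt * dm
        v += dt * dv
    return m, v


mean, var = lna_birth_death(k=10.0, gamma=1.0, t_end=20.0)
```

At steady state both equations give `k / gamma`, so the intrinsic noise alone yields a variance equal to the mean. Extrinsic noise, in the sense used in the abstract, would correspond to `k` and `gamma` themselves varying from cell to cell.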