-
Objective hearing threshold identification from auditory brainstem response measurements using supervised and self-supervised approaches
Authors:
Dominik Thalmeier,
Gregor Miller,
Elida Schneltzer,
Anja Hurt,
Martin Hrabě de Angelis,
Lore Becker,
Christian L. Müller,
Holger Maier
Abstract:
Hearing loss is a major health problem and psychological burden in humans. Mouse models offer a possibility to elucidate genes involved in the underlying developmental and pathophysiological mechanisms of hearing impairment. To this end, large-scale mouse phenotyping programs include auditory phenotyping of single-gene knockout mouse lines. Using the auditory brainstem response (ABR) procedure, the German Mouse Clinic and similar facilities worldwide have produced large, uniform data sets of averaged ABR raw data of mutant and wildtype mice. In the course of standard ABR analysis, hearing thresholds are assessed visually by trained staff from series of signal curves of increasing sound pressure level. This is time-consuming and prone to be biased by the reader as well as the graphical display quality and scale. In an attempt to reduce workload and improve quality and reproducibility, we developed and compared two methods for automated hearing threshold identification from averaged ABR raw data: a supervised approach involving two combined neural networks trained on human-generated labels and a self-supervised approach, which exploits the signal power spectrum and combines random forest sound level estimation with a piece-wise curve fitting algorithm for threshold finding. We show that both models work well, outperform human threshold detection, and are suitable for fast, reliable, and unbiased hearing threshold detection and quality control. In a high-throughput mouse phenotyping environment, both methods perform well as part of an automated end-to-end screening pipeline to detect candidate genes for hearing involvement. Code for both models as well as data used for this work are freely available.
Submitted 16 December, 2021;
originally announced December 2021.
-
Instrumental Variable Estimation for Compositional Treatments
Authors:
Elisabeth Ailer,
Christian L. Müller,
Niki Kilbertus
Abstract:
Many scientific datasets are compositional in nature. Important biological examples include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. First, we crisply articulate potential pitfalls for practitioners regarding the interpretation of compositional causes from the viewpoint of interventions and warn against attributing causal meaning to common summary statistics such as diversity indices in microbiome data analysis. We then advocate for and develop multivariate methods using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account while still yielding scientifically interpretable results. In a comparative analysis on synthetic and real microbiome data we show the advantages and limitations of our proposal. We posit that our analysis provides a useful framework and guidance for valid and informative cause-effect estimation in the context of compositional data.
Submitted 28 May, 2024; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Learning physically consistent mathematical models from data using group sparsity
Authors:
Suryanarayana Maddu,
Bevan L. Cheeseman,
Christian L. Müller,
Ivo F. Sbalzarini
Abstract:
We propose a statistical learning framework based on group-sparse regression that can be used to 1) enforce conservation laws, 2) ensure model equivalence, and 3) guarantee symmetries when learning or inferring differential-equation models from measurement data. Directly learning $\textit{interpretable}$ mathematical models from data has emerged as a valuable modeling approach. However, in areas like biology, high noise levels, sensor-induced correlations, and strong inter-system variability can render data-driven models nonsensical or physically inconsistent without additional constraints on the model structure. Hence, it is important to leverage $\textit{prior}$ knowledge from physical principles to learn "biologically plausible and physically consistent" models rather than models that simply fit the data best. We present a novel group Iterative Hard Thresholding (gIHT) algorithm and use stability selection to infer physically consistent models with minimal parameter tuning. We show several applications from systems biology that demonstrate the benefits of enforcing $\textit{priors}$ in data-driven modeling.
Submitted 11 December, 2020;
originally announced December 2020.
-
Generalized Stability Approach for Regularized Graphical Models
Authors:
Christian L. Müller,
Richard Bonneau,
Zachary Kurtz
Abstract:
Selecting regularization parameters in penalized high-dimensional graphical models in a principled, data-driven, and computationally efficient manner continues to be one of the key challenges in high-dimensional statistics. We present substantial computational gains and conceptual generalizations of the Stability Approach to Regularization Selection (StARS), a state-of-the-art graphical model selection scheme. Using properties of the Poisson-Binomial distribution and convex non-asymptotic distributional modeling we propose lower and upper bounds on the StARS graph regularization path which results in greatly reduced computational cost without compromising regularization selection. We also generalize the StARS criterion from single edge to induced subgraph (graphlet) stability. We show that simultaneously requiring edge and graphlet stability leads to superior graph recovery performance independent of graph topology. These novel insights render Gaussian graphical model selection a routine task on standard multi-core computers.
Submitted 23 May, 2016;
originally announced May 2016.
-
Sparse and compositionally robust inference of microbial ecological networks
Authors:
Zachary D. Kurtz,
Christian L. Mueller,
Emily R. Miraldi,
Dan R. Littman,
Martin J. Blaser,
Richard A. Bonneau
Abstract:
16S-ribosomal sequencing and other metagenomic techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions, identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from 16S datasets are compositional, and thus, microbial abundances are not independent. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU interaction networks is severely under-powered, and additional assumptions are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological interactions from metagenomic datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological interaction network is sparse. To reconstruct the interaction network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. Because no large-scale microbial ecological networks have been experimentally validated, SPIEC-EASI comprises computational tools to generate realistic OTU count data from a set of diverse underlying network topologies. SPIEC-EASI outperforms state-of-the-art methods in terms of edge recovery and network properties on realistic synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial interactions using data from the American Gut project.
Submitted 13 February, 2015; v1 submitted 18 August, 2014;
originally announced August 2014.
-
Global parameter identification of stochastic reaction networks from single trajectories
Authors:
Christian L. Muller,
Rajesh Ramaswamy,
Ivo F. Sbalzarini
Abstract:
We consider the problem of inferring the unknown parameters of a stochastic biochemical network model from a single measured time-course of the concentration of some of the involved species. Such measurements are available, e.g., from live-cell fluorescence microscopy in image-based systems biology. In addition, fluctuation time-courses from, e.g., fluorescence correlation spectroscopy provide additional information about the system dynamics that can be used to more robustly infer parameters than when considering only mean concentrations. Estimating model parameters from a single experimental trajectory enables single-cell measurements and quantification of cell-cell variability. We propose a novel combination of an adaptive Monte Carlo sampler, called Gaussian Adaptation, and efficient exact stochastic simulation algorithms that allows parameter identification from single stochastic trajectories. We benchmark the proposed method on a linear and a non-linear reaction network at steady state and during transient phases. In addition, we demonstrate that the present method also provides an ellipsoidal volume estimate of the viable part of parameter space and is able to estimate the physical volume of the compartment in which the observed reactions take place.
Submitted 21 November, 2011;
originally announced November 2011.
-
Solving the Advection-Diffusion Equations in Biological Contexts using the Cellular Potts Model
Authors:
Debasis Dan,
Chris Mueller,
Kun Chen,
James A. Glazier
Abstract:
The Cellular Potts Model (CPM) is a robust, cell-level methodology for simulation of biological tissues and morphogenesis. Both tissue physiology and morphogenesis depend on diffusion of chemical morphogens in the extra-cellular fluid or matrix (ECM). Standard diffusion solvers applied to the CPM use finite difference methods on the underlying CPM lattice. However, these methods produce a diffusing field tied to the underlying lattice, which is inaccurate in many biological situations in which cell or ECM movement causes advection that is rapid compared to diffusion. Finite difference schemes suffer from numerical instabilities when solving the resulting advection-diffusion equations. To circumvent these problems, we simulate advection-diffusion within the framework of the CPM using off-lattice finite-difference methods. We define a set of generalized fluid particles which detach advection and diffusion from the lattice. Diffusion occurs between neighboring fluid particles by local averaging rules which approximate the Laplacian. Directed spin flips in the CPM handle the advective movement of the fluid particles. A constraint on relative velocities in the fluid explicitly accounts for fluid viscosity. To validate our approximation against analytical and established numerical solutions, we use the CPM to solve various diffusion examples, including multiple instantaneous sources, continuous sources, moving sources, and different boundary geometries and conditions. We also verify the CPM results for Poiseuille flow and Taylor-Aris dispersion.
Submitted 28 April, 2005;
originally announced April 2005.