-
Biophysical models of cis-regulation as interpretable neural networks
Authors:
Ammar Tareen,
Justin B. Kinney
Abstract:
The adoption of deep learning techniques in genomics has been hindered by the difficulty of mechanistically interpreting the models that these techniques produce. In recent years, a variety of post-hoc attribution methods have been proposed for addressing this neural network interpretability problem in the context of gene regulation. Here we describe a complementary way of approaching this problem. Our strategy is based on the observation that two large classes of biophysical models of cis-regulatory mechanisms can be expressed as deep neural networks in which nodes and weights have explicit physicochemical interpretations. We also demonstrate how such biophysical networks can be rapidly inferred, using modern deep learning frameworks, from the data produced by certain types of massively parallel reporter assays (MPRAs). These results suggest a scalable strategy for using MPRAs to systematically characterize the biophysical basis of gene regulation in a wide range of biological contexts. They also highlight gene regulation as a promising venue for the development of scientifically interpretable approaches to deep learning.
Submitted 7 February, 2020; v1 submitted 30 December, 2019;
originally announced January 2020.
-
Response to NITRD, NCO, NSF Request for Information on "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan"
Authors:
J. Amundson,
J. Annis,
C. Avestruz,
D. Bowring,
J. Caldeira,
G. Cerati,
C. Chang,
S. Dodelson,
D. Elvira,
A. Farahi,
K. Genser,
L. Gray,
O. Gutsche,
P. Harris,
J. Kinney,
J. B. Kowalkowski,
R. Kutschke,
S. Mrenna,
B. Nord,
A. Para,
K. Pedro,
G. N. Perdue,
A. Scheinker,
P. Spentzouris,
J. St. John, et al. (5 additional authors not shown)
Abstract:
We present a response to the 2018 Request for Information (RFI) from the NITRD, NCO, NSF regarding the "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan." Through this document, we provide a response to the question of whether and how the National Artificial Intelligence Research and Development Strategic Plan (NAIRDSP) should be updated from the perspective of Fermilab, America's premier national laboratory for High Energy Physics (HEP). We believe the NAIRDSP should be extended in light of the rapid pace of development and innovation in the field of Artificial Intelligence (AI) since 2016, and present our recommendations below. AI has profoundly impacted many areas of human life, promising to dramatically reshape society (e.g., economy, education, science) in the coming years. We are still early in this process. It is critical to invest now in this technology to ensure it is safe and deployed ethically. Science and society both have a strong need for accuracy, efficiency, transparency, and accountability in algorithms, making investments in scientific AI particularly valuable. Thus far the US has been a leader in AI technologies, and we believe that, as a national laboratory, it is crucial to help maintain and extend this leadership. Moreover, investments in AI will be important for maintaining US leadership in the physical sciences.
Submitted 4 November, 2019;
originally announced November 2019.
-
Density estimation on small datasets
Authors:
Wei-Chia Chen,
Ammar Tareen,
Justin B. Kinney
Abstract:
How might a smooth probability distribution be estimated, with accurately quantified uncertainty, from a limited amount of sampled data? Here we describe a field-theoretic approach that addresses this problem remarkably well in one dimension, providing an exact nonparametric Bayesian posterior without relying on tunable parameters or large-data approximations. Strong non-Gaussian constraints, which require a non-perturbative treatment, are found to play a major role in reducing distribution uncertainty. A software implementation of this method is provided.
Submitted 29 August, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.
-
Modeling multi-particle complexes in stochastic chemical systems
Authors:
Muir J. Morrison,
Justin B. Kinney
Abstract:
Large complexes of classical particles play central roles in biology, in polymer physics, and in other disciplines. However, physics currently lacks mathematical methods for describing such complexes in terms of component particles, interaction energies, and assembly rules. Here we describe a Fock space structure that addresses this need, as well as diagrammatic methods that facilitate the use of this formalism. These methods can dramatically simplify the equations governing both equilibrium and non-equilibrium stochastic chemical systems. A mathematical relationship between the set of all complexes and a list of rules for complex assembly is also identified.
Submitted 23 March, 2016;
originally announced March 2016.
-
Learning quantitative sequence-function relationships from massively parallel experiments
Authors:
Gurinder S. Atwal,
Justin B. Kinney
Abstract:
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships -- functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes" -- directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
Submitted 22 September, 2015; v1 submitted 29 May, 2015;
originally announced June 2015.
-
Unification of field theory and maximum entropy methods for learning probability densities
Authors:
Justin B. Kinney
Abstract:
The need to estimate smooth probability distributions (a.k.a. probability densities) from finite sampled data is ubiquitous in science. Many approaches to this problem have been described, but none is yet regarded as providing a definitive solution. Maximum entropy estimation and Bayesian field theory are two such approaches. Both have origins in statistical physics, but the relationship between them has remained unclear. Here I unify these two methods by showing that every maximum entropy density estimate can be recovered in the infinite smoothness limit of an appropriate Bayesian field theory. I also show that Bayesian field theory estimation can be performed without imposing any boundary conditions on candidate densities, and that the infinite smoothness limit of these theories recovers the most common types of maximum entropy estimates. Bayesian field theory is thus seen to provide a natural test of the validity of the maximum entropy null hypothesis. Bayesian field theory also returns a lower entropy density estimate when the maximum entropy hypothesis is falsified. The computations necessary for this approach can be performed rapidly for one-dimensional data, and software for doing this is provided. Based on these results, I argue that Bayesian field theory is poised to provide a definitive solution to the density estimation problem in one dimension.
Submitted 28 July, 2015; v1 submitted 19 November, 2014;
originally announced November 2014.
-
Rapid and deterministic estimation of probability densities using scale-free field theories
Authors:
Justin B. Kinney
Abstract:
The question of how best to estimate a continuous probability density from finite data is an intriguing open problem at the interface of statistics and physics. Previous work has argued that this problem can be addressed in a natural way using methods from statistical field theory. Here I describe new results that allow this field-theoretic approach to be rapidly and deterministically computed in low dimensions, making it practical for use in day-to-day data analysis. Importantly, this approach does not impose a privileged length scale for smoothness of the inferred probability density, but rather learns a natural length scale from the data due to the tradeoff between goodness-of-fit and an Occam factor. Open source software implementing this method in one and two dimensions is provided.
Submitted 18 April, 2014; v1 submitted 23 December, 2013;
originally announced December 2013.
-
Three-dimensional coherent X-ray diffraction imaging of a ceramic nanofoam: determination of structural deformation mechanisms
Authors:
A. Barty,
S. Marchesini,
H. N. Chapman,
C. Cui,
M. R. Howells,
D. A. Shapiro,
A. M. Minor,
J. C. H. Spence,
U. Weierstall,
J. Ilavsky,
A. Noy,
S. P. Hau-Riege,
A. B. Artyukhin,
T. Baumann,
T. Willey,
J. Stolken,
T. van Buuren,
J. H. Kinney
Abstract:
Ultra-low density polymers, metals, and ceramic nanofoams are valued for their high strength-to-weight ratio, high surface area and insulating properties ascribed to their structural geometry. We obtain the labyrinthine internal structure of a tantalum oxide nanofoam by X-ray diffractive imaging. Finite element analysis from the structure reveals mechanical properties consistent with bulk samples and with a diffusion-limited cluster aggregation model, while excess mass on the nodes discounts the dangling fragments hypothesis of percolation theory.
Submitted 25 June, 2008; v1 submitted 30 August, 2007;
originally announced August 2007.
-
Progress in Three-Dimensional Coherent X-Ray Diffraction Imaging
Authors:
S. Marchesini,
H. N. Chapman,
A. Barty,
A. Noy,
S. P. Hau-Riege,
J. M. Kinney,
C. Cui,
M. R. Howells,
R. Rosen,
J. C. H. Spence,
U. Weierstall,
D. Shapiro,
T. Beetz,
C. Jacobsen,
E. Lima,
A. M. Minor,
H. He
Abstract:
The Fourier inversion of phased coherent diffraction patterns offers images without the resolution and depth-of-focus limitations of lens-based tomographic systems. We report on our recent experimental images inverted using recent developments in phase retrieval algorithms, and summarize efforts that led to these accomplishments. These include ab-initio reconstruction of a two-dimensional test pattern, infinite depth of focus image of a thick object, and its high-resolution (~10 nm resolution) three-dimensional image. Developments on the structural imaging of low density aerogel samples are discussed.
Submitted 6 October, 2005; v1 submitted 4 October, 2005;
originally announced October 2005.