
Feynman on Artificial Intelligence and Machine Learning, with Updates

Abstract: I present my recollections of Richard Feynman's mid-1980s interest in artificial intelligence and neural networks, set in the technical context of the physics-related approaches to neural networks of that time. I attempt to evaluate his ideas in the light of the substantial advances in the field since then, and vice versa. There are aspects of Feynman's interests that I think have been largely achieved and others that remain excitingly open, notably in computational science, and potentially including the revival of symbolic methods therein.

Submitted 31 August, 2022; originally announced September 2022.

Comments: 17 pages. To appear in Feynman Lectures on Computation, 2nd edition, published by Taylor & Francis Group, edited by Anthony J. G. Hey

arXiv:2109.05053 [pdf, other]

Physics-based machine learning for modeling stochastic IP3-dependent calcium dynamics

Authors: Oliver K. Ernst, Tom Bartol, Terrence Sejnowski, Eric Mjolsness

Abstract: We present a machine learning method for model reduction which incorporates domain-specific physics through candidate functions. Our method estimates an effective probability distribution and differential equation model from stochastic simulations of a reaction network. The close connection between reduced and fine scale descriptions allows approximations derived from the master equation to be introduced into the learning problem. This representation is shown to improve generalization and allows a large reduction in network size for a classic model of inositol trisphosphate (IP3) dependent calcium oscillations in non-excitable cells.

Submitted 10 September, 2021; originally announced September 2021.

Comments: 26 pages

MSC Class: 68T07

ACM Class: I.2.6; I.2.1; I.2.0

arXiv:2106.15716 [pdf, other]

Diff2Dist: Learning Spectrally Distinct Edge Functions, with Applications to Cell Morphology Analysis

Authors: Cory Braker Scott, Eric Mjolsness, Diane Oyen, Chie Kodera, David Bouchez, Magalie Uyttewaal

Abstract: We present a method for learning "spectrally descriptive" edge weights for graphs. We generalize a previously known distance measure on graphs (Graph Diffusion Distance), thereby allowing it to be tuned to minimize an arbitrary loss function. Because all steps involved in calculating this modified GDD are differentiable, we demonstrate that it is possible for a small neural network model to learn edge weights which minimize loss. GDD alone does not effectively discriminate between graphs constructed from shoot apical meristem images of wild-type vs. mutant Arabidopsis thaliana specimens. However, training edge weights and kernel parameters with contrastive loss produces a learned distance metric with large margins between these graph categories. We demonstrate this by showing improved performance of a simple k-nearest-neighbors classifier on the learned distance matrix. We also demonstrate a further application of this method to biological image analysis: once trained, we use our model to compute the distance between the biological graphs and a set of graphs output by a cell division simulator. This allows us to identify simulation parameter regimes which are similar to each class of graph in our original dataset.

Submitted 29 June, 2021; originally announced June 2021.

arXiv:2002.05842 [pdf, other]

Graph Prolongation Convolutional Networks: Explicitly Multiscale Machine Learning on Graphs with Applications to Modeling of Cytoskeleton

Authors: C. B. Scott, Eric Mjolsness

Abstract: We define a novel type of ensemble Graph Convolutional Network (GCN) model. Using optimized linear projection operators to map between spatial scales of graph, this ensemble model learns to aggregate information from each scale for its final prediction. We calculate these linear projection operators as the infima of an objective function relating the structure matrices used for each GCN. Equipped with these projections, our model (a Graph Prolongation-Convolutional Network) outperforms other GCN ensemble models at predicting the potential energy of monomer subunits in a coarse-grained mechanochemical simulation of microtubule bending. We demonstrate these performance gains by measuring an estimate of the FLOPs spent to train each model, as well as wall-clock time. Because our model learns at multiple scales, it is possible to train at each scale according to a predetermined schedule of coarse vs. fine training. We examine several such schedules adapted from the Algebraic Multigrid (AMG) literature, and quantify the computational benefit of each. We also compare this model to another model which features an optimized coarsening of the input graph. Finally, we derive backpropagation rules for the input of our network model with respect to its output, and discuss how our method may be extended to very large graphs.

Submitted 6 April, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: Revised version submitted to IOP: Machine Learning, Science, and Technology

arXiv:1909.04203 [pdf, other]

Novel diffusion-derived distance measures for graphs

Authors: C. B. Scott, Eric Mjolsness

Abstract: We define a new family of similarity and distance measures on graphs, and explore their theoretical properties in comparison to conventional distance metrics. These measures are defined by the solution(s) to an optimization problem which attempts to find a map minimizing the discrepancy between two graph Laplacian exponential matrices, under norm-preserving and sparsity constraints. Variants of the distance metric are introduced to consider such optimized maps under sparsity constraints as well as fixed time-scaling between the two Laplacians. The objective function of this optimization is multimodal and has discontinuous slope, and is hence difficult for univariate optimizers to solve. We demonstrate a novel procedure for efficiently calculating these optima for two of our distance measure variants. We present numerical experiments demonstrating that (a) upper bounds of our distance metrics can be used to distinguish between lineages of related graphs; (b) our procedure is faster at finding the required optima, by as much as a factor of 10^3; and (c) the upper bounds satisfy the triangle inequality exactly under some assumptions and approximately under others. We also derive an upper bound for the distance between two graph products, in terms of the distance between the two pairs of factors. Additionally, we present several possible applications, including the construction of infinite "graph limits" by means of Cauchy sequences of graphs related to one another by our distance measure.

Submitted 9 September, 2019; originally announced September 2019.

arXiv:1909.04118 [pdf, ps, other]

Structural Commutation Relations for Stochastic Labelled Graph Grammar Rule Operators

Authors: Eric Mjolsness

Abstract: We show how to calculate the operator algebra and the operator Lie algebra of a stochastic labelled-graph grammar. More specifically, we carry out a generic calculation of the product (and therefore the commutator) of time-evolution operators for any two labelled-graph grammar rewrite rules, where the operator corresponding to each rule is defined in terms of elementary two-state creation/annihilation operators. The resulting graph grammar algebra has the following properties: (1) The product and commutator of two such operators is a sum of such operators with integer coefficients. Thus, the algebra and the Lie algebra occur entirely at the structural (or graph-combinatorial) level of graph grammar rules, lifted from the level of elementary creation/annihilation operators (an improvement over [1], Propositions 1 and 2). (2) The product of the off-diagonal (state-changing) parts of two such graph rule operators is a sum of off-diagonal graph rule operators with non-negative integer coefficients. (3) These results apply whether the semantics of a graph grammar rule leaves behind hanging edges (Theorem 1), or removes hanging edges (Theorem 2). (4) The algebra is constructive in terms of elementary two-state creation/annihilation operators (Corollaries 3 and 8). These results are useful because dynamical transformations of labelled graphs comprise a general modeling framework, and algebraic commutators of time-evolution operators have many analytic uses including designing simulation algorithms and estimating their errors.

Submitted 9 September, 2019; originally announced September 2019.

MSC Class: 47B47

ACM Class: F.4.3

arXiv:1905.12122 [pdf, other]

Deep Learning Moment Closure Approximations using Dynamic Boltzmann Distributions

Authors: Oliver K. Ernst, Tom Bartol, Terrence Sejnowski, Eric Mjolsness

Abstract: The moments of spatial probabilistic systems are often given by an infinite hierarchy of coupled differential equations. Moment closure methods are used to approximate a subset of low order moments by terminating the hierarchy at some order and replacing higher order terms with functions of lower order ones. For a given system, it is not known beforehand which closure approximation is optimal, i.e. which higher order terms are relevant in the current regime. Further, the generalization of such approximations is typically poor, as higher order corrections may become relevant over long timescales. We have developed a method to learn moment closure approximations directly from data using dynamic Boltzmann distributions (DBDs). The dynamics of the distribution are parameterized using basis functions from finite element methods, such that the approach can be applied without knowing the true dynamics of the system under consideration. We use the hierarchical architecture of deep Boltzmann machines (DBMs) with multinomial latent variables to learn closure approximations for progressively higher order spatial correlations. The learning algorithm uses a centering transformation, allowing the dynamic DBM to be trained without the need for pre-training. We demonstrate the method for a Lotka-Volterra system on a lattice, a typical example in spatial chemical reaction networks. The approach can be applied broadly to learn deep generative models in applications where infinite systems of differential equations arise.

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1806.05703 [pdf, other]

Multilevel Artificial Neural Network Training for Spatially Correlated Learning

Authors: C. B. Scott, Eric Mjolsness

Abstract: Multigrid modeling algorithms are a technique used to accelerate relaxation models running on a hierarchy of similar graphlike structures. We introduce and demonstrate a new method for training neural networks which uses multilevel methods. Using an objective function derived from a graph-distance metric, we perform orthogonally-constrained optimization to find optimal prolongation and restriction maps between graphs. We compare and contrast several methods for performing this numerical optimization, and additionally present some new theoretical results on upper bounds of this type of objective function. Once calculated, these optimal maps between graphs form the core of Multiscale Artificial Neural Network (MsANN) training, a new procedure we present which simultaneously trains a hierarchy of neural network models of varying spatial resolution. Parameter information is passed between members of this hierarchy according to standard coarsening and refinement schedules from the multiscale modelling literature. In our machine learning experiments, these models are able to learn faster than default training, achieving a comparable level of error in an order of magnitude fewer training examples.

Submitted 20 May, 2019; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: Manuscript (24 pages) and Supplementary Material (4 pages). Updated January 2019 to reflect new formulation of MsANN structure and new training procedure

arXiv:1804.11044 [pdf, other]

doi 10.1007/s11538-019-00628-7

Prospects for Declarative Mathematical Modeling of Complex Biological Systems

Authors: Eric Mjolsness

Abstract: Declarative modeling uses symbolic expressions to represent models. With such expressions one can formalize high-level mathematical computations on models that would be difficult or impossible to perform directly on a lower-level simulation program, in a general-purpose programming language. Examples of such computations on models include model analysis, relatively general-purpose model-reduction maps, and the initial phases of model implementation, all of which should preserve or approximate the mathematical semantics of a complex biological model. The potential advantages are particularly relevant in the case of developmental modeling, wherein complex spatial structures exhibit dynamics at molecular, cellular, and organogenic levels to relate genotype to multicellular phenotype. Multiscale modeling can benefit from both the expressive power of declarative modeling languages and the application of model reduction methods to link models across scale. Based on previous work, here we define declarative modeling of complex biological systems by defining the operator algebra semantics of an increasingly powerful series of declarative modeling languages including reaction-like dynamics of parameterized and extended objects; we define semantics-preserving implementation and semantics-approximating model reduction transformations; and we outline a "meta-hierarchy" for organizing declarative models and the mathematical methods that can fruitfully manipulate them.

Submitted 30 June, 2019; v1 submitted 30 April, 2018; originally announced April 2018.

Journal ref: Bull. Math. Biol. (2019)

arXiv:1212.4080 [pdf, ps, other]

doi 10.1063/1.4766353

A Hierarchical Exact Accelerated Stochastic Simulation Algorithm

Authors: David Orendorff, Eric Mjolsness

Abstract: A new algorithm, "HiER-leap", is derived which improves on the computational properties of the ER-leap algorithm for exact accelerated simulation of stochastic chemical kinetics. Unlike ER-leap, HiER-leap utilizes a hierarchical or divide-and-conquer organization of reaction channels into tightly coupled "blocks" and is thereby able to speed up systems with many reaction channels. Like ER-leap, HiER-leap is based on the use of upper and lower bounds on the reaction propensities to define a rejection sampling algorithm with inexpensive early rejection and acceptance steps. But in HiER-leap, large portions of intra-block sampling may be done in parallel. An accept/reject step is used to synchronize across blocks. This method scales well when many reaction channels are present and has desirable asymptotic properties. The algorithm is exact, parallelizable and achieves a significant speedup over SSA and ER-leap on certain problems. This algorithm offers a potentially important step towards efficient in silico modeling of entire organisms.

Submitted 17 December, 2012; originally announced December 2012.

Comments: 22 pages, 3 figures

Journal ref: J. Chem. Phys. 137, 214104 (2012)

arXiv:1212.0582 [pdf]

Compositional Stochastic Modeling and Probabilistic Programming

Authors: Eric Mjolsness

Abstract: Probabilistic programming is related to a compositional approach to stochastic modeling by switching from discrete to continuous time dynamics. In continuous time, an operator-algebra semantics is available in which processes proceeding in parallel (and possibly interacting) have summed time-evolution operators. From this foundation, algorithms for simulation, inference and model reduction may be systematically derived. The useful consequences are potentially far-reaching in computational science, machine learning and beyond. Hybrid compositional stochastic modeling/probabilistic programming approaches may also be possible.

Submitted 3 December, 2012; originally announced December 2012.

Comments: Extended Abstract for the Neural Information Processing Systems (NIPS) Workshop on Probabilistic Programming, 2012

arXiv:1209.5231 [pdf]

doi 10.1088/1478-3975/10/3/035009

Time-Ordered Product Expansions for Computational Stochastic Systems Biology

Authors: Eric Mjolsness

Abstract: The time-ordered product framework of quantum field theory can also be used to understand salient phenomena in stochastic biochemical networks. It is used here to derive Gillespie's Stochastic Simulation Algorithm (SSA) for chemical reaction networks; consequently, the SSA can be interpreted in terms of Feynman diagrams. It is also used here to derive other, more general simulation and parameter-learning algorithms including simulation algorithms for networks of stochastic reaction-like processes operating on parameterized objects, and also hybrid stochastic reaction/differential equation models in which systems of ordinary differential equations evolve the parameters of objects that can also undergo stochastic reactions. Thus, the time-ordered product expansion (TOPE) can be used systematically to derive simulation and parameter-fitting algorithms for stochastic systems.

Submitted 24 September, 2012; originally announced September 2012.

Comments: Submitted to Q-Bio 2012 conference, Santa Fe, New Mexico

arXiv:cs/0511073 [pdf, ps, other]

Stochastic Process Semantics for Dynamical Grammar Syntax: An Overview

Authors: Eric Mjolsness

Abstract: We define a class of probabilistic models in terms of an operator algebra of stochastic processes, and a representation for this class in terms of stochastic parameterized grammars. A syntactic specification of a grammar is mapped to semantics given in terms of a ring of operators, so that grammatical composition corresponds to operator addition or multiplication. The operators are generators for the time-evolution of stochastic processes. Within this modeling framework one can express data clustering models, logic programs, ordinary and stochastic differential equations, graph grammars, and stochastic chemical reaction kinetics. This mathematical formulation connects these apparently distant fields to one another and to mathematical methods from quantum field theory and operator algebra.

Submitted 19 November, 2005; originally announced November 2005.

Comments: Accepted for: Ninth International Symposium on Artificial Intelligence and Mathematics, January 2006

Report number: UCI ICS TR# 05-14

ACM Class: D.3.1

Showing 1–13 of 13 results for author: Mjolsness, E