Search | arXiv e-print repository

Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods

Authors: Emanuele Zappala, Daniel Levine, Sizhuang He, Syed Rizvi, Sacha Levy, David van Dijk

Abstract: Deep neural networks, despite their success in numerous applications, often function without established theoretical foundations. In this paper, we bridge this gap by drawing parallels between deep learning and classical numerical analysis. By framing neural networks as operators with fixed points representing desired solutions, we develop a theoretical framework grounded in iterative methods for… ▽ More Deep neural networks, despite their success in numerous applications, often function without established theoretical foundations. In this paper, we bridge this gap by drawing parallels between deep learning and classical numerical analysis. By framing neural networks as operators with fixed points representing desired solutions, we develop a theoretical framework grounded in iterative methods for operator equations. Under defined conditions, we present convergence proofs based on fixed point theory. We demonstrate that popular architectures, such as diffusion models and AlphaFold, inherently employ iterative operator learning. Empirical assessments highlight that performing iterations through network operators improves performance. We also introduce an iterative graph neural network, PIGN, that further demonstrates benefits of iterations. Our work aims to enhance the understanding of deep learning by merging insights from numerical analysis, potentially guiding the design of future networks with clearer theoretical underpinnings and improved performance. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: 27 pages (13+14). 8 Figures and 5 tables. Comments are welcome!

arXiv:2301.13338 [pdf, other]

Continuous Spatiotemporal Transformers

Authors: Antonio H. de O. Fonseca, Emanuele Zappala, Josue Ortega Caro, David van Dijk

Abstract: Modeling spatiotemporal dynamical systems is a fundamental challenge in machine learning. Transformer models have been very successful in NLP and computer vision where they provide interpretable representations of data. However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding c… ▽ More Modeling spatiotemporal dynamical systems is a fundamental challenge in machine learning. Transformer models have been very successful in NLP and computer vision where they provide interpretable representations of data. However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture that is designed for the modeling of continuous systems. This new framework guarantees a continuous and smooth output via optimization in Sobolev space. We benchmark CST against traditional transformers as well as other spatiotemporal dynamics modeling methods and achieve superior performance in a number of tasks on synthetic and real systems, including learning brain dynamics from calcium imaging data. △ Less

Submitted 28 July, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: Updated version, after reviews

arXiv:2210.09475 [pdf, other]

FIMP: Foundation Model-Informed Message Passing for Graph Neural Networks

Authors: Syed Asad Rizvi, Nazreen Pallikkavaliyaveetil, David Zhang, Zhuoyang Lyu, Nhi Nguyen, Haoran Lyu, Benjamin Christensen, Josue Ortega Caro, Antonio H. O. Fonseca, Emanuele Zappala, Maryam Bagherian, Christopher Averill, Chadi G. Abdallah, Amin Karbasi, Rex Ying, Maria Brbic, Rahul Madhav Dhodapkar, David van Dijk

Abstract: Foundation models have achieved remarkable success across many domains, relying on pretraining over vast amounts of data. Graph-structured data often lacks the same scale as unstructured data, making the development of graph foundation models challenging. In this work, we propose Foundation-Informed Message Passing (FIMP), a Graph Neural Network (GNN) message-passing framework that leverages pretr… ▽ More Foundation models have achieved remarkable success across many domains, relying on pretraining over vast amounts of data. Graph-structured data often lacks the same scale as unstructured data, making the development of graph foundation models challenging. In this work, we propose Foundation-Informed Message Passing (FIMP), a Graph Neural Network (GNN) message-passing framework that leverages pretrained non-textual foundation models in graph-based tasks. We show that the self-attention layers of foundation models can effectively be repurposed on graphs to perform cross-node attention-based message-passing. Our model is evaluated on a real-world image network dataset and two biological applications (single-cell RNA sequencing data and fMRI brain activity recordings) in both finetuned and zero-shot settings. FIMP outperforms strong baselines, demonstrating that it can effectively leverage state-of-the-art foundation models in graph tasks. △ Less

Submitted 1 July, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

Comments: 16 pages (12 + 4 pages appendix). 5 figures and 4 tables

arXiv:2209.15190 [pdf, other]

Neural Integral Equations

Authors: Emanuele Zappala, Antonio Henrique de Oliveira Fonseca, Josue Ortega Caro, David van Dijk

Abstract: Integral equations (IEs) are equations that model spatiotemporal systems with non-local interactions. They have found important applications throughout theoretical and applied sciences, including in physics, chemistry, biology, and engineering. While efficient algorithms exist for solving given IEs, no method exists that can learn an IE and its associated dynamics from data alone. In this paper, w… ▽ More Integral equations (IEs) are equations that model spatiotemporal systems with non-local interactions. They have found important applications throughout theoretical and applied sciences, including in physics, chemistry, biology, and engineering. While efficient algorithms exist for solving given IEs, no method exists that can learn an IE and its associated dynamics from data alone. In this paper, we introduce Neural Integral Equations (NIE), a method that learns an unknown integral operator from data through an IE solver. We also introduce Attentional Neural Integral Equations (ANIE), where the integral is replaced by self-attention, which improves scalability, capacity, and results in an interpretable model. We demonstrate that (A)NIE outperforms other methods in both speed and accuracy on several benchmark tasks in ODE, PDE, and IE systems of synthetic and real-world data. △ Less

Submitted 18 May, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: 19 + 20 pages, 18 figures and 11 tables. Comments are welcome! v4: Article expanded to include theoretical guarantees for convergence of the integral equations solver and further experiments on brain data and interpretability of dynamics

arXiv:2206.14282 [pdf, other]

Neural Integro-Differential Equations

Authors: Emanuele Zappala, Antonio Henrique de Oliveira Fonseca, Andrew Henry Moberly, Michael James Higley, Chadi Abdallah, Jessica Cardin, David van Dijk

Abstract: Modeling continuous dynamical systems from discretely sampled observations is a fundamental problem in data science. Often, such dynamics are the result of non-local processes that present an integral over time. As such, these systems are modeled with Integro-Differential Equations (IDEs); generalizations of differential equations that comprise both an integral and a differential component. For ex… ▽ More Modeling continuous dynamical systems from discretely sampled observations is a fundamental problem in data science. Often, such dynamics are the result of non-local processes that present an integral over time. As such, these systems are modeled with Integro-Differential Equations (IDEs); generalizations of differential equations that comprise both an integral and a differential component. For example, brain dynamics are not accurately modeled by differential equations since their behavior is non-Markovian, i.e. dynamics are in part dictated by history. Here, we introduce the Neural IDE (NIDE), a novel deep learning framework based on the theory of IDEs where integral operators are learned using neural networks. We test NIDE on several toy and brain activity datasets and demonstrate that NIDE outperforms other models. These tasks include time extrapolation as well as predicting dynamics from unseen initial conditions, which we test on whole-cortex activity recordings in freely behaving mice. Further, we show that NIDE can decompose dynamics into their Markovian and non-Markovian constituents via the learned integral operator, which we test on fMRI brain activity recordings of people on ketamine. Finally, the integrand of the integral operator provides a latent space that gives insight into the underlying dynamics, which we demonstrate on wide-field brain imaging recordings. Altogether, NIDE is a novel approach that enables modeling of complex non-local dynamics with neural networks. △ Less

Submitted 29 November, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: 18 pages (including 8 pages Appendix), 8 figures and 6 tables. v4: Final version with reviewers' comments included, to appear in AAAI-23 (up to formatting differences)

arXiv:2010.05820 [pdf, other]

Permutation invariant networks to learn Wasserstein metrics

Authors: Arijit Sehanobish, Neal Ravindra, David van Dijk

Abstract: Understanding the space of probability measures on a metric space equipped with a Wasserstein distance is one of the fundamental questions in mathematical analysis. The Wasserstein metric has received a lot of attention in the machine learning community especially for its principled way of comparing distributions. In this work, we use a permutation invariant network to map samples from probability… ▽ More Understanding the space of probability measures on a metric space equipped with a Wasserstein distance is one of the fundamental questions in mathematical analysis. The Wasserstein metric has received a lot of attention in the machine learning community especially for its principled way of comparing distributions. In this work, we use a permutation invariant network to map samples from probability measures into a low-dimensional space such that the Euclidean distance between the encoded samples reflects the Wasserstein distance between probability measures. We show that our network can generalize to correctly compute distances between unseen densities. We also show that these networks can learn the first and the second moments of probability distributions. △ Less

Submitted 26 February, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: Fix typos, Accepted as a spotlight at Topological Data Analysis and Beyond Workshop at Neurips 2020. Added more experiments and results. Comments welcome

arXiv:2007.04777 [pdf, other]

Self-supervised edge features for improved Graph Neural Network training

Authors: Arijit Sehanobish, Neal G. Ravindra, David van Dijk

Abstract: Graph Neural Networks (GNN) have been extensively used to extract meaningful representations from graph structured data and to perform predictive tasks such as node classification and link prediction. In recent years, there has been a lot of work incorporating edge features along with node features for prediction tasks. One of the main difficulties in using edge features is that they are often han… ▽ More Graph Neural Networks (GNN) have been extensively used to extract meaningful representations from graph structured data and to perform predictive tasks such as node classification and link prediction. In recent years, there has been a lot of work incorporating edge features along with node features for prediction tasks. One of the main difficulties in using edge features is that they are often handcrafted, hard to get, specific to a particular domain, and may contain redundant information. In this work, we present a framework for creating new edge features, applicable to any domain, via a combination of self-supervised and unsupervised learning. In addition to this, we use Forman-Ricci curvature as an additional edge feature to encapsulate the local geometry of the graph. We then encode our edge features via a Set Transformer and combine them with node features extracted from popular GNN architectures for node classification in an end-to-end training scheme. We validate our work on three biological datasets comprising of single-cell RNA sequencing data of neurological disease, \textit{in vitro} SARS-CoV-2 infection, and human COVID-19 patients. We demonstrate that our method achieves better performance on node classification tasks over baseline Graph Attention Network (GAT) and Graph Convolutional Network (GCN) models. Furthermore, given the attention mechanism on edge and node features, we are able to interpret the cell types and genes that determine the course and severity of COVID-19, contributing to a growing list of potential disease biomarkers and therapeutic targets. △ Less

Submitted 23 June, 2020; originally announced July 2020.

Comments: Comments welcome. arXiv admin note: substantial text overlap with arXiv:2006.12971

ACM Class: I.2.4; J.3

arXiv:2006.13297 [pdf, other]

Learning Potentials of Quantum Systems using Deep Neural Networks

Authors: Arijit Sehanobish, Hector H. Corzo, Onur Kara, David van Dijk

Abstract: Attempts to apply Neural Networks (NN) to a wide range of research problems have been ubiquitous and plentiful in recent literature. Particularly, the use of deep NNs for understanding complex physical and chemical phenomena has opened a new niche of science where the analysis tools from Machine Learning (ML) are combined with the computational concepts of the natural sciences. Reports from this u… ▽ More Attempts to apply Neural Networks (NN) to a wide range of research problems have been ubiquitous and plentiful in recent literature. Particularly, the use of deep NNs for understanding complex physical and chemical phenomena has opened a new niche of science where the analysis tools from Machine Learning (ML) are combined with the computational concepts of the natural sciences. Reports from this unification of ML have presented evidence that NNs can learn classical Hamiltonian mechanics. This application of NNs to classical physics and its results motivate the following question: Can NNs be endowed with inductive biases through observation as means to provide insights into quantum phenomena? In this work, this question is addressed by investigating possible approximations for reconstructing the Hamiltonian of a quantum system in an unsupervised manner by using only limited information obtained from the system's probability distribution. △ Less

Submitted 14 January, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: New density to potential experiments, substantial rearrangement of the paper, Under Review, comments welcome

Report number: Vol-2964 (0074-2964-3) ACM Class: I.2.6; J.2

Journal ref: CEUR Workshop Proceedings 2021

arXiv:2006.12971 [pdf, other]

Gaining Insight into SARS-CoV-2 Infection and COVID-19 Severity Using Self-supervised Edge Features and Graph Neural Networks

Authors: Arijit Sehanobish, Neal G. Ravindra, David van Dijk

Abstract: A molecular and cellular understanding of how SARS-CoV-2 variably infects and causes severe COVID-19 remains a bottleneck in develo** interventions to end the pandemic. We sought to use deep learning to study the biology of SARS-CoV-2 infection and COVID-19 severity by identifying transcriptomic patterns and cell types associated with SARS-CoV-2 infection and COVID-19 severity. To do this, we de… ▽ More A molecular and cellular understanding of how SARS-CoV-2 variably infects and causes severe COVID-19 remains a bottleneck in develo** interventions to end the pandemic. We sought to use deep learning to study the biology of SARS-CoV-2 infection and COVID-19 severity by identifying transcriptomic patterns and cell types associated with SARS-CoV-2 infection and COVID-19 severity. To do this, we developed a new approach to generating self-supervised edge features. We propose a model that builds on Graph Attention Networks (GAT), creates edge features using self-supervised learning, and ingests these edge features via a Set Transformer. This model achieves significant improvements in predicting the disease state of individual cells, given their transcriptome. We apply our model to single-cell RNA sequencing datasets of SARS-CoV-2 infected lung organoids and bronchoalveolar lavage fluid samples of patients with COVID-19, achieving state-of-the-art performance on both datasets with our model. We then borrow from the field of explainable AI (XAI) to identify the features (genes) and cell types that discriminate bystander vs. infected cells across time and moderate vs. severe COVID-19 disease. To the best of our knowledge, this represents the first application of deep learning to identifying the molecular and cellular determinants of SARS-CoV-2 infection and COVID-19 severity using single-cell omics data. △ Less

Submitted 15 December, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: To appear at AAAI'21. Previous version (v2) accepted as a spotlight talk at ICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+) and recipient of best paper award for Covid-19 applications. Significant improvements over v2

arXiv:2006.11578 [pdf, other]

Learning aligned embeddings for semi-supervised word translation using Maximum Mean Discrepancy

Authors: Antonio H. O. Fonseca, David van Dijk

Abstract: Word translation is an integral part of language translation. In machine translation, each language is considered a domain with its own word embedding. The alignment between word embeddings allows linking semantically equivalent words in multilingual contexts. Moreover, it offers a way to infer cross-lingual meaning for words without a direct translation. Current methods for word embedding alignme… ▽ More Word translation is an integral part of language translation. In machine translation, each language is considered a domain with its own word embedding. The alignment between word embeddings allows linking semantically equivalent words in multilingual contexts. Moreover, it offers a way to infer cross-lingual meaning for words without a direct translation. Current methods for word embedding alignment are either supervised, i.e. they require known word pairs, or learn a cross-domain transformation on fixed embeddings in an unsupervised way. Here we propose an end-to-end approach for word embedding alignment that does not require known word pairs. Our method, termed Word Alignment through MMD (WAM), learns embeddings that are aligned during sentence translation training using a localized Maximum Mean Discrepancy (MMD) constraint between the embeddings. We show that our method not only out-performs unsupervised methods, but also supervised methods that train on known word translations. △ Less

Submitted 20 June, 2020; originally announced June 2020.

arXiv:2002.07128 [pdf, other]

Disease State Prediction From Single-Cell Data Using Graph Attention Networks

Authors: Neal G. Ravindra, Arijit Sehanobish, Jenna L. Pappalardo, David A. Hafler, David van Dijk

Abstract: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological discovery, providing an unbiased picture of cellular heterogeneity in tissues. While scRNA-seq has been used extensively to provide insight into both healthy systems and diseases, it has not been used for disease prediction or diagnostics. Graph Attention Networks (GAT) have proven to be versatile for a wide range of tasks by lea… ▽ More Single-cell RNA sequencing (scRNA-seq) has revolutionized biological discovery, providing an unbiased picture of cellular heterogeneity in tissues. While scRNA-seq has been used extensively to provide insight into both healthy systems and diseases, it has not been used for disease prediction or diagnostics. Graph Attention Networks (GAT) have proven to be versatile for a wide range of tasks by learning from both original features and graph structures. Here we present a graph attention model for predicting disease state from single-cell data on a large dataset of Multiple Sclerosis (MS) patients. MS is a disease of the central nervous system that can be difficult to diagnose. We train our model on single-cell data obtained from blood and cerebrospinal fluid (CSF) for a cohort of seven MS patients and six healthy adults (HA), resulting in 66,667 individual cells. We achieve 92 % accuracy in predicting MS, outperforming other state-of-the-art methods such as a graph convolutional network and a random forest classifier. Further, we use the learned graph attention model to get insight into the features (cell types and genes) that are important for this prediction. The graph attention model also allow us to infer a new feature space for the cells that emphasizes the differences between the two conditions. Finally we use the attention weights to learn a new low-dimensional embedding that can be visualized. To the best of our knowledge, this is the first effort to use graph attention, and deep learning in general, to predict disease state from single-cell data. We envision applying this method to single-cell data for other diseases. △ Less

Submitted 12 March, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

Comments: Incorporated suggestions from anonymous reviewers, Accepted at ACM CHIL 2020, comments welcome

ACM Class: J.3; I.2.6

arXiv:2002.04461 [pdf, other]

TrajectoryNet: A Dynamic Optimal Transport Network for Modeling Cellular Dynamics

Authors: Alexander Tong, Jessie Huang, Guy Wolf, David van Dijk, Smita Krishnaswamy

Abstract: It is increasingly common to encounter data from dynamic processes captured by static cross-sectional measurements over time, particularly in biomedical settings. Recent attempts to model individual trajectories from this data use optimal transport to create pairwise matchings between time points. However, these methods cannot model continuous dynamics and non-linear paths that entities can take i… ▽ More It is increasingly common to encounter data from dynamic processes captured by static cross-sectional measurements over time, particularly in biomedical settings. Recent attempts to model individual trajectories from this data use optimal transport to create pairwise matchings between time points. However, these methods cannot model continuous dynamics and non-linear paths that entities can take in these systems. To address this issue, we establish a link between continuous normalizing flows and dynamic optimal transport, that allows us to model the expected paths of points over time. Continuous normalizing flows are generally under constrained, as they are allowed to take an arbitrary path from the source to the target distribution. We present TrajectoryNet, which controls the continuous paths taken between distributions to produce dynamic optimal transport. We show how this is particularly applicable for studying cellular dynamics in data from single-cell RNA sequencing (scRNA-seq) technologies, and that TrajectoryNet improves upon recently proposed static optimal transport-based models that can be used for interpolating cellular distributions. △ Less

Submitted 26 July, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

Comments: Presented at ICML 2020

arXiv:1907.04463 [pdf, other]

doi 10.1109/BigData47090.2019.9006013

Coarse Graining of Data via Inhomogeneous Diffusion Condensation

Authors: Nathan Brugnone, Alex Gonopolskiy, Mark W. Moyle, Manik Kuchroo, David van Dijk, Kevin R. Moon, Daniel Colon-Ramos, Guy Wolf, Matthew J. Hirn, Smita Krishnaswamy

Abstract: Big data often has emergent structure that exists at multiple levels of abstraction, which are useful for characterizing complex interactions and dynamics of the observations. Here, we consider multiple levels of abstraction via a multiresolution geometry of data points at different granularities. To construct this geometry we define a time-inhomogeneous diffusion process that effectively condense… ▽ More Big data often has emergent structure that exists at multiple levels of abstraction, which are useful for characterizing complex interactions and dynamics of the observations. Here, we consider multiple levels of abstraction via a multiresolution geometry of data points at different granularities. To construct this geometry we define a time-inhomogeneous diffusion process that effectively condenses data points together to uncover nested grou**s at larger and larger granularities. This inhomogeneous process creates a deep cascade of intrinsic low pass filters on the data affinity graph that are applied in sequence to gradually eliminate local variability while adjusting the learned data geometry to increasingly coarser resolutions. We provide visualizations to exhibit our method as a continuously-hierarchical clustering with directions of eliminated variation highlighted at each step. The utility of our algorithm is demonstrated via neuronal data condensation, where the constructed multiresolution data geometry uncovers the organization, grou**, and connectivity between neurons. △ Less

Submitted 9 March, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

Comments: 14 pages, 7 figures

ACM Class: I.5.3

Journal ref: Proceedings of the 2019 IEEE International Conference on Big Data, pages 2624-2633, 2019

arXiv:1902.00033 [pdf, other]

Compressed Diffusion

Authors: Scott Gigante, Jay S. Stanley III, Ngan Vu, David van Dijk, Kevin Moon, Guy Wolf, Smita Krishnaswamy

Abstract: Diffusion maps are a commonly used kernel-based method for manifold learning, which can reveal intrinsic structures in data and embed them in low dimensions. However, as with most kernel methods, its implementation requires a heavy computational load, reaching up to cubic complexity in the number of data points. This limits its usability in modern data analysis. Here, we present a new approach to… ▽ More Diffusion maps are a commonly used kernel-based method for manifold learning, which can reveal intrinsic structures in data and embed them in low dimensions. However, as with most kernel methods, its implementation requires a heavy computational load, reaching up to cubic complexity in the number of data points. This limits its usability in modern data analysis. Here, we present a new approach to computing the diffusion geometry, and related embeddings, from a compressed diffusion process between data regions rather than data points. Our construction is based on an adaptation of the previously proposed measure-based Gaussian correlation (MGC) kernel that robustly captures the local geometry around data points. We use this MGC kernel to efficiently compress diffusion relations from pointwise to data region resolution. Finally, a spectral embedding of the data regions provides coordinates that are used to interpolate and approximate the pointwise diffusion map embedding of data. We analyze theoretical connections between our construction and the original diffusion geometry of diffusion maps, and demonstrate the utility of our method in analyzing big datasets, where it outperforms competing approaches. △ Less

Submitted 10 June, 2019; v1 submitted 31 January, 2019; originally announced February 2019.

Comments: 4 pages double column, published in SampTA 2019

Journal ref: Sampling Theory & Applications (2019)

arXiv:1901.09078 [pdf, other]

Finding Archetypal Spaces Using Neural Networks

Authors: David van Dijk, Daniel Burkhardt, Matthew Amodio, Alex Tong, Guy Wolf, Smita Krishnaswamy

Abstract: Archetypal analysis is a data decomposition method that describes each observation in a dataset as a convex combination of "pure types" or archetypes. These archetypes represent extrema of a data space in which there is a trade-off between features, such as in biology where different combinations of traits provide optimal fitness for different environments. Existing methods for archetypal analysis… ▽ More Archetypal analysis is a data decomposition method that describes each observation in a dataset as a convex combination of "pure types" or archetypes. These archetypes represent extrema of a data space in which there is a trade-off between features, such as in biology where different combinations of traits provide optimal fitness for different environments. Existing methods for archetypal analysis work well when a linear relationship exists between the feature space and the archetypal space. However, such methods are not applicable to systems where the feature space is generated non-linearly from the combination of archetypes, such as in biological systems or image transformations. Here, we propose a reformulation of the problem such that the goal is to learn a non-linear transformation of the data into a latent archetypal space. To solve this problem, we introduce Archetypal Analysis network (AAnet), which is a deep neural network framework for learning and generating from a latent archetypal representation of data. We demonstrate state-of-the-art recovery of ground-truth archetypes in non-linear data domains, show AAnet can generate from data geometry rather than from data density, and use AAnet to identify biologically meaningful archetypes in single-cell gene expression data. △ Less

Submitted 13 November, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

Comments: 9 pages, 10 figures, to be presented at IEEE Big Data 2019

arXiv:1810.00424 [pdf, other]

Interpretable Neuron Structuring with Graph Spectral Regularization

Authors: Alexander Tong, David van Dijk, Jay S. Stanley III, Matthew Amodio, Kristina Yim, Rebecca Muhle, James Noonan, Guy Wolf, Smita Krishnaswamy

Abstract: While neural networks are powerful approximators used to classify or embed data into lower dimensional spaces, they are often regarded as black boxes with uninterpretable features. Here we propose Graph Spectral Regularization for making hidden layers more interpretable without significantly impacting performance on the primary task. Taking inspiration from spatial organization and localization of… ▽ More While neural networks are powerful approximators used to classify or embed data into lower dimensional spaces, they are often regarded as black boxes with uninterpretable features. Here we propose Graph Spectral Regularization for making hidden layers more interpretable without significantly impacting performance on the primary task. Taking inspiration from spatial organization and localization of neuron activations in biological networks, we use a graph Laplacian penalty to structure the activations within a layer. This penalty encourages activations to be smooth either on a predetermined graph or on a feature-space graph learned from the data via co-activations of a hidden layer of the neural network. We show numerous uses for this additional structure including cluster indication and visualization in biological and image data sets. △ Less

Submitted 14 February, 2020; v1 submitted 30 September, 2018; originally announced October 2018.

Comments: 12 pages, 6 figures, presented at IDA 2020

arXiv:1805.12198 [pdf, other]

Out-of-Sample Extrapolation with Neuron Editing

Authors: Matthew Amodio, David van Dijk, Ruth Montgomery, Guy Wolf, Smita Krishnaswamy

Abstract: While neural networks can be trained to map from one specific dataset to another, they usually do not learn a generalized transformation that can extrapolate accurately outside the space of training. For instance, a generative adversarial network (GAN) exclusively trained to transform images of black-haired men to blond-haired men might not have the same effect on images of black-haired women. Thi… ▽ More While neural networks can be trained to map from one specific dataset to another, they usually do not learn a generalized transformation that can extrapolate accurately outside the space of training. For instance, a generative adversarial network (GAN) exclusively trained to transform images of black-haired men to blond-haired men might not have the same effect on images of black-haired women. This is because neural networks are good at generation within the manifold of the data that they are trained on. However, generating new samples outside of the manifold or extrapolating "out-of-sample" is a much harder problem that has been less well studied. To address this, we introduce a technique called neuron editing that learns how neurons encode an edit for a particular transformation in a latent space. We use an autoencoder to decompose the variation within the dataset into activations of different neurons and generate transformed data by defining an editing transformation on those neurons. By performing the transformation in a latent trained space, we encode fairly complex and non-linear transformations to the data with much simpler distribution shifts to the neuron's activations. We motivate our technique on an image domain and then move to our two main biological applications: removal of batch artifacts representing unwanted noise and modeling the effect of drug treatments to predict synergy between drugs. △ Less

Submitted 23 January, 2019; v1 submitted 30 May, 2018; originally announced May 2018.

arXiv:1802.03497 [pdf, other]

Modeling Global Dynamics from Local Snapshots with Deep Generative Neural Networks

Authors: Scott Gigante, David van Dijk, Kevin Moon, Alexander Strzalkowski, Guy Wolf, Smita Krishnaswamy

Abstract: Complex high dimensional stochastic dynamic systems arise in many applications in the natural sciences and especially biology. However, while these systems are difficult to describe analytically, "snapshot" measurements that sample the output of the system are often available. In order to model the dynamics of such systems given snapshot data, or local transitions, we present a deep neural network… ▽ More Complex high dimensional stochastic dynamic systems arise in many applications in the natural sciences and especially biology. However, while these systems are difficult to describe analytically, "snapshot" measurements that sample the output of the system are often available. In order to model the dynamics of such systems given snapshot data, or local transitions, we present a deep neural network framework we call Dynamics Modeling Network or DyMoN. DyMoN is a neural network framework trained as a deep generative Markov model whose next state is a probability distribution based on the current state. DyMoN is trained using samples of current and next-state pairs, and thus does not require longitudinal measurements. We show the advantage of DyMoN over shallow models such as Kalman filters and hidden Markov models, and other deep models such as recurrent neural networks in its ability to embody the dynamics (which can be studied via perturbation of the neural network) and generate longitudinal hypothetical trajectories. We perform three case studies in which we apply DyMoN to different types of biological systems and extract features of the dynamics in each case by examining the learned model. △ Less

Submitted 10 June, 2019; v1 submitted 9 February, 2018; originally announced February 2018.

Comments: Published in SampTA 2019

Journal ref: Sampling Theory & Applications (2019)

arXiv:1601.00891 [pdf, other]

doi 10.1186/s13059-016-1037-6

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Authors: Yuxiang Jiang, Tal Ronnen Oron, Wyatt T Clark, Asma R Bankapur, Daniel D'Andrea, Rosalba Lepore, Christopher S Funk, Indika Kahanda, Karin M Verspoor, Asa Ben-Hur, Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed ME Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio, Renzhi Cao, Zhaolong Zhong, Jianlin Cheng, Adrian Altenhoff, Nives Skunca , et al. (122 additional authors not shown)

Abstract: Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to the discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our a… ▽ More Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to the discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our ability to understand the molecular underpinnings of life is the assignment of function to biological macromolecules, especially proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, accurately assessing methods for protein function prediction and tracking progress in the field remain challenging. Methodology: We have conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. One hundred twenty-six methods from 56 research groups were evaluated for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3,681 proteins from 18 species. CAFA2 featured significantly expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2. Conclusions: The top performing methods in CAFA2 outperformed the best methods from CAFA1, demonstrating that computational function prediction is improving. This increased accuracy can be attributed to the combined effect of the growing number of experimental annotations and improved methods for function prediction. △ Less

Submitted 2 January, 2016; originally announced January 2016.

Comments: Submitted to Genome Biology

Showing 1–19 of 19 results for author: van Dijk, D