Search | arXiv e-print repository

Continuum Attention for Neural Operators

Authors: Edoardo Calvello, Nikola B. Kovachki, Matthew E. Levine, Andrew M. Stuart

Abstract: Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they a… ▽ More Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn map**s between function spaces, for which we prove a universal approximation result. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains leads to the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2402.17233 [pdf, other]

Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response

Authors: Bob Junyi Zou, Matthew E. Levine, Dessi P. Zaharieva, Ramesh Johari, Emily B. Fox

Abstract: Hybrid models composing mechanistic ODE-based dynamics with flexible and expressive neural network components have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox… ▽ More Hybrid models composing mechanistic ODE-based dynamics with flexible and expressive neural network components have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox modeling approaches, critical when learning from small datasets or partially observed, complex systems. Unfortunately, as the hybrid models become more flexible, the causal grounding provided by the mechanistic model can quickly be lost. We address this problem by leveraging another common source of domain knowledge: \emph{ranking} of treatment effects for a set of interventions, even if the precise treatment effect is unknown. We encode this information in a \emph{causal loss} that we combine with the standard predictive loss to arrive at a \emph{hybrid loss} that biases our learning towards causally valid hybrid models. We demonstrate our ability to achieve a win-win, state-of-the-art predictive performance \emph{and} causal validity, in the challenging task of modeling glucose dynamics post-exercise in individuals with type 1 diabetes. △ Less

Submitted 11 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2401.00035 [pdf, other]

Learning About Structural Errors in Models of Complex Dynamical Systems

Authors: **-Long Wu, Matthew E. Levine, Tapio Schneider, Andrew Stuart

Abstract: Complex dynamical systems are notoriously difficult to model because some degrees of freedom (e.g., small scales) may be computationally unresolvable or are incompletely understood, yet they are dynamically important. For example, the small scales of cloud dynamics and droplet formation are crucial for controlling climate, yet are unresolvable in global climate models. Semi-empirical closure model… ▽ More Complex dynamical systems are notoriously difficult to model because some degrees of freedom (e.g., small scales) may be computationally unresolvable or are incompletely understood, yet they are dynamically important. For example, the small scales of cloud dynamics and droplet formation are crucial for controlling climate, yet are unresolvable in global climate models. Semi-empirical closure models for the effects of unresolved degrees of freedom often exist and encode important domain-specific knowledge. Building on such closure models and correcting them through learning the structural errors can be an effective way of fusing data with domain knowledge. Here we describe a general approach, principles, and algorithms for learning about structural errors. Key to our approach is to include structural error models inside the models of complex systems, for example, in closure models for unresolved scales. The structural errors then map, usually nonlinearly, to observable data. As a result, however, mismatches between model output and data are only indirectly informative about structural errors, due to a lack of labeled pairs of inputs and outputs of structural error models. Additionally, derivatives of the model may not exist or be readily available. We discuss how structural error models can be learned from indirect data with derivative-free Kalman inversion algorithms and variants, how sparsity constraints enforce a "do no harm" principle, and various ways of modeling structural errors. We also discuss the merits of using non-local and/or stochastic error models. In addition, we demonstrate how data assimilation techniques can assist the learning about structural errors in non-ergodic systems. The concepts and algorithms are illustrated in two numerical examples based on the Lorenz-96 system and a human glucose-insulin model. △ Less

Submitted 28 May, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: 40 pages, 13 figures

MSC Class: 68T01

arXiv:2304.14300 [pdf, other]

Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates

Authors: Ke Alexander Wang, Matthew E. Levine, Jiaxin Shi, Emily B. Fox

Abstract: Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficu… ▽ More Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficult to model mechanistically. In this paper, we propose to learn the effects of macronutrition content from glucose-insulin data and meal covariates. Given macronutrition information and meal times, we use a neural network to predict an individual's glucose absorption rate. We use this neural rate function as the control function in a differential equation of glucose dynamics, enabling end-to-end training. On simulated data, our approach is able to closely approximate true absorption rates, resulting in better forecast than heuristic parameterizations, despite only observing glucose, insulin, and macronutritional information. Our work readily generalizes to meal events with higher-dimensional covariates, such as images, setting the stage for glucose dynamics models that are personalized to each individual's daily life. △ Less

Submitted 27 April, 2023; originally announced April 2023.

Comments: Work presented at NeurIPS 2022 Workshop on Learning from Time Series for Health (TS4H). arXiv admin note: substantial text overlap with arXiv:2302.11939

arXiv:2107.06658 [pdf, other]

A Framework for Machine Learning of Model Error in Dynamical Systems

Authors: Matthew E. Levine, Andrew M. Stuart

Abstract: The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation i… ▽ More The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63, Lorenz '96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models. △ Less

Submitted 17 August, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

arXiv:2003.06541 [pdf, ps, other]

Using Data Assimilation of Mechanistic Models to Estimate Glucose and Insulin Metabolism

Authors: Jami J. Mulgrave, Matthew E. Levine, David J. Albers, Joon Ha, Arthur Sherman, George Hripcsak

Abstract: Motivation: There is a growing need to integrate mechanistic models of biological processes with computational methods in healthcare in order to improve prediction. We apply data assimilation in the context of Type 2 diabetes to understand parameters associated with the disease. Results: The data assimilation method captures how well patients improve glucose tolerance after their surgery. Data a… ▽ More Motivation: There is a growing need to integrate mechanistic models of biological processes with computational methods in healthcare in order to improve prediction. We apply data assimilation in the context of Type 2 diabetes to understand parameters associated with the disease. Results: The data assimilation method captures how well patients improve glucose tolerance after their surgery. Data assimilation has the potential to improve phenoty** in Type 2 diabetes. △ Less

Submitted 13 March, 2020; originally announced March 2020.

arXiv:1911.09856 [pdf, other]

doi 10.1016/j.jbi.2020.103639

Enabling Personalized Decision Support with Patient-Generated Data and Attributable Components

Authors: Elliot G Mitchell, Esteban G Tabak, Matthew E Levine, Lena Mamykina, David J Albers

Abstract: Decision-making related to health is complex. Machine learning (ML) and patient generated data can identify patterns and insights at the individual level, where human cognition falls short, but not all ML-generated information is of equal utility for making health-related decisions. We develop and apply attributable components analysis (ACA), a method inspired by optimal transport theory, to type… ▽ More Decision-making related to health is complex. Machine learning (ML) and patient generated data can identify patterns and insights at the individual level, where human cognition falls short, but not all ML-generated information is of equal utility for making health-related decisions. We develop and apply attributable components analysis (ACA), a method inspired by optimal transport theory, to type 2 diabetes self-monitoring data to identify patterns of association between nutrition and blood glucose control. In comparison with linear regression, we found that ACA offers a number of characteristics that make it promising for use in decision support applications. For example, ACA was able to identify non-linear relationships, was more robust to outliers, and offered broader and more expressive uncertainty estimates. In addition, our results highlight a tradeoff between model accuracy and interpretability, and we discuss implications for ML-driven decision support systems. △ Less

Submitted 13 January, 2021; v1 submitted 21 November, 2019; originally announced November 2019.

Journal ref: Journal of Biomedical Informatics. 113 (2021) 103639

arXiv:1910.14193 [pdf, other]

A Simple Modeling Framework For Prediction In The Human Glucose-Insulin System

Authors: M. Sirlanci, M. E. Levine, C. C. Low Wang, D. J. Albers, A. M. Stuart

Abstract: In this paper, we build a new, simple, and interpretable mathematical model to estimate and forecast physiology related to the human glucose-insulin system, constrained by available data. By constructing a simple yet flexible model class with interpretable parameters, this general model can be specialized to work in different settings, such as type 2 diabetes mellitus (T2DM) and intensive care uni… ▽ More In this paper, we build a new, simple, and interpretable mathematical model to estimate and forecast physiology related to the human glucose-insulin system, constrained by available data. By constructing a simple yet flexible model class with interpretable parameters, this general model can be specialized to work in different settings, such as type 2 diabetes mellitus (T2DM) and intensive care unit (ICU); different choices of appropriate model functions describing uptake of nutrition and removal of glucose differentiate between the models. In both cases, the available data is sparse and collected in clinical settings, major factors that have constrained our model choice to the simple form adopted. The model has the form of a linear stochastic differential equation (SDE) to describe the evolution of the BG level. The model includes a term quantifying glucose removal from the bloodstream through the regulation system of the human body and two other terms representing the effect of nutrition and externally delivered insulin. The stochastic fluctuations encapsulate model error necessitated by the simple model form and enable flexible incorporation of data. The model parameters must be learned in a patient-specific fashion, leading to personalized models. We present experimental results on patient-specific parameter estimation and future BG level forecasting in T2DM and ICU settings. The resulting model leads to the prediction of the BG level as an expected value accompanied by a band around this value which accounts for uncertainties in the prediction. Such predictions, then, have the potential for use as part of control systems that are robust to model imperfections and noisy data. Finally, the model's predictive capability is compared with two different models built explicitly for T2DM and ICU contexts. △ Less

Submitted 20 September, 2022; v1 submitted 30 October, 2019; originally announced October 2019.

Comments: 41 pages, 8 figures, 4 tables

MSC Class: 92

arXiv:1901.05668 [pdf, other]

doi 10.1088/1361-6420/ab1c09

Ensemble Kalman Methods With Constraints

Authors: David J. Albers, Paul-Adrien Blancquart, Matthew E. Levine, Elnaz Esmaeilzadeh Seylabi, Andrew Stuart

Abstract: Ensemble Kalman methods constitute an increasingly important tool in both state and parameter estimation problems. Their popularity stems from the derivative-free nature of the methodology which may be readily applied when computer code is available for the underlying state-space dynamics (for state estimation) or for the parameter-to-observable map (for parameter estimation). There are many appli… ▽ More Ensemble Kalman methods constitute an increasingly important tool in both state and parameter estimation problems. Their popularity stems from the derivative-free nature of the methodology which may be readily applied when computer code is available for the underlying state-space dynamics (for state estimation) or for the parameter-to-observable map (for parameter estimation). There are many applications in which it is desirable to enforce prior information in the form of equality or inequality constraints on the state or parameter. This paper establishes a general framework for doing so, describing a widely applicable methodology, a theory which justifies the methodology, and a set of numerical experiments exemplifying it. △ Less

Submitted 6 September, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

arXiv:1802.08761 [pdf]

Behavioral-clinical phenoty** with type 2 diabetes self-monitoring data

Authors: Matthew E. Levine, David J. Albers, Marissa Burgermaster, Patricia G. Davidson, Arlene M. Smaldone, Lena Mamykina

Abstract: Objective: To evaluate unsupervised clustering methods for identifying individual-level behavioral-clinical phenotypes that relate personal biomarkers and behavioral traits in type 2 diabetes (T2DM) self-monitoring data. Materials and Methods: We used hierarchical clustering (HC) to identify groups of meals with similar nutrition and glycemic impact for 6 individuals with T2DM who collected self-m… ▽ More Objective: To evaluate unsupervised clustering methods for identifying individual-level behavioral-clinical phenotypes that relate personal biomarkers and behavioral traits in type 2 diabetes (T2DM) self-monitoring data. Materials and Methods: We used hierarchical clustering (HC) to identify groups of meals with similar nutrition and glycemic impact for 6 individuals with T2DM who collected self-monitoring data. We evaluated clusters on: 1) correspondence to gold standards generated by certified diabetes educators (CDEs) for 3 participants; 2) face validity, rated by CDEs, and 3) impact on CDEs' ability to identify patterns for another 3 participants. Results: Gold standard (GS) included 9 patterns across 3 participants. Of these, all 9 were re-discovered using HC: 4 GS patterns were consistent with patterns identified by HC (over 50% of meals in a cluster followed the pattern); another 5 were included as sub-groups in broader clusers. 50% (9/18) of clusters were rated over 3 on 5-point Likert scale for validity, significance, and being actionable. After reviewing clusters, CDEs identified patterns that were more consistent with data (70% reduction in contradictions between patterns and participants' records). Discussion: Hierarchical clustering of blood glucose and macronutrient consumption appears suitable for discovering behavioral-clinical phenotypes in T2DM. Most clusters corresponded to gold standard and were rated positively by CDEs for face validity. Cluster visualizations helped CDEs identify more robust patterns in nutrition and glycemic impact, creating new possibilities for visual analytic solutions. Conclusion: Machine learning methods can use diabetes self-monitoring data to create personalized behavioral-clinical phenotypes, which may prove useful for delivering personalized medicine. △ Less

Submitted 23 February, 2018; originally announced February 2018.

arXiv:1801.08929 [pdf]

Methodological variations in lagged regression for detecting physiologic drug effects in EHR data

Authors: Matthew E. Levine, David J. Albers, George Hripcsak

Abstract: We studied how lagged linear regression can be used to detect the physiologic effects of drugs from data in the electronic health record (EHR). We systematically examined the effect of methodological variations ((i) time series construction, (ii) temporal parameterization, (iii) intra-subject normalization, (iv) differencing (lagged rates of change achieved by taking differences between consecutiv… ▽ More We studied how lagged linear regression can be used to detect the physiologic effects of drugs from data in the electronic health record (EHR). We systematically examined the effect of methodological variations ((i) time series construction, (ii) temporal parameterization, (iii) intra-subject normalization, (iv) differencing (lagged rates of change achieved by taking differences between consecutive measurements), (v) explanatory variables, and (vi) regression models) on performance of lagged linear methods in this context. We generated two gold standards (one knowledge-base derived, one expert-curated) for expected pairwise relationships between 7 drugs and 4 labs, and evaluated how the 64 unique combinations of methodological perturbations reproduce gold standards. Our 28 cohorts included patients in Columbia University Medical Center/NewYork-Presbyterian Hospital clinical database. The most accurate methods achieved AUROC of 0.794 for knowledge-base derived gold standard (95%CI [0.741, 0.847]) and 0.705 for expert-curated gold standard (95% CI [0.629, 0.781]). We observed a 0.633 mean AUROC (95%CI [0.610, 0.657], expert-curated gold standard) across all methods that re-parameterize time according to sequence and use either a joint autoregressive model with differencing or an independent lag model without differencing. The complement of this set of methods achieved a mean AUROC close to 0.5, indicating the importance of these choices. We conclude that time- series analysis of EHR data will likely rely on some of the beneficial pre-processing and modeling methodologies identified, and will certainly benefit from continued careful analysis of methodological perturbations. This study found that methodological variations, such as pre-processing and representations, significantly affect results, exposing the importance of evaluating these components when comparing machine-learning methods. △ Less

Submitted 26 January, 2018; originally announced January 2018.

arXiv:1709.00163 [pdf, other]

Offline and online data assimilation for real-time blood glucose forecasting in type 2 diabetes

Authors: Matthew E Levine, George Hripcsak, Lena Mamykina, Andrew Stuart, David J Albers

Abstract: We evaluate the benefits of combining different offline and online data assimilation methodologies to improve personalized blood glucose prediction with type 2 diabetes self-monitoring data. We collect self-monitoring data (nutritional reports and pre- and post-prandial glucose measurements) from 4 individuals with diabetes and 2 individuals without diabetes. We write online to refer to methods th… ▽ More We evaluate the benefits of combining different offline and online data assimilation methodologies to improve personalized blood glucose prediction with type 2 diabetes self-monitoring data. We collect self-monitoring data (nutritional reports and pre- and post-prandial glucose measurements) from 4 individuals with diabetes and 2 individuals without diabetes. We write online to refer to methods that update state and parameters sequentially as nutrition and glucose data are received, and offline to refer to methods that estimate parameters over a fixed data set, distributed over a time window containing multiple nutrition and glucose measurements. We fit a model of ultradian glucose dynamics to the first half of each data set using offline (MCMC and nonlinear optimization) and online (unscented Kalman filter and an unfiltered model---a dynamical model driven by nutrition data that does not update states) data assimilation methods. Model parameters estimated over the first half of the data are used within online forecasting methods to issue forecasts over the second half of each data set. Offline data assimilation methods provided consistent advantages in predictive performance and practical usability in 4 of 6 patient data sets compared to online data assimilation methods alone; yet 2 of 6 patients were best predicted with a strictly online approach. Interestingly, parameter estimates generated offline led to worse predictions when fed to a stochastic filter than when used in a simple, unfiltered model that incorporates new nutritional information, but does not update model states based on glucose measurements. The relative improvements seen from the unfiltered model, when carefully trained offline, exposes challenges in model sensitivity and filtering applications, but also opens possibilities for improved glucose forecasting and relaxed patient self-monitoring requirements. △ Less

Submitted 1 September, 2017; originally announced September 2017.

Showing 1–12 of 12 results for author: Levine, M E