Search | arXiv e-print repository

SPDE priors for uncertainty quantification of end-to-end neural data assimilation schemes

Authors: Maxime Beauchamp, Nicolas Desassis, J. Emmanuel Johnson, Simon Benaichouche, Pierre Tandeo, Ronan Fablet

Abstract: The spatio-temporal interpolation of large geophysical datasets has historically been adressed by Optimal Interpolation (OI) and more sophisticated model-based or data-driven DA techniques. In the last ten years, the link established between Stochastic Partial Differential Equations (SPDE) and Gaussian Markov Random Fields (GMRF) opened a new way of handling both large datasets and physically-indu… ▽ More The spatio-temporal interpolation of large geophysical datasets has historically been adressed by Optimal Interpolation (OI) and more sophisticated model-based or data-driven DA techniques. In the last ten years, the link established between Stochastic Partial Differential Equations (SPDE) and Gaussian Markov Random Fields (GMRF) opened a new way of handling both large datasets and physically-induced covariance matrix in Optimal Interpolation. Recent advances in the deep learning community also enables to adress this problem as neural architecture embedding data assimilation variational framework. The reconstruction task is seen as a joint learning problem of the prior involved in the variational inner cost and the gradient-based minimization of the latter: both prior models and solvers are stated as neural networks with automatic differentiation which can be trained by minimizing a loss function, typically stated as the mean squared error between some ground truth and the reconstruction. In this work, we draw from the SPDE-based Gaussian Processes to estimate complex prior models able to handle non-stationary covariances in both space and time and provide a stochastic framework for interpretability and uncertainty quantification. Our neural variational scheme is modified to embed an augmented state formulation with both state and SPDE parametrization to estimate. Instead of a neural prior, we use a stochastic PDE as surrogate model along the data assimilation window. The training involves a loss function for both reconstruction task and SPDE prior model, where the likelihood of the SPDE parameters given the true states is involved in the training. Because the prior is stochastic, we can easily draw samples in the prior distribution before conditioning to provide a flexible way to estimate the posterior distribution based on thousands of members. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2309.15599 [pdf, other]

OceanBench: The Sea Surface Height Edition

Authors: J. Emmanuel Johnson, Quentin Febvre, Anastasia Gorbunova, Sammy Metref, Maxime Ballarotta, Julien Le Sommer, Ronan Fablet

Abstract: The ocean profoundly influences human activities and plays a critical role in climate regulation. Our understanding has improved over the last decades with the advent of satellite remote sensing data, allowing us to capture essential quantities over the globe, e.g., sea surface height (SSH). However, ocean satellite data presents challenges for information extraction due to their sparsity and irre… ▽ More The ocean profoundly influences human activities and plays a critical role in climate regulation. Our understanding has improved over the last decades with the advent of satellite remote sensing data, allowing us to capture essential quantities over the globe, e.g., sea surface height (SSH). However, ocean satellite data presents challenges for information extraction due to their sparsity and irregular sampling, signal complexity, and noise. Machine learning (ML) techniques have demonstrated their capabilities in dealing with large-scale, complex signals. Therefore we see an opportunity for ML models to harness the information contained in ocean satellite data. However, data representation and relevant evaluation metrics can be the defining factors when determining the success of applied ML. The processing steps from the raw observation data to a ML-ready state and from model outputs to interpretable quantities require domain expertise, which can be a significant barrier to entry for ML researchers. OceanBench is a unifying framework that provides standardized processing steps that comply with domain-expert standards. It provides plug-and-play data and pre-configured pipelines for ML researchers to benchmark their models and a transparent configurable framework for researchers to customize and extend the pipeline for their tasks. In this work, we demonstrate the OceanBench framework through a first edition dedicated to SSH interpolation challenges. We provide datasets and ML-ready benchmarking pipelines for the long-standing problem of interpolating observations from simulated ocean satellite data, multi-modal and multi-sensor fusion issues, and transfer-learning to real ocean satellite observations. The OceanBench framework is available at github.com/jejjohnson/oceanbench and the dataset registry is available at github.com/quentinf00/oceanbench-data-registry. △ Less

Submitted 27 September, 2023; originally announced September 2023.

Comments: J. Emmanuel Johnson and Quentin Febvre contributed equally to this work

arXiv:2211.10444 [pdf, other]

Neural Fields for Fast and Scalable Interpolation of Geophysical Ocean Variables

Authors: J. Emmanuel Johnson, Redouane Lguensat, Ronan Fablet, Emmanuel Cosme, Julien Le Sommer

Abstract: Optimal Interpolation (OI) is a widely used, highly trusted algorithm for interpolation and reconstruction problems in geosciences. With the influx of more satellite missions, we have access to more and more observations and it is becoming more pertinent to take advantage of these observations in applications such as forecasting and reanalysis. With the increase in the volume of available data, sc… ▽ More Optimal Interpolation (OI) is a widely used, highly trusted algorithm for interpolation and reconstruction problems in geosciences. With the influx of more satellite missions, we have access to more and more observations and it is becoming more pertinent to take advantage of these observations in applications such as forecasting and reanalysis. With the increase in the volume of available data, scalability remains an issue for standard OI and it prevents many practitioners from effectively and efficiently taking advantage of these large sums of data to learn the model hyperparameters. In this work, we leverage recent advances in Neural Fields (NerFs) as an alternative to the OI framework where we show how they can be easily applied to standard reconstruction problems in physical oceanography. We illustrate the relevance of NerFs for gap-filling of sparse measurements of sea surface height (SSH) via satellite altimetry and demonstrate how NerFs are scalable with comparable results to the standard OI. We find that NerFs are a practical set of methods that can be readily applied to geoscience interpolation problems and we anticipate a wider adoption in the future. △ Less

Submitted 18 November, 2022; originally announced November 2022.

Comments: Machine Learning and the Physical Sciences workshop, NeurIPS 2022

arXiv:2206.03860 [pdf, other]

Orthonormal Convolutions for the Rotation Based Iterative Gaussianization

Authors: Valero Laparra, Alexander Hepburn, J. Emmanuel Johnson, Jesús Malo

Abstract: In this paper we elaborate an extension of rotation-based iterative Gaussianization, RBIG, which makes image Gaussianization possible. Although RBIG has been successfully applied to many tasks, it is limited to medium dimensionality data (on the order of a thousand dimensions). In images its application has been restricted to small image patches or isolated pixels, because rotation in RBIG is base… ▽ More In this paper we elaborate an extension of rotation-based iterative Gaussianization, RBIG, which makes image Gaussianization possible. Although RBIG has been successfully applied to many tasks, it is limited to medium dimensionality data (on the order of a thousand dimensions). In images its application has been restricted to small image patches or isolated pixels, because rotation in RBIG is based on principal or independent component analysis and these transformations are difficult to learn and scale. Here we present the \emph{Convolutional RBIG}: an extension that alleviates this issue by imposing that the rotation in RBIG is a convolution. We propose to learn convolutional rotations (i.e. orthonormal convolutions) by optimising for the reconstruction loss between the input and an approximate inverse of the transformation using the transposed convolution operation. Additionally, we suggest different regularizers in learning these orthonormal convolutions. For example, imposing sparsity in the activations leads to a transformation that extends convolutional independent component analysis to multilayer architectures. We also highlight how statistical properties of the data, such as multivariate mutual information, can be obtained from \emph{Convolutional RBIG}. We illustrate the behavior of the transform with a simple example of texture synthesis, and analyze its properties by visualizing the stimuli that maximize the response in certain feature and layer. △ Less

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2205.08864 [pdf, ps, other]

The Kernelized Taylor Diagram

Authors: Kristoffer Wickstrøm, J. Emmanuel Johnson, Sigurd Løkse, Gustau Camps-Valls, Karl Øyvind Mikalsen, Michael Kampffmeyer, Robert Jenssen

Abstract: This paper presents the kernelized Taylor diagram, a graphical framework for visualizing similarities between data populations. The kernelized Taylor diagram builds on the widely used Taylor diagram, which is used to visualize similarities between populations. However, the Taylor diagram has several limitations such as not capturing non-linear relationships and sensitivity to outliers. To address… ▽ More This paper presents the kernelized Taylor diagram, a graphical framework for visualizing similarities between data populations. The kernelized Taylor diagram builds on the widely used Taylor diagram, which is used to visualize similarities between populations. However, the Taylor diagram has several limitations such as not capturing non-linear relationships and sensitivity to outliers. To address such limitations, we propose the kernelized Taylor diagram. Our proposed kernelized Taylor diagram is capable of visualizing similarities between populations with minimal assumptions of the data distributions. The kernelized Taylor diagram relates the maximum mean discrepancy and the kernel mean embedding in a single diagram, a construction that, to the best of our knowledge, have not been devised prior to this work. We believe that the kernelized Taylor diagram can be a valuable tool in data visualization. △ Less

Submitted 18 May, 2022; originally announced May 2022.

Comments: Accepted at the Norwegian Artificial Intelligence Symposium 2022. Code available at: https://github.com/Wickstrom/KernelizedTaylorDiagram

arXiv:2012.04947 [pdf, other]

doi 10.1109/IGARSS.2018.8519020

Disentangling Derivatives, Uncertainty and Error in Gaussian Process Models

Authors: Juan Emmanuel Johnson, Valero Laparra, Gustau Camps-Valls

Abstract: Gaussian Processes (GPs) are a class of kernel methods that have shown to be very useful in geoscience applications. They are widely used because they are simple, flexible and provide very accurate estimates for nonlinear problems, especially in parameter retrieval. An addition to a predictive mean function, GPs come equipped with a useful property: the predictive variance function which provides… ▽ More Gaussian Processes (GPs) are a class of kernel methods that have shown to be very useful in geoscience applications. They are widely used because they are simple, flexible and provide very accurate estimates for nonlinear problems, especially in parameter retrieval. An addition to a predictive mean function, GPs come equipped with a useful property: the predictive variance function which provides confidence intervals for the predictions. The GP formulation usually assumes that there is no input noise in the training and testing points, only in the observations. However, this is often not the case in Earth observation problems where an accurate assessment of the instrument error is usually available. In this paper, we showcase how the derivative of a GP model can be used to provide an analytical error propagation formulation and we analyze the predictive variance and the propagated error terms in a temperature prediction problem from infrared sounding data. △ Less

Submitted 9 December, 2020; originally announced December 2020.

Comments: arXiv admin note: text overlap with arXiv:2005.09907

Journal ref: 2018 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

arXiv:2012.01985 [pdf, other]

RotNet: Fast and Scalable Estimation of Stellar Rotation Periods Using Convolutional Neural Networks

Authors: J. Emmanuel Johnson, Sairam Sundaresan, Tansu Daylan, Lisseth Gavilan, Daniel K. Giles, Stela Ishitani Silva, Anna Jungbluth, Brett Morris, Andrés Muñoz-Jaramillo

Abstract: Magnetic activity in stars manifests as dark spots on their surfaces that modulate the brightness observed by telescopes. These light curves contain important information on stellar rotation. However, the accurate estimation of rotation periods is computationally expensive due to scarce ground truth information, noisy data, and large parameter spaces that lead to degenerate solutions. We harness t… ▽ More Magnetic activity in stars manifests as dark spots on their surfaces that modulate the brightness observed by telescopes. These light curves contain important information on stellar rotation. However, the accurate estimation of rotation periods is computationally expensive due to scarce ground truth information, noisy data, and large parameter spaces that lead to degenerate solutions. We harness the power of deep learning and successfully apply Convolutional Neural Networks to regress stellar rotation periods from Kepler light curves. Geometry-preserving time-series to image transformations of the light curves serve as inputs to a ResNet-18 based architecture which is trained through transfer learning. The McQuillan catalog of published rotation periods is used as ansatz to groundtruth. We benchmark the performance of our method against a random forest regressor, a 1D CNN, and the Auto-Correlation Function (ACF) - the current standard to estimate rotation periods. Despite limiting our input to fewer data points (1k), our model yields more accurate results and runs 350 times faster than ACF runs on the same number of data points and 10,000 times faster than ACF runs on 65k data points. With only minimal feature engineering our approach has impressive accuracy, motivating the application of deep learning to regress stellar parameters on an even larger scale △ Less

Submitted 3 December, 2020; v1 submitted 2 December, 2020; originally announced December 2020.

Comments: Accepted for the Machine Learning and the Physical Sciences Workshop, NeurIPS 2020

arXiv:2012.01012 [pdf, other]

Information Theory in Density Destructors

Authors: J. Emmanuel Johnson, Valero Laparra, Gustau Camps-Valls, Raul Santos-Rodríguez, Jesús Malo

Abstract: Density destructors are differentiable and invertible transforms that map multivariate PDFs of arbitrary structure (low entropy) into non-structured PDFs (maximum entropy). Multivariate Gaussianization and multivariate equalization are specific examples of this family, which break down the complexity of the original PDF through a set of elementary transforms that progressively remove the structure… ▽ More Density destructors are differentiable and invertible transforms that map multivariate PDFs of arbitrary structure (low entropy) into non-structured PDFs (maximum entropy). Multivariate Gaussianization and multivariate equalization are specific examples of this family, which break down the complexity of the original PDF through a set of elementary transforms that progressively remove the structure of the data. We demonstrate how this property of density destructive flows is connected to classical information theory, and how density destructors can be used to get more accurate estimates of information theoretic quantities. Experiments with total correlation and mutual information inmultivariate sets illustrate the ability of density destructors compared to competing methods. These results suggest that information theoretic measures may be an alternative optimization criteria when learning density destructive flows. △ Less

Submitted 2 December, 2020; originally announced December 2020.

Comments: Accepted at the Workshop on Invertible Neural Nets and Normalizing Flows, ICML 2019

arXiv:2010.03807 [pdf, other]

Information Theory Measures via Multidimensional Gaussianization

Authors: Valero Laparra, J. Emmanuel Johnson, Gustau Camps-Valls, Raul Santos-Rodríguez, Jesus Malo

Abstract: Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems. It has several desirable properties for real world applications: it naturally deals with multivariate data, it can handle heterogeneous data types, and the measures can be interpreted in physical units. However, it has not been adopted by a wider audience because obtaining informati… ▽ More Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems. It has several desirable properties for real world applications: it naturally deals with multivariate data, it can handle heterogeneous data types, and the measures can be interpreted in physical units. However, it has not been adopted by a wider audience because obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality. Here we propose an indirect way of computing information based on a multivariate Gaussianization transform. Our proposal mitigates the difficulty of multivariate density estimation by reducing it to a composition of tractable (marginal) operations and simple linear transformations, which can be interpreted as a particular deep neural network. We introduce specific Gaussianization-based methodologies to estimate total correlation, entropy, mutual information and Kullback-Leibler divergence. We compare them to recent estimators showing the accuracy on synthetic data generated from different multivariate distributions. We made the tools and datasets publicly available to provide a test-bed to analyze future methodologies. Results show that our proposal is superior to previous estimators particularly in high-dimensional scenarios; and that it leads to interesting insights in neuroscience, geoscience, computer vision, and machine learning. △ Less

Submitted 25 November, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2007.14706 [pdf, other]

doi 10.1371/journal.pone.0235885

Kernel Methods and their derivatives: Concept and perspectives for the Earth system sciences

Authors: J. Emmanuel Johnson, Valero Laparra, Adrián Pérez-Suay, Miguel D. Mahecha, Gustau Camps-Valls

Abstract: Kernel methods are powerful machine learning techniques which implement generic non-linear functions to solve complex tasks in a simple way. They Have a solid mathematical background and exhibit excellent performance in practice. However, kernel machines are still considered black-box models as the feature map** is not directly accessible and difficult to interpret.The aim of this work is to sho… ▽ More Kernel methods are powerful machine learning techniques which implement generic non-linear functions to solve complex tasks in a simple way. They Have a solid mathematical background and exhibit excellent performance in practice. However, kernel machines are still considered black-box models as the feature map** is not directly accessible and difficult to interpret.The aim of this work is to show that it is indeed possible to interpret the functions learned by various kernel methods is intuitive despite their complexity. Specifically, we show that derivatives of these functions have a simple mathematical formulation, are easy to compute, and can be applied to many different problems. We note that model function derivatives in kernel machines is proportional to the kernel function derivative. We provide the explicit analytic form of the first and second derivatives of the most common kernel functions with regard to the inputs as well as generic formulas to compute higher order derivatives. We use them to analyze the most used supervised and unsupervised kernel learning methods: Gaussian Processes for regression, Support Vector Machines for classification, Kernel Entropy Component Analysis for density estimation, and the Hilbert-Schmidt Independence Criterion for estimating the dependency between random variables. For all cases we expressed the derivative of the learned function as a linear combination of the kernel function derivative. Moreover we provide intuitive explanations through illustrative toy examples and show how to improve the interpretation of real applications in the context of spatiotemporal Earth system data cubes. This work reflects on the observation that function derivatives may play a crucial role in kernel methods analysis and understanding. △ Less

Submitted 5 October, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

Comments: 21 pages, 10 figures, PLOS One Journal

arXiv:2005.09907 [pdf, other]

doi 10.1109/LGRS.2019.2921476

Accounting for Input Noise in Gaussian Process Parameter Retrieval

Authors: J. Emmanuel Johnson, Valero Laparra, Gustau Camps-Valls

Abstract: Gaussian processes (GPs) are a class of Kernel methods that have shown to be very useful in geoscience and remote sensing applications for parameter retrieval, model inversion, and emulation. They are widely used because they are simple, flexible, and provide accurate estimates. GPs are based on a Bayesian statistical framework which provides a posterior probability function for each estimation. T… ▽ More Gaussian processes (GPs) are a class of Kernel methods that have shown to be very useful in geoscience and remote sensing applications for parameter retrieval, model inversion, and emulation. They are widely used because they are simple, flexible, and provide accurate estimates. GPs are based on a Bayesian statistical framework which provides a posterior probability function for each estimation. Therefore, besides the usual prediction (given in this case by the mean function), GPs come equipped with the possibility to obtain a predictive variance (i.e., error bars, confidence intervals) for each prediction. Unfortunately, the GP formulation usually assumes that there is no noise in the inputs, only in the observations. However, this is often not the case in earth observation problems where an accurate assessment of the measuring instrument error is typically available, and where there is huge interest in characterizing the error propagation through the processing pipeline. In this letter, we demonstrate how one can account for input noise estimates using a GP model formulation which propagates the error terms using the derivative of the predictive mean function. We analyze the resulting predictive variance term and show how they more accurately represent the model error in a temperature prediction problem from infrared sounding data. △ Less

Submitted 20 May, 2020; originally announced May 2020.

Journal ref: IEEE Geoscience and Remote Sensing Letters ( Volume: 17 , Issue: 3 , March 2020 )

arXiv:1812.06175 [pdf, other]

Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting

Authors: Yaodong Yang, Alisa Kolesnikova, Stefan Lessmann, Tiejun Ma, Ming-Chien Sung, Johnnie E. V. Johnson

Abstract: The paper examines the potential of deep learning to support decisions in financial risk management. We develop a deep learning model for predicting whether individual spread traders secure profits from future trades. This task embodies typical modeling challenges faced in risk and behavior forecasting. Conventional machine learning requires data that is representative of the feature-target relati… ▽ More The paper examines the potential of deep learning to support decisions in financial risk management. We develop a deep learning model for predicting whether individual spread traders secure profits from future trades. This task embodies typical modeling challenges faced in risk and behavior forecasting. Conventional machine learning requires data that is representative of the feature-target relationship and relies on the often costly development, maintenance, and revision of handcrafted features. Consequently, modeling highly variable, heterogeneous patterns such as trader behavior is challenging. Deep learning promises a remedy. Learning hierarchical distributed representations of the data in an automatic manner (e.g. risk taking behavior), it uncovers generative features that determine the target (e.g., trader's profitability), avoids manual feature engineering, and is more robust toward change (e.g. dynamic market conditions). The results of employing a deep network for operational risk forecasting confirm the feature learning capability of deep learning, provide guidance on designing a suitable network architecture and demonstrate the superiority of deep learning over machine learning and rule-based benchmarks. △ Less

Submitted 17 November, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: Within the "equal" contribution, Yaodong Yang contributed the core deep learning algorithm along with its experimental results, and the first draft of the manuscript (including Figure 1,2,3,4,7,8,9,11, and Table 3)

arXiv:cond-mat/0209111 [pdf, ps, other]

Approaches to Network Classification

Authors: Vladimir Gudkov, Joseph E. Johnson, Shmuel Nussinov

Abstract: We introduce a novel approach to description of networks/graphs. It is based on an analogue physical model which is dynamically evolved. This evolution depends on the connectivity matrix and readily brings out many qualitative features of the graph. We introduce a novel approach to description of networks/graphs. It is based on an analogue physical model which is dynamically evolved. This evolution depends on the connectivity matrix and readily brings out many qualitative features of the graph. △ Less

Submitted 4 September, 2002; originally announced September 2002.

Comments: RevTeX4

arXiv:cs/0206020 [pdf, ps, other]

Multidimensional Network Monitoring for Intrusion Detection

Authors: Vladimir Gudkov, Joseph E. Johnson

Abstract: An approach for real-time network monitoring in terms of numerical time-dependant functions of protocol parameters is suggested. Applying complex systems theory for information f{l}ow analysis of networks, the information traffic is described as a trajectory in multi-dimensional parameter-time space with about 10-12 dimensions. The network traffic description is synthesized by applying methods o… ▽ More An approach for real-time network monitoring in terms of numerical time-dependant functions of protocol parameters is suggested. Applying complex systems theory for information f{l}ow analysis of networks, the information traffic is described as a trajectory in multi-dimensional parameter-time space with about 10-12 dimensions. The network traffic description is synthesized by applying methods of theoretical physics and complex systems theory, to provide a robust approach for network monitoring that detects known intrusions, and supports develo** real systems for detection of unknown intrusions. The methods of data analysis and pattern recognition presented are the basis of a technology study for an automatic intrusion detection system that detects the attack in the reconnaissance stage. △ Less

Submitted 13 June, 2002; originally announced June 2002.

Comments: Talk at the International Conference on Complex Systems, June 9-14, 2002, Nashua, NH

ACM Class: C.2.0.; C.2.3.; C.2.5

arXiv:cs/0110019 [pdf, ps, other]

New approach for network monitoring and intrusion detection

Authors: Vladimir Gudkov, Joseph E. Johnson

Abstract: The approach for a network behavior description in terms of numerical time-dependant functions of the protocol parameters is suggested. This provides a basis for application of methods of mathematical and theoretical physics for information flow analysis on network and for extraction of patterns of typical network behavior. The information traffic can be described as a trajectory in multi-dimens… ▽ More The approach for a network behavior description in terms of numerical time-dependant functions of the protocol parameters is suggested. This provides a basis for application of methods of mathematical and theoretical physics for information flow analysis on network and for extraction of patterns of typical network behavior. The information traffic can be described as a trajectory in multi-dimensional parameter-time space with dimension about 10-12. Based on this study some algorithms for the proposed intrusion detection system are discussed. △ Less

Submitted 5 October, 2001; originally announced October 2001.

ACM Class: K.6.5; E.4; F.1.1; F.1.3

Showing 1–15 of 15 results for author: Johnson, J E