Search | arXiv e-print repository

GeoThermalCloud: Machine Learning for Geothermal Resource Exploration

Authors: Maruti K. Mudunuru, Velimir V. Vesselinov, Bulbul Ahmmed

Abstract: This paper presents a novel ML-based methodology for geothermal exploration towards PFA applications. Our methodology is provided through our open-source ML framework, GeoThermalCloud \url{https://github.com/SmartTensors/GeoThermalCloud.jl}. The GeoThermalCloud uses a series of unsupervised, supervised, and physics-informed ML methods available in SmartTensors AI platform \url{https://github.com/S… ▽ More This paper presents a novel ML-based methodology for geothermal exploration towards PFA applications. Our methodology is provided through our open-source ML framework, GeoThermalCloud \url{https://github.com/SmartTensors/GeoThermalCloud.jl}. The GeoThermalCloud uses a series of unsupervised, supervised, and physics-informed ML methods available in SmartTensors AI platform \url{https://github.com/SmartTensors}. Here, the presented analyses are performed using our unsupervised ML algorithm called NMF$k$, which is available in the SmartTensors AI platform. Our ML algorithm facilitates the discovery of new phenomena, hidden patterns, and mechanisms that helps us to make informed decisions. Moreover, the GeoThermalCloud enhances the collected PFA data and discovers signatures representative of geothermal resources. Through GeoThermalCloud, we could identify hidden patterns in the geothermal field data needed to discover blind systems efficiently. Crucial geothermal signatures often overlooked in traditional PFA are extracted using the GeoThermalCloud and analyzed by the subject matter experts to provide ML-enhanced PFA, which is informative for efficient exploration. We applied our ML methodology to various open-source geothermal datasets within the U.S. (some of these are collected by past PFA work). The results provide valuable insights into resource types within those regions. This ML-enhanced workflow makes the GeoThermalCloud attractive for the geothermal community to improve existing datasets and extract valuable information often unnoticed during geothermal exploration. △ Less

Submitted 16 October, 2022; originally announced October 2022.

Comments: 24 pages, 7 figures

arXiv:2002.11511 [pdf, other]

doi 10.1016/j.jcp.2021.110147

A Comparative Study of Machine Learning Models for Predicting the State of Reactive Mixing

Authors: B. Ahmmed, M. K. Mudunuru, S. Karra, S. C. James, V. V. Vesselinov

Abstract: Accurate predictions of reactive mixing are critical for many Earth and environmental science problems. To investigate mixing dynamics over time under different scenarios, a high-fidelity, finite-element-based numerical model is built to solve the fast, irreversible bimolecular reaction-diffusion equations to simulate a range of reactive-mixing scenarios. A total of 2,315 simulations are performed… ▽ More Accurate predictions of reactive mixing are critical for many Earth and environmental science problems. To investigate mixing dynamics over time under different scenarios, a high-fidelity, finite-element-based numerical model is built to solve the fast, irreversible bimolecular reaction-diffusion equations to simulate a range of reactive-mixing scenarios. A total of 2,315 simulations are performed using different sets of model input parameters comprising various spatial scales of vortex structures in the velocity field, time-scales associated with velocity oscillations, the perturbation parameter for the vortex-based velocity, anisotropic dispersion contrast, and molecular diffusion. Outputs comprise concentration profiles of the reactants and products. The inputs and outputs of these simulations are concatenated into feature and label matrices, respectively, to train 20 different machine learning (ML) emulators to approximate system behavior. The 20 ML emulators based on linear methods, Bayesian methods, ensemble learning methods, and multilayer perceptron (MLP), are compared to assess these models. The ML emulators are specifically trained to classify the state of mixing and predict three quantities of interest (QoIs) characterizing species production, decay, and degree of mixing. Linear classifiers and regressors fail to reproduce the QoIs; however, ensemble methods (classifiers and regressors) and the MLP accurately classify the state of reactive mixing and the QoIs. Among ensemble methods, random forest and decision-tree-based AdaBoost faithfully predict the QoIs. At run time, trained ML emulators are $\approx10^5$ times faster than the high-fidelity numerical simulations. Speed and accuracy of the ensemble and MLP models facilitate uncertainty quantification, which usually requires 1,000s of model run, to estimate the uncertainty bounds on the QoIs. △ Less

Submitted 24 February, 2020; originally announced February 2020.

Comments: 31 pages

arXiv:1906.02401 [pdf, other]

Learning to regularize with a variational autoencoder for hydrologic inverse analysis

Authors: Daniel O'Malley, John K. Golden, Velimir V. Vesselinov

Abstract: Inverse problems often involve matching observational data using a physical model that takes a large number of parameters as input. These problems tend to be under-constrained and require regularization to impose additional structure on the solution in parameter space. A central difficulty in regularization is turning a complex conceptual model of this additional structure into a functional mathem… ▽ More Inverse problems often involve matching observational data using a physical model that takes a large number of parameters as input. These problems tend to be under-constrained and require regularization to impose additional structure on the solution in parameter space. A central difficulty in regularization is turning a complex conceptual model of this additional structure into a functional mathematical form to be used in the inverse analysis. In this work we propose a method of regularization involving a machine learning technique known as a variational autoencoder (VAE). The VAE is trained to map a low-dimensional set of latent variables with a simple structure to the high-dimensional parameter space that has a complex structure. We train a VAE on unconditioned realizations of the parameters for a hydrological inverse problem. These unconditioned realizations neither rely on the observational data used to perform the inverse analysis nor require any forward runs of the physical model, thus making the computational cost of generating the training data minimal. The central benefit of this approach is that regularization is then performed on the latent variables from the VAE, which can be regularized simply. A second benefit of this approach is that the VAE reduces the number of variables in the optimization problem, thus making gradient-based optimization more computationally efficient when adjoint methods are unavailable. After performing regularization and optimization on the latent variables, the VAE then decodes the problem back to the original parameter space. Our approach constitutes a novel framework for regularization and optimization, readily applicable to a wide range of inverse problems. We call the approach RegAE. △ Less

Submitted 5 June, 2019; originally announced June 2019.

arXiv:1805.06454 [pdf, other]

doi 10.1016/j.jcp.2019.05.039

Unsupervised Machine Learning Based on Non-Negative Tensor Factorization for Analyzing Reactive-Mixing

Authors: V. V. Vesselinov, M. K. Mudunuru, S. Karra, D. O. Malley, B. S. Alexandrov

Abstract: Analysis of reactive-diffusion simulations requires a large number of independent model runs. For each high-fidelity simulation, inputs are varied and the predicted mixing behavior is represented by changes in species concentration. It is then required to discern how the model inputs impact the mixing process. This task is challenging and typically involves interpretation of large model outputs. H… ▽ More Analysis of reactive-diffusion simulations requires a large number of independent model runs. For each high-fidelity simulation, inputs are varied and the predicted mixing behavior is represented by changes in species concentration. It is then required to discern how the model inputs impact the mixing process. This task is challenging and typically involves interpretation of large model outputs. However, the task can be automated and substantially simplified by applying Machine Learning (ML) methods. In this paper, we present an application of an unsupervised ML method (called NTFk) using Non-negative Tensor Factorization (NTF) coupled with a custom clustering procedure based on k-means to reveal hidden features in product concentration. An attractive aspect of the proposed ML method is that it ensures the extracted features are non-negative, which are important to obtain a meaningful deconstruction of the mixing processes. The ML method is applied to a large set of high-resolution FEM simulations representing reaction-diffusion processes in perturbed vortex-based velocity fields. The applied FEM ensures that species concentration are always non-negative. The simulated reaction is a fast irreversible bimolecular reaction. The reactive-diffusion model input parameters that control mixing include properties of velocity field, anisotropic dispersion, and molecular diffusion. We demonstrate the applicability of the ML method to produce a meaningful deconstruction of model outputs to discriminate between different physical processes impacting the reactants, their mixing, and the spatial distribution of the product. The presented ML analysis allowed us to identify additive features that characterize mixing behavior. △ Less

Submitted 21 February, 2019; v1 submitted 15 May, 2018; originally announced May 2018.

Comments: 34 pages

arXiv:1704.01605 [pdf, other]

doi 10.1371/journal.pone.0206653

Nonnegative/binary matrix factorization with a D-Wave quantum annealer

Authors: Daniel O'Malley, Velimir V. Vesselinov, Boian S. Alexandrov, Ludmil B. Alexandrov

Abstract: D-Wave quantum annealers represent a novel computational architecture and have attracted significant interest, but have been used for few real-world computations. Machine learning has been identified as an area where quantum annealing may be useful. Here, we show that the D-Wave 2X can be effectively used as part of an unsupervised machine learning method. This method can be used to analyze large… ▽ More D-Wave quantum annealers represent a novel computational architecture and have attracted significant interest, but have been used for few real-world computations. Machine learning has been identified as an area where quantum annealing may be useful. Here, we show that the D-Wave 2X can be effectively used as part of an unsupervised machine learning method. This method can be used to analyze large datasets. The D-Wave only limits the number of features that can be extracted from the dataset. We apply this method to learn the features from a set of facial images. △ Less

Submitted 5 April, 2017; originally announced April 2017.

arXiv:1612.03950 [pdf]

Nonnegative Matrix Factorization for identification of unknown number of sources emitting delayed signals

Authors: Filip L. Iliev, Valentin G. Stanev, Velimir V. Vesselinov, Boian S. Alexandrov

Abstract: Factor analysis is broadly used as a powerful unsupervised machine learning tool for reconstruction of hidden features in recorded mixtures of signals. In the case of a linear approximation, the mixtures can be decomposed by a variety of model-free Blind Source Separation (BSS) algorithms. Most of the available BSS algorithms consider an instantaneous mixing of signals, while the case when the mix… ▽ More Factor analysis is broadly used as a powerful unsupervised machine learning tool for reconstruction of hidden features in recorded mixtures of signals. In the case of a linear approximation, the mixtures can be decomposed by a variety of model-free Blind Source Separation (BSS) algorithms. Most of the available BSS algorithms consider an instantaneous mixing of signals, while the case when the mixtures are linear combinations of signals with delays is less explored. Especially difficult is the case when the number of sources of the signals with delays is unknown and has to be determined from the data as well. To address this problem, in this paper, we present a new method based on Nonnegative Matrix Factorization (NMF) that is capable of identifying: (a) the unknown number of the sources, (b) the delays and speed of propagation of the signals, and (c) the locations of the sources. Our method can be used to decompose records of mixtures of signals with delays emitted by an unknown number of sources in a nondispersive medium, based only on recorded data. This is the case, for example, when electromagnetic signals from multiple antennas are received asynchronously; or mixtures of acoustic or seismic signals recorded by sensors located at different positions; or when a shift in frequency is induced by the Doppler effect. By applying our method to synthetic datasets, we demonstrate its ability to identify the unknown number of sources as well as the waveforms, the delays, and the strengths of the signals. Using Bayesian analysis, we also evaluate estimation uncertainties and identify the region of likelihood where the positions of the sources can be found. △ Less

Submitted 23 March, 2018; v1 submitted 12 December, 2016; originally announced December 2016.

Report number: LA-UR-16-27232 MSC Class: ]68T10

Journal ref: PloS one. 2018 Mar 8;13(3):e0193974

arXiv:1612.03948 [pdf, other]

doi 10.1016/j.apm.2018.03.006

Identification of release sources in advection-diffusion system by machine learning combined with Green function inverse method

Authors: Valentin G. Stanev, Filip L. Iliev, Scott Hansen, Velimir V. Vesselinov, Boian S. Alexandrov

Abstract: The identification of sources of advection-diffusion transport is based usually on solving complex ill-posed inverse models against the available state- variable data records. However, if there are several sources with different locations and strengths, the data records represent mixtures rather than the separate influences of the original sources. Importantly, the number of these original release… ▽ More The identification of sources of advection-diffusion transport is based usually on solving complex ill-posed inverse models against the available state- variable data records. However, if there are several sources with different locations and strengths, the data records represent mixtures rather than the separate influences of the original sources. Importantly, the number of these original release sources is typically unknown, which hinders reliability of the classical inverse-model analyses. To address this challenge, we present here a novel hybrid method for identification of the unknown number of release sources. Our hybrid method, called HNMF, couples unsupervised learning based on Nonnegative Matrix Factorization (NMF) and inverse-analysis Green functions method. HNMF synergistically performs decomposition of the recorded mixtures, finds the number of the unknown sources and uses the Green function of advection-diffusion equation to identify their characteristics. In the paper, we introduce the method and demonstrate that it is capable of identifying the advection velocity and dispersivity of the medium as well as the unknown number, locations, and properties of various sets of synthetic release sources with different space and time dependencies, based only on the recorded data. HNMF can be applied directly to any problem controlled by a partial-differential parabolic equation where mixtures of an unknown number of sources are measured at multiple locations. △ Less

Submitted 23 March, 2018; v1 submitted 12 December, 2016; originally announced December 2016.

Report number: LA-UR-16-27231 MSC Class: 68T10

Showing 1–7 of 7 results for author: Vesselinov, V V