-
A non-intrusive machine learning framework for debiasing long-time coarse resolution climate simulations and quantifying rare events statistics
Authors:
Benedikt Barthel Sorensen,
Alexis Charalampopoulos,
Shixuan Zhang,
Bryce Harrop,
Ruby Leung,
Themistoklis Sapsis
Abstract:
Due to the rapidly changing climate, the frequency and severity of extreme weather are expected to increase over the coming decades. As fully-resolved climate simulations remain computationally intractable, policy makers must rely on coarse models to quantify risk for extremes. However, coarse models suffer from inherent bias due to the ignored "sub-grid" scales. We propose a framework to non-intrusively debias coarse-resolution climate predictions using neural-network (NN) correction operators. Previous efforts have attempted to train such operators using loss functions that match statistics. However, this approach falls short for events that have a longer return period than that of the training data, since the reference statistics have not converged. Here, the aim is to formulate a learning method that allows for correction of dynamics and quantification of extreme events with a longer return period than the training data. The key obstacle is the chaotic nature of the underlying dynamics. To overcome this challenge, we introduce a dynamical systems approach where the correction operator is trained using reference data and a coarse model simulation nudged towards that reference. The method is demonstrated on debiasing an under-resolved quasi-geostrophic model and the Energy Exascale Earth System Model (E3SM). For the former, our method enables the quantification of events that have a return period two orders of magnitude longer than the training data. For the latter, when trained on 8 years of ERA5 data, our approach is able to correct the coarse E3SM output to closely reflect the 36-year ERA5 statistics for all prognostic variables and significantly reduce their spatial biases.
Submitted 28 February, 2024;
originally announced February 2024.
-
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation
Authors:
Ylaine Gerardin,
John Shamshoian,
Judy Shen,
Nhat Le,
Jamie Prezioso,
John Abel,
Isaac Finberg,
Daniel Borders,
Raymond Biju,
Michael Nercessian,
Vaed Prasad,
Joseph Lee,
Spencer Wyman,
Sid Gupta,
Abigail Emerson,
Bahar Rahsepar,
Darpan Sanghavi,
Ryan Leung,
Limin Yu,
Archit Khosla,
Amaro Taylor-Weiner
Abstract:
Nested pairwise frames is a method for relative benchmarking of cell or tissue digital pathology models against manual pathologist annotations on a set of sampled patches. At a high level, the method compares agreement between a candidate model and pathologist annotations with agreement among pathologists' annotations. This evaluation framework addresses fundamental issues of data size and annotator variability in using manual pathologist annotations as a source of ground truth for model validation. We implemented nested pairwise frames evaluation for tissue classification, cell classification, and cell count prediction tasks and show results for cell and tissue models deployed on an H&E-stained melanoma dataset.
Submitted 7 June, 2023;
originally announced June 2023.
-
An experience with PyCUDA: Refactoring an existing implementation of a ray-surface intersection algorithm
Authors:
Raymond Leung
Abstract:
This article is a sequel to "GPU implementation of a ray-surface intersection algorithm in CUDA" (arXiv:2209.02878) [1]. Its main focus is PyCUDA which represents a Python scripting approach to GPU run-time code generation in the Compute Unified Device Architecture (CUDA) framework. It accompanies the open-source code distributed in GitHub which provides a PyCUDA implementation of a GPU-based line-segment, surface-triangle intersection test. The objective is to share a PyCUDA learning experience with people who are new to PyCUDA. Using the existing CUDA code and foundation from [1] as the starting point, we document the key changes made to facilitate a transition to PyCUDA. As the CUDA source for the ray-surface intersection test contains both host and device code and uses multiple kernel functions, these notes offer a substantive example and real-world perspective of what it is like to utilize PyCUDA. It delves into custom data structures such as binary radix tree and highlights some possible pitfalls. The case studies present a debugging strategy which may be used to examine complex C structures in device memory using standard Python tools without the CUDA-GDB debugger.
Submitted 4 May, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Unsupervised ore/waste classification on open-cut mine faces using close-range hyperspectral data
Authors:
Lloyd Windrim,
Arman Melkumyan,
Richard J. Murphy,
Anna Chlingaryan,
Raymond Leung
Abstract:
The remote mapping of minerals and discrimination of ore and waste on surfaces are important tasks for geological applications such as those in mining. Such tasks have become possible using ground-based, close-range hyperspectral sensors which can remotely measure the reflectance properties of the environment with high spatial and spectral resolution. However, autonomous mapping of mineral spectra measured on an open-cut mine face remains a challenging problem due to the subtleness of differences in spectral absorption features between mineral and rock classes as well as variability in the illumination of the scene. An additional layer of difficulty arises when there is no annotated data available to train a supervised learning algorithm. A pipeline for unsupervised mapping of spectra on a mine face is proposed which draws from several recent advances in the hyperspectral machine learning literature. The proposed pipeline brings together unsupervised and self-supervised algorithms in a unified system to map minerals on a mine face without the need for human-annotated training data. The pipeline is evaluated with a hyperspectral image dataset of an open-cut mine face comprising mineral ore martite and non-mineralised shale. The combined system is shown to produce a superior map to its constituent algorithms, and the consistency of its mapping capability is demonstrated using data acquired at two different times of day.
Submitted 9 February, 2023;
originally announced February 2023.
-
Learning bias corrections for climate models using deep neural operators
Authors:
Aniruddha Bora,
Khemraj Shukla,
Shixuan Zhang,
Bryce Harrop,
Ruby Leung,
George Em Karniadakis
Abstract:
Numerical simulation for climate modeling resolving all important scales is a computationally taxing process. Therefore, to circumvent this issue a low resolution simulation is performed, which is subsequently corrected for bias using reanalysis data (ERA5), known as nudging correction. The existing implementation for nudging correction uses a relaxation based method for the algebraic difference between low resolution and ERA5 data. In this study, we replace the bias correction process with a surrogate model based on the Deep Operator Network (DeepONet). DeepONet (Deep Operator Neural Network) learns the mapping from the state before nudging (a functional) to the nudging tendency (another functional). The nudging tendency is very high-dimensional data, albeit with many low-energy modes. Therefore, the DeepONet is combined with a convolution based auto-encoder-decoder (AED) architecture in order to learn the nudging tendency in a lower dimensional latent space efficiently. The accuracy of the DeepONet model is tested against the nudging tendency obtained from the E3SMv2 (Energy Exascale Earth System Model) and shows good agreement. The overarching goal of this work is to deploy the DeepONet model in an online setting and replace the nudging module in the E3SM loop for better efficiency and accuracy.
Submitted 6 February, 2023;
originally announced February 2023.
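The relaxation-based nudging described in the abstract above can be illustrated with a minimal sketch. It assumes a generic Newtonian relaxation term, (reference - state) / tau, and a forward-Euler time step; the function names, the relaxation time scale `tau`, and the toy model tendency are hypothetical and are not taken from E3SM or from the paper.

```python
import numpy as np

def nudging_tendency(state, reference, tau):
    """Relaxation term pulling the coarse state toward the reference.

    tau is an assumed relaxation time scale (same units as the time step).
    """
    return (reference - state) / tau

def nudged_step(state, reference, model_tendency, dt, tau):
    """One forward-Euler step of the model plus the nudging term (illustrative only)."""
    total = model_tendency(state) + nudging_tendency(state, reference, tau)
    return state + dt * total

# Toy usage: a model with zero intrinsic tendency relaxes toward the reference.
state = np.array([0.0, 2.0])
reference = np.array([1.0, 1.0])
new_state = nudged_step(state, reference, lambda x: np.zeros_like(x), dt=0.1, tau=1.0)
```

In this picture, the surrogate described in the abstract would learn the map from `state` to the output of `nudging_tendency`, so that the correction can be applied without access to the reference at run time.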
-
Automation and AI Technology in Surface Mining With a Brief Introduction to Open-Pit Operations in the Pilbara
Authors:
Raymond Leung,
Andrew J Hill,
Arman Melkumyan
Abstract:
This survey article provides a synopsis on some of the engineering problems, technological innovations, robotic development and automation efforts encountered in the mining industry -- particularly in the Pilbara iron-ore region of Western Australia. The goal is to paint the technology landscape and highlight issues relevant to an engineering audience to raise awareness of AI and automation trends in mining. It assumes the reader has no prior knowledge of mining and builds context gradually through focused discussion and short summaries of common open-pit mining operations. The principal activities that take place may be categorized in terms of resource development, mine-, rail- and port operations. From mineral exploration to ore shipment, there are roughly nine steps in between. These include: geological assessment, mine planning and development, production drilling and assaying, blasting and excavation, transportation of ore and waste, crush and screen, stockpile and load-out, rail network distribution, and ore-car dumping. The objective is to describe these processes and provide insights on some of the challenges/opportunities from the perspective of a decade-long industry-university R&D partnership.
Submitted 15 October, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
GPU implementation of a ray-surface intersection algorithm in CUDA (Compute Unified Device Architecture)
Authors:
Raymond Leung
Abstract:
These notes accompany the open-source code published in GitHub which implements a GPU-based line-segment, surface-triangle intersection algorithm in CUDA. It mentions some relevant works and discusses issues specific to this implementation. The goal is to provide software documentation and greater clarity on collision buffer management which is sometimes omitted in online literature. For real-world applications, CPU-based implementations of the test are often deemed too slow to be useful. In contrast, the code described here targets Nvidia GPU devices and offers a solution that is vastly more efficient and scalable. The main API is also wrapped in Python. This geometry test is applied in various engineering problems, so the software developed can be reused in new situations.
Submitted 6 September, 2022;
originally announced September 2022.
-
Empirical observations on the effects of data transformation in machine learning classification of geological domains
Authors:
Raymond Leung
Abstract:
In the literature, a large body of work advocates the use of log-ratio transformation for multivariate statistical analysis of compositional data. In contrast, few studies have looked at how data transformation changes the efficacy of machine learning classifiers within geoscience. This letter presents experiment results and empirical observations to further explore this issue. The objective is to study the effects of data transformation on geozone classification performance when machine learning (ML) classifiers/estimators are trained using geochemical data. The training input consists of exploration hole assay samples obtained from a Pilbara iron-ore deposit in Western Australia, and geozone labels assigned based on stratigraphic units, the absence or presence and type of mineralization. The ML techniques considered are multinomial logistic regression, Gaussian naïve Bayes, kNN, linear support vector classifier, RBF-SVM, gradient boosting and extreme GB, random forest (RF) and multi-layer perceptron (MLP). The transformations examined include isometric log-ratio (ILR), center log-ratio (CLR) coupled with principal component analysis (PCA) or independent component analysis (ICA), and a manifold learning approach based on local linear embedding (LLE). The results reveal that different ML classifiers exhibit varying sensitivity to these transformations, with some clearly more advantageous or deleterious than others. Overall, the best performing candidate is ILR which is unsurprising considering the compositional nature of the data. The performance of pairwise log-ratio (PWLR) transformation is better than ILR for ensemble and tree-based learners such as boosting and RF; but worse for MLP, SVM and other classifiers.
Submitted 4 June, 2021;
originally announced June 2021.
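As a brief illustration of the log-ratio transformations named in the abstract above, the following is a minimal sketch of the centered log-ratio (CLR) transform in its standard textbook form. It assumes strictly positive compositional parts; the function name and the sample values are hypothetical and do not come from the paper's data.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform applied row-wise.

    Each row is a composition with strictly positive parts; the log of each
    part is centered by the mean log over the row (log of the geometric mean).
    """
    x = np.asarray(x, dtype=float)
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)

# Hypothetical three-part composition (fractions summing to 1).
sample = np.array([[0.6, 0.3, 0.1]])
z = clr(sample)
```

Two properties are worth noting: each transformed row sums to zero, and rescaling a composition (for example, fractions versus percentages) leaves the result unchanged.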
-
Surface Warping Incorporating Machine Learning Assisted Domain Likelihood Estimation: A New Paradigm in Mine Geology Modelling and Automation
Authors:
Raymond Leung,
Mehala Balamurali,
Alexander Lowe
Abstract:
This paper illustrates an application of machine learning (ML) within a complex system that performs grade estimation. In surface mining, assay measurements taken from production drilling often provide useful information that allows initially inaccurate surfaces created using sparse exploration data to be revised and subsequently improved. Recently, a Bayesian warping technique has been proposed to reshape modeled surfaces using geochemical and spatial constraints imposed by newly acquired blasthole data. This paper focuses on incorporating machine learning into this warping framework to make the likelihood computation generalizable. The technique works by adjusting the position of vertices on the surface to maximize the integrity of modeled geological boundaries with respect to sparse geochemical observations. Its foundation is laid by a Bayesian derivation in which the geological domain likelihood given the chemistry, p(g|c), plays a similar role to p(y(c)|g). This observation allows a manually calibrated process centered around the latter to be automated since ML techniques may be used to estimate the former in a data-driven way. Machine learning performance is evaluated for gradient boosting, neural network, random forest and other classifiers in a binary and multi-class context using precision and recall rates. Once ML likelihood estimators are integrated in the surface warping framework, surface shaping performance is evaluated using unseen data by examining the categorical distribution of test samples located above and below the warped surface. Large-scale validation experiments are performed to assess the overall efficacy of ML assisted surface warping as a fully integrated component within an ore grade estimation system where the posterior mean is obtained via Gaussian Process inference with a Matern 3/2 kernel.
Submitted 13 September, 2021; v1 submitted 15 February, 2021;
originally announced March 2021.
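The abstract above mentions Gaussian Process inference with a Matern 3/2 kernel. For reference, here is a minimal sketch of the standard Matern 3/2 covariance function, k(r) = variance * (1 + sqrt(3) r / length_scale) * exp(-sqrt(3) r / length_scale). The function name and the default `length_scale` and `variance` values are illustrative assumptions, not parameters reported in the paper.

```python
import numpy as np

def matern32(r, length_scale=1.0, variance=1.0):
    """Standard Matern 3/2 covariance as a function of distance r >= 0."""
    s = np.sqrt(3.0) * np.abs(r) / length_scale
    return variance * (1.0 + s) * np.exp(-s)

# Covariance at a few hypothetical separation distances.
distances = np.array([0.0, 0.5, 1.0, 2.0])
k = matern32(distances)
```

The covariance equals the variance at zero separation and decays monotonically with distance; sample paths under this kernel are once differentiable, which makes it a common middle ground between the rougher exponential kernel and the very smooth squared-exponential kernel.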
-
Subsurface Boundary Geometry Modeling: Applying Computational Physics, Computer Vision and Signal Processing Techniques to Geoscience
Authors:
Raymond Leung
Abstract:
This paper describes an interdisciplinary approach to geometry modeling of geospatial boundaries. The objective is to extract surfaces from irregular spatial patterns using differential geometry and obtain coherent directional predictions along the boundary of extracted surfaces to enable more targeted sampling and exploration. Specific difficulties of the data include sparsity, incompleteness, causality and resolution disparity. Surface slopes are estimated using only sparse samples from cross-sections within a geological domain with no other information at intermediate depths. From boundary detection to subsurface reconstruction, processes are automated in between. The key problems to be solved are boundary extraction, region correspondence and propagation of the boundaries via contour morphing. Established techniques from computational physics, computer vision and signal processing are used with appropriate modifications to address challenges in each area. To facilitate boundary extraction, an edge map synthesis procedure is presented. This works with connected component analysis, anisotropic diffusion and active contours to convert unordered points into regularized boundaries. For region correspondence, component relationships are handled via graphical decomposition. FFT-based spatial alignment strategies are used in region merging and splitting scenarios. Shape changes between aligned regions are described by contour metamorphosis. Specifically, local spatial deformation is modeled by PDE and computed using level-set methods. Directional predictions are obtained using particle trajectories by following the evolving boundary. However, when a branching point is encountered, particles may lose track of the wavefront. To overcome this, a curvelet backtracking algorithm has been proposed to recover information for boundary segments without particle coverage to minimize shape distortion.
Submitted 5 June, 2020;
originally announced June 2020.
-
Bayesian Surface Warping Approach For Rectifying Geological Boundaries Using Displacement Likelihood And Evidence From Geochemical Assays
Authors:
Raymond Leung,
Alexander Lowe,
Anna Chlingaryan,
Arman Melkumyan,
John Zigman
Abstract:
This paper presents a Bayesian framework for manipulating mesh surfaces with the aim of improving the positional integrity of the geological boundaries that they seek to represent. The assumption is that these surfaces, created initially using sparse data, capture the global trend and provide a reasonable approximation of the stratigraphic, mineralisation and other types of boundaries for mining exploration, but they are locally inaccurate at scales typically required for grade estimation. The proposed methodology makes local spatial corrections automatically to maximise the agreement between the modelled surfaces and observed samples. Where possible, vertices on a mesh surface are moved to provide a clear delineation, for instance, between ore and waste material across the boundary based on spatial and compositional analysis, using assay measurements collected from densely spaced, geo-registered blast holes. The maximum a posteriori (MAP) solution ultimately considers the chemistry observation likelihood in a given domain. Furthermore, it is guided by an a priori spatial structure which embeds geological domain knowledge and determines the likelihood of a displacement estimate. The results demonstrate that increasing surface fidelity can significantly improve grade estimation performance based on large-scale model validation.
Submitted 30 March, 2021; v1 submitted 29 May, 2020;
originally announced May 2020.
-
Modelling Orebody Structures: Block Merging Algorithms and Block Model Spatial Restructuring Strategies Given Mesh Surfaces of Geological Boundaries
Authors:
Raymond Leung
Abstract:
This paper describes a framework for capturing geological structures in a 3D block model and improving its spatial fidelity given new mesh surfaces. Using surfaces that represent geological boundaries, the objectives are to identify areas where refinement is needed, increase spatial resolution to minimize surface approximation error, reduce redundancy to increase the compactness of the model and identify the geological domain on a block-by-block basis. These objectives are fulfilled by four system components which perform block-surface overlap detection, spatial structure decomposition, sub-blocks consolidation and block tagging, respectively. The main contributions are a coordinate-ascent merging algorithm and a flexible architecture for updating the spatial structure of a block model when given multiple surfaces, which emphasizes the ability to selectively retain or modify previously assigned block labels. The techniques employed include block-surface intersection analysis based on the separable axis theorem and ray-tracing for establishing the location of blocks relative to surfaces. To demonstrate the robustness and applicability of the proposed block merging strategy in a more narrow setting, it is used to reduce block fragmentation in an existing model where surfaces are not given and the minimum block size is fixed. To obtain further insight, a systematic comparison with octree subblocking subsequently illustrates the inherent constraints of dyadic hierarchical decomposition and the importance of inter-scale merging. The results show the proposed method produces merged blocks with less extreme aspect ratios and is highly amenable to parallel processing. The overall framework is applicable to orebody modelling given geological boundaries, and 3D segmentation more generally, where there is a need to delineate spatial regions using mesh surfaces within a block model.
Submitted 2 September, 2020; v1 submitted 12 January, 2020;
originally announced January 2020.
-
A convolutional autoencoder approach for mining features in cellular electron cryo-tomograms and weakly supervised coarse segmentation
Authors:
Xiangrui Zeng,
Miguel Ricardo Leung,
Tzviya Zeev-Ben-Mordehai,
Min Xu
Abstract:
Cellular electron cryo-tomography enables the 3D visualization of cellular organization in the near-native state and at submolecular resolution. However, the contents of cellular tomograms are often complex, making it difficult to automatically isolate different in situ cellular components. In this paper, we propose a convolutional autoencoder-based unsupervised approach to provide a coarse grouping of 3D small subvolumes extracted from tomograms. We demonstrate that the autoencoder can be used for efficient and coarse characterization of features of macromolecular complexes and surfaces, such as membranes. In addition, the autoencoder can be used to detect non-cellular features related to sample preparation and data collection, such as carbon edges from the grid and tomogram boundaries. The autoencoder is also able to detect patterns that may indicate spatial interactions between cellular components. Furthermore, we demonstrate that our autoencoder can be used for weakly supervised semantic segmentation of cellular components, requiring a very small amount of manual annotation.
Submitted 28 December, 2017; v1 submitted 15 June, 2017;
originally announced June 2017.