Search | arXiv e-print repository

arXiv:2402.18484 [pdf, other]

A non-intrusive machine learning framework for debiasing long-time coarse resolution climate simulations and quantifying rare events statistics

Authors: Benedikt Barthel Sorensen, Alexis Charalampopoulos, Shixuan Zhang, Bryce Harrop, Ruby Leung, Themistoklis Sapsis

Abstract: Due to the rapidly changing climate, the frequency and severity of extreme weather are expected to increase over the coming decades. As fully resolved climate simulations remain computationally intractable, policy makers must rely on coarse models to quantify risk for extremes. However, coarse models suffer from inherent bias due to the ignored "sub-grid" scales. We propose a framework to non-intrusively debias coarse-resolution climate predictions using neural-network (NN) correction operators. Previous efforts have attempted to train such operators using loss functions that match statistics. However, this approach falls short for events whose return periods are longer than the span of the training data, since the reference statistics have not converged. Here, the aim is to formulate a learning method that allows for correction of dynamics and quantification of extreme events with return periods longer than the training data. The key obstacle is the chaotic nature of the underlying dynamics. To overcome this challenge, we introduce a dynamical systems approach where the correction operator is trained using reference data and a coarse model simulation nudged towards that reference. The method is demonstrated on debiasing an under-resolved quasi-geostrophic model and the Energy Exascale Earth System Model (E3SM). For the former, our method enables the quantification of events that have return periods two orders of magnitude longer than the training data. For the latter, when trained on 8 years of ERA5 data, our approach is able to correct the coarse E3SM output to closely reflect the 36-year ERA5 statistics for all prognostic variables and significantly reduce their spatial biases.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2306.04709 [pdf]

Improved statistical benchmarking of digital pathology models using pairwise frames evaluation

Authors: Ylaine Gerardin, John Shamshoian, Judy Shen, Nhat Le, Jamie Prezioso, John Abel, Isaac Finberg, Daniel Borders, Raymond Biju, Michael Nercessian, Vaed Prasad, Joseph Lee, Spencer Wyman, Sid Gupta, Abigail Emerson, Bahar Rahsepar, Darpan Sanghavi, Ryan Leung, Limin Yu, Archit Khosla, Amaro Taylor-Weiner

Abstract: Nested pairwise frames is a method for relative benchmarking of cell or tissue digital pathology models against manual pathologist annotations on a set of sampled patches. At a high level, the method compares agreement between a candidate model and pathologist annotations with agreement among pathologists' annotations. This evaluation framework addresses fundamental issues of data size and annotator variability in using manual pathologist annotations as a source of ground truth for model validation. We implemented nested pairwise frames evaluation for tissue classification, cell classification, and cell count prediction tasks and show results for cell and tissue models deployed on an H&E-stained melanoma dataset.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 10 pages, 7 figures
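The core comparison in the abstract, model-vs-pathologist agreement set against pathologist-vs-pathologist agreement, can be illustrated with a simplified sketch. This is not the nested pairwise frames procedure itself: it assumes categorical per-patch labels and plain fraction-of-patches agreement as the metric, and the function names `agreement` and `pairwise_agreement_summary` are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Hashable, Sequence


def agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of patches on which two annotators (or a model and an annotator) agree."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_agreement_summary(model, pathologists):
    """Return (mean model-vs-pathologist agreement, mean pathologist-vs-pathologist agreement)."""
    # Agreement of the candidate model with each pathologist, averaged.
    model_vs_path = mean(agreement(model, p) for p in pathologists)
    # Agreement between every unordered pair of pathologists, averaged.
    path_vs_path = mean(agreement(p, q) for p, q in combinations(pathologists, 2))
    return model_vs_path, path_vs_path
```

If the first number is comparable to the second, the model is performing within the range of inter-annotator variability, which is roughly the intuition the abstract describes.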

arXiv:2305.01867 [pdf, other]

An experience with PyCUDA: Refactoring an existing implementation of a ray-surface intersection algorithm

Authors: Raymond Leung

Abstract: This article is a sequel to "GPU implementation of a ray-surface intersection algorithm in CUDA" (arXiv:2209.02878) [1]. Its main focus is PyCUDA, which represents a Python scripting approach to GPU run-time code generation in the Compute Unified Device Architecture (CUDA) framework. It accompanies the open-source code distributed on GitHub, which provides a PyCUDA implementation of a GPU-based line-segment, surface-triangle intersection test. The objective is to share a PyCUDA learning experience with people who are new to PyCUDA. Using the existing CUDA code and foundation from [1] as the starting point, we document the key changes made to facilitate a transition to PyCUDA. As the CUDA source for the ray-surface intersection test contains both host and device code and uses multiple kernel functions, these notes offer a substantive example and real-world perspective of what it is like to utilize PyCUDA. It delves into custom data structures such as the binary radix tree and highlights some possible pitfalls. The case studies present a debugging strategy which may be used to examine complex C structures in device memory using standard Python tools without the CUDA-GDB debugger.

Submitted 4 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 14 pages. Keywords: PyCUDA, Python scripting, GPU Run-Time Code Generation (RTCG), ray-mesh intersection, open-source code, learning, shared experience

arXiv:2302.04936 [pdf, other]

doi 10.1016/j.gsf.2023.101562

Unsupervised ore/waste classification on open-cut mine faces using close-range hyperspectral data

Authors: Lloyd Windrim, Arman Melkumyan, Richard J. Murphy, Anna Chlingaryan, Raymond Leung

Abstract: The remote mapping of minerals and discrimination of ore and waste on surfaces are important tasks for geological applications such as those in mining. Such tasks have become possible using ground-based, close-range hyperspectral sensors which can remotely measure the reflectance properties of the environment with high spatial and spectral resolution. However, autonomous mapping of mineral spectra measured on an open-cut mine face remains a challenging problem due to the subtlety of differences in spectral absorption features between mineral and rock classes as well as variability in the illumination of the scene. An additional layer of difficulty arises when there is no annotated data available to train a supervised learning algorithm. A pipeline for unsupervised mapping of spectra on a mine face is proposed which draws from several recent advances in the hyperspectral machine learning literature. The proposed pipeline brings together unsupervised and self-supervised algorithms in a unified system to map minerals on a mine face without the need for human-annotated training data. The pipeline is evaluated with a hyperspectral image dataset of an open-cut mine face comprising mineral ore martite and non-mineralised shale. The combined system is shown to produce a superior map to its constituent algorithms, and the consistency of its mapping capability is demonstrated using data acquired at two different times of day.

Submitted 9 February, 2023; originally announced February 2023.

Comments: Manuscript has been accepted for publication in Geoscience Frontiers. Keywords: Hyperspectral imaging, remote sensing, mineral mapping, machine learning, convolutional neural networks, transfer learning, data augmentation, illumination invariance

Journal ref: Geoscience Frontiers 14 (2023) 101562

arXiv:2302.03173 [pdf, other]

Learning bias corrections for climate models using deep neural operators

Authors: Aniruddha Bora, Khemraj Shukla, Shixuan Zhang, Bryce Harrop, Ruby Leung, George Em Karniadakis

Abstract: Numerical simulation for climate modeling that resolves all important scales is computationally taxing. To circumvent this issue, a low-resolution simulation is performed and subsequently corrected for bias using reanalysis data (ERA5), a procedure known as nudging correction. The existing implementation of nudging correction uses a relaxation-based method for the algebraic difference between the low-resolution and ERA5 data. In this study, we replace the bias correction process with a surrogate model based on the Deep Operator Network (DeepONet). DeepONet (Deep Operator Neural Network) learns the mapping from the state before nudging (a functional) to the nudging tendency (another functional). The nudging tendency is very high-dimensional data, albeit with many low-energy modes. Therefore, the DeepONet is combined with a convolution-based auto-encoder-decoder (AED) architecture in order to learn the nudging tendency efficiently in a lower-dimensional latent space. The accuracy of the DeepONet model is tested against the nudging tendency obtained from the E3SMv2 (Energy Exascale Earth System Model) and shows good agreement. The overarching goal of this work is to deploy the DeepONet model in an online setting and replace the nudging module in the E3SM loop for better efficiency and accuracy.

Submitted 6 February, 2023; originally announced February 2023.
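Relaxation-based nudging of this kind is commonly written as a Newtonian relaxation term, dx/dt = f(x) + (x_ref - x)/τ, where τ is a relaxation timescale and the second term is the nudging tendency. The sketch below assumes that form with a forward-Euler step; `nudging_tendency`, `nudged_step` and the parameter `tau` are illustrative names, not E3SM's nudging module or the DeepONet surrogate.

```python
import numpy as np


def nudging_tendency(x_model, x_ref, tau):
    """Newtonian relaxation tendency (x_ref - x_model) / tau, with tau a relaxation timescale."""
    return (np.asarray(x_ref, dtype=float) - np.asarray(x_model, dtype=float)) / tau


def nudged_step(x, f, x_ref, dt, tau):
    """One forward-Euler step of dx/dt = f(x) + (x_ref - x) / tau."""
    return x + dt * (f(x) + nudging_tendency(x, x_ref, tau))


# Usage: with no model dynamics (f = 0), the state relaxes toward the reference.
x = np.array([0.0])
for _ in range(20):
    x = nudged_step(x, lambda s: np.zeros_like(s), np.array([1.0]), dt=1.0, tau=2.0)
# With dt / tau = 0.5, the distance to the reference halves at every step.
```

In the setting described above, the DeepONet surrogate is trained to predict the tendency term from the pre-nudging state, so it would stand in for `nudging_tendency` rather than for the whole step.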

arXiv:2301.09771 [pdf, other]

doi 10.1109/MRA.2023.3328457

Automation and AI Technology in Surface Mining With a Brief Introduction to Open-Pit Operations in the Pilbara

Authors: Raymond Leung, Andrew J Hill, Arman Melkumyan

Abstract: This survey article provides a synopsis of some of the engineering problems, technological innovations, robotic development and automation efforts encountered in the mining industry, particularly in the Pilbara iron-ore region of Western Australia. The goal is to paint the technology landscape and highlight issues relevant to an engineering audience to raise awareness of AI and automation trends in mining. It assumes the reader has no prior knowledge of mining and builds context gradually through focused discussion and short summaries of common open-pit mining operations. The principal activities that take place may be categorized in terms of resource development, mine-, rail- and port operations. From mineral exploration to ore shipment, there are roughly nine steps in between. These include: geological assessment, mine planning and development, production drilling and assaying, blasting and excavation, transportation of ore and waste, crush and screen, stockpile and load-out, rail network distribution, and ore-car dumping. The objective is to describe these processes and provide insights on some of the challenges and opportunities from the perspective of a decade-long industry-university R&D partnership.

Submitted 15 October, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: Accepted manuscript. Paper provides insights on state-of-the-art technologies and future trends. Keywords: Mining automation, robotics, intelligent systems, machine learning, remote sensing, geostatistics, planning, scheduling, optimization, modelling, geology, complex systems. Document: 20 pages, 5 figures, 2 tables

Journal ref: IEEE Robotics & Automation Magazine (2023)

arXiv:2209.02878 [pdf, other]

GPU implementation of a ray-surface intersection algorithm in CUDA (Compute Unified Device Architecture)

Authors: Raymond Leung

Abstract: These notes accompany the open-source code published on GitHub which implements a GPU-based line-segment, surface-triangle intersection algorithm in CUDA. They mention some relevant works and discuss issues specific to this implementation. The goal is to provide software documentation and greater clarity on collision buffer management, which is sometimes omitted in online literature. For real-world applications, CPU-based implementations of the test are often deemed too slow to be useful. In contrast, the code described here targets Nvidia GPU devices and offers a solution that is vastly more efficient and scalable. The main API is also wrapped in Python. This geometry test is applied in various engineering problems, so the software developed can be reused in new situations.

Submitted 6 September, 2022; originally announced September 2022.

Comments: 11 pages. Keywords: Moller-Trumbore algorithm, ray-triangle intersection, linear bounding volume hierarchy, binary radix tree, bounding box collision detection, parallel computing, GP-GPU, CUDA
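The keywords name the Möller-Trumbore algorithm. A minimal CPU sketch of a line-segment/triangle test based on it is shown below; it is written with NumPy rather than CUDA, omits the bounding volume hierarchy entirely, and the function name and `eps` tolerance are assumptions for illustration.

```python
import numpy as np


def segment_intersects_triangle(p0, p1, v0, v1, v2, eps=1e-12):
    """Möller-Trumbore test for the segment p0->p1 against triangle (v0, v1, v2).

    Returns the intersection point as an array, or None if there is no hit.
    """
    p0, p1, v0, v1, v2 = (np.asarray(a, dtype=float) for a in (p0, p1, v0, v1, v2))
    d = p1 - p0                    # segment direction; t in [0, 1] spans the segment
    e1, e2 = v1 - v0, v2 - v0      # triangle edges sharing vertex v0
    h = np.cross(d, e2)
    a = np.dot(e1, h)              # determinant; near zero means segment parallel to plane
    if abs(a) < eps:
        return None
    f = 1.0 / a
    s = p0 - v0
    u = f * np.dot(s, h)           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = f * np.dot(d, q)           # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * np.dot(e2, q)          # position along the segment
    if 0.0 <= t <= 1.0:
        return p0 + t * d
    return None
```

Restricting t to [0, 1] is what turns the usual ray test into a segment test; a GPU version would run this per segment/triangle pair after a broad-phase bounding-box pass.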

arXiv:2106.05855 [pdf, other]

Empirical observations on the effects of data transformation in machine learning classification of geological domains

Authors: Raymond Leung

Abstract: In the literature, a large body of work advocates the use of log-ratio transformation for multivariate statistical analysis of compositional data. In contrast, few studies have looked at how data transformation changes the efficacy of machine learning classifiers within geoscience. This letter presents experimental results and empirical observations to further explore this issue. The objective is to study the effects of data transformation on geozone classification performance when machine learning (ML) classifiers/estimators are trained using geochemical data. The training input consists of exploration hole assay samples obtained from a Pilbara iron-ore deposit in Western Australia, and geozone labels assigned based on stratigraphic units, the absence or presence and type of mineralization. The ML techniques considered are multinomial logistic regression, Gaussian naïve Bayes, kNN, linear support vector classifier, RBF-SVM, gradient boosting and extreme GB, random forest (RF) and multi-layer perceptron (MLP). The transformations examined include isometric log-ratio (ILR), center log-ratio (CLR) coupled with principal component analysis (PCA) or independent component analysis (ICA), and a manifold learning approach based on local linear embedding (LLE). The results reveal that different ML classifiers exhibit varying sensitivity to these transformations, with some clearly more advantageous or deleterious than others. Overall, the best-performing candidate is ILR, which is unsurprising considering the compositional nature of the data. The pairwise log-ratio (PWLR) transformation performs better than ILR for ensemble and tree-based learners such as boosting and RF, but worse for MLP, SVM and other classifiers.

Submitted 4 June, 2021; originally announced June 2021.

Comments: Keywords: Compositional data, supervised learning, geological domain, likelihood estimation, classification performance, effects of data transformation. 10 page article, 2 figures, 7 tables
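For reference, the CLR, PWLR and ILR transformations named above can be sketched in a few lines of NumPy. The ILR here uses one common choice of orthonormal basis (pivot coordinates); the study may use a different basis, and the sketch assumes strictly positive parts (zeros would need replacement before taking logs). All function names are illustrative.

```python
import numpy as np


def closure(x):
    """Rescale each composition (last axis) so its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centred log-ratio: log of each part minus the mean log (log of the geometric mean)."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)


def pwlr(x):
    """Pairwise log-ratios log(x_i / x_j) for all i < j."""
    logx = np.log(closure(x))
    d = logx.shape[-1]
    return np.stack([logx[..., i] - logx[..., j] for i in range(d) for j in range(i + 1, d)], axis=-1)


def ilr(x):
    """Isometric log-ratio using pivot coordinates (one of many valid orthonormal bases).

    z_i = sqrt((D - i) / (D - i + 1)) * ln(x_i / gmean(x_{i+1}, ..., x_D)), for i = 1..D-1.
    """
    logx = np.log(closure(x))
    d = logx.shape[-1]
    z = []
    for i in range(d - 1):                # 0-based index of the pivot part
        k = d - i - 1                     # number of parts after the pivot
        coef = np.sqrt(k / (k + 1))
        z.append(coef * (logx[..., i] - logx[..., i + 1:].mean(axis=-1)))
    return np.stack(z, axis=-1)
```

A quick sanity check is the isometry property: for any composition, the Euclidean norm of its ILR coordinates equals the norm of its CLR vector. Note that PWLR yields D(D-1)/2 features versus D-1 for ILR, which may partly explain why tree ensembles respond to it differently.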

arXiv:2103.03923 [pdf, other]

doi 10.1007/s11004-021-09967-5

Surface Warping Incorporating Machine Learning Assisted Domain Likelihood Estimation: A New Paradigm in Mine Geology Modelling and Automation

Authors: Raymond Leung, Mehala Balamurali, Alexander Lowe

Abstract: This paper illustrates an application of machine learning (ML) within a complex system that performs grade estimation. In surface mining, assay measurements taken from production drilling often provide useful information that allows initially inaccurate surfaces created using sparse exploration data to be revised and subsequently improved. Recently, a Bayesian warping technique has been proposed to reshape modeled surfaces using geochemical and spatial constraints imposed by newly acquired blasthole data. This paper focuses on incorporating machine learning into this warping framework to make the likelihood computation generalizable. The technique works by adjusting the position of vertices on the surface to maximize the integrity of modeled geological boundaries with respect to sparse geochemical observations. Its foundation is laid by a Bayesian derivation in which the geological domain likelihood given the chemistry, p(g|c), plays a similar role to p(y(c)|g). This observation allows a manually calibrated process centered around the latter to be automated since ML techniques may be used to estimate the former in a data-driven way. Machine learning performance is evaluated for gradient boosting, neural network, random forest and other classifiers in a binary and multi-class context using precision and recall rates. Once ML likelihood estimators are integrated in the surface warping framework, surface shaping performance is evaluated using unseen data by examining the categorical distribution of test samples located above and below the warped surface. Large-scale validation experiments are performed to assess the overall efficacy of ML-assisted surface warping as a fully integrated component within an ore grade estimation system where the posterior mean is obtained via Gaussian Process inference with a Matern 3/2 kernel.

Submitted 13 September, 2021; v1 submitted 15 February, 2021; originally announced March 2021.

Comments: Keywords: Bayesian computation, machine learning, ensemble classifiers, neural network, mesh geometry, surface warping, geochemistry, domain likelihood, geological boundaries. 23 pages, 15 figures, 11 tables

ACM Class: I.3.5; I.2.1; G.3; J.2

Journal ref: Mathematical Geosciences (2021)
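The Matern 3/2 kernel mentioned above has the closed form k(r) = σ²(1 + √3 r/ℓ) exp(-√3 r/ℓ), where r is the distance between two points, ℓ a length scale and σ² a variance. The sketch below computes it and a zero-mean Gaussian Process posterior mean with fixed, hand-picked hyperparameters; the function names and the `length_scale`, `variance` and `noise` parameters are assumptions, and this is not the grade estimation system described in the paper.

```python
import numpy as np


def matern32(X1, X2, length_scale=1.0, variance=1.0):
    """Matern 3/2 kernel matrix between point sets X1 (n, d) and X2 (m, d)."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    r = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1))  # pairwise distances
    s = np.sqrt(3.0) * r / length_scale
    return variance * (1.0 + s) * np.exp(-s)


def gp_posterior_mean(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-6):
    """Zero-mean GP posterior mean K(X_test, X_train) (K(X_train, X_train) + noise*I)^-1 y."""
    K = matern32(X_train, X_train, length_scale, variance) + noise * np.eye(len(X_train))
    K_star = matern32(X_test, X_train, length_scale, variance)
    alpha = np.linalg.solve(K, np.asarray(y_train, dtype=float))  # avoids forming an explicit inverse
    return K_star @ alpha
```

Inputs are expected as 2-D arrays of shape (n, d), for example 3-D sample coordinates with d = 3. With a small `noise` term the posterior mean nearly interpolates the training values.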

arXiv:2006.03752 [pdf, other]

doi 10.1109/ACCESS.2019.2951605

Subsurface Boundary Geometry Modeling: Applying Computational Physics, Computer Vision and Signal Processing Techniques to Geoscience

Authors: Raymond Leung

Abstract: This paper describes an interdisciplinary approach to geometry modeling of geospatial boundaries. The objective is to extract surfaces from irregular spatial patterns using differential geometry and obtain coherent directional predictions along the boundary of extracted surfaces to enable more targeted sampling and exploration. Specific difficulties of the data include sparsity, incompleteness, causality and resolution disparity. Surface slopes are estimated using only sparse samples from cross-sections within a geological domain with no other information at intermediate depths. All processes from boundary detection to subsurface reconstruction are automated. The key problems to be solved are boundary extraction, region correspondence and propagation of the boundaries via contour morphing. Established techniques from computational physics, computer vision and signal processing are used with appropriate modifications to address challenges in each area. To facilitate boundary extraction, an edge map synthesis procedure is presented. This works with connected component analysis, anisotropic diffusion and active contours to convert unordered points into regularized boundaries. For region correspondence, component relationships are handled via graphical decomposition. FFT-based spatial alignment strategies are used in region merging and splitting scenarios. Shape changes between aligned regions are described by contour metamorphosis. Specifically, local spatial deformation is modeled by PDE and computed using level-set methods. Directional predictions are obtained using particle trajectories by following the evolving boundary. However, when a branching point is encountered, particles may lose track of the wavefront. To overcome this, a curvelet backtracking algorithm is proposed to recover information for boundary segments without particle coverage and thereby minimize shape distortion.

Submitted 5 June, 2020; originally announced June 2020.

Comments: Keywords: Interdisciplinary research, active contours, backtracking, contour morphing, directional prediction, particle trajectories, spatial correspondence, subsurface boundaries, wavefront propagation. 23 page article, 17 figures

ACM Class: J.2; I.3.5

Journal ref: IEEE Access 7 (2019) 161680-161696
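The abstract mentions FFT-based spatial alignment. One standard FFT-based technique is phase correlation, which recovers a translation between two images from the normalized cross-power spectrum. The sketch assumes alignment can be illustrated this way (the paper's actual strategy may differ), handles only integer circular shifts, and uses a hypothetical function name `phase_correlation_shift`.

```python
import numpy as np


def phase_correlation_shift(a, b, eps=1e-12):
    """Estimate the integer shift (dy, dx) such that b is approximately np.roll(a, (dy, dx), axis=(0, 1))."""
    Fa = np.fft.fft2(a)
    Fb = np.fft.fft2(b)
    cross_power = Fb * np.conj(Fa)
    cross_power /= np.abs(cross_power) + eps       # keep phase only
    corr = np.fft.ifft2(cross_power).real           # peaks at the translation
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Indices past the midpoint correspond to negative shifts (circular wrap-around).
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Because only the phase is kept, the peak location depends on the relative displacement rather than on region intensity, which suits binary region masks like those produced by boundary extraction.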

arXiv:2005.14427 [pdf, other]

doi 10.1145/3476979

Bayesian Surface Warping Approach For Rectifying Geological Boundaries Using Displacement Likelihood And Evidence From Geochemical Assays

Authors: Raymond Leung, Alexander Lowe, Anna Chlingaryan, Arman Melkumyan, John Zigman

Abstract: This paper presents a Bayesian framework for manipulating mesh surfaces with the aim of improving the positional integrity of the geological boundaries that they seek to represent. The assumption is that these surfaces, created initially using sparse data, capture the global trend and provide a reasonable approximation of the stratigraphic, mineralisation and other types of boundaries for mining exploration, but they are locally inaccurate at scales typically required for grade estimation. The proposed methodology makes local spatial corrections automatically to maximise the agreement between the modelled surfaces and observed samples. Where possible, vertices on a mesh surface are moved to provide a clear delineation, for instance, between ore and waste material across the boundary based on spatial and compositional analysis, using assay measurements collected from densely spaced, geo-registered blast holes. The maximum a posteriori (MAP) solution ultimately considers the chemistry observation likelihood in a given domain. Furthermore, it is guided by an a priori spatial structure which embeds geological domain knowledge and determines the likelihood of a displacement estimate. The results demonstrate that increasing surface fidelity can significantly improve grade estimation performance based on large-scale model validation.

Submitted 30 March, 2021; v1 submitted 29 May, 2020; originally announced May 2020.

Comments: Keywords: Geochemistry-based Bayesian deformable surface (GC-BDS) model, Bayesian computation, mesh geometry, surface warping, spatial correction, displacement likelihood, geological boundaries, model integrity. 19 page article, 15 figures, 3 tables

ACM Class: I.3.5; G.3; J.2

Journal ref: ACM Transactions on Spatial Algorithms and Systems, 2021

arXiv:2001.04023 [pdf, other]

doi 10.5311/JOSIS.2020.21.582

Modelling Orebody Structures: Block Merging Algorithms and Block Model Spatial Restructuring Strategies Given Mesh Surfaces of Geological Boundaries

Authors: Raymond Leung

Abstract: This paper describes a framework for capturing geological structures in a 3D block model and improving its spatial fidelity given new mesh surfaces. Using surfaces that represent geological boundaries, the objectives are to identify areas where refinement is needed, increase spatial resolution to minimize surface approximation error, reduce redundancy to increase the compactness of the model and identify the geological domain on a block-by-block basis. These objectives are fulfilled by four system components which perform block-surface overlap detection, spatial structure decomposition, sub-blocks consolidation and block tagging, respectively. The main contributions are a coordinate-ascent merging algorithm and a flexible architecture for updating the spatial structure of a block model when given multiple surfaces, which emphasizes the ability to selectively retain or modify previously assigned block labels. The techniques employed include block-surface intersection analysis based on the separating axis theorem and ray-tracing for establishing the location of blocks relative to surfaces. To demonstrate the robustness and applicability of the proposed block merging strategy in a narrower setting, it is used to reduce block fragmentation in an existing model where surfaces are not given and the minimum block size is fixed. To obtain further insight, a systematic comparison with octree sub-blocking subsequently illustrates the inherent constraints of dyadic hierarchical decomposition and the importance of inter-scale merging. The results show the proposed method produces merged blocks with less extreme aspect ratios and is highly amenable to parallel processing. The overall framework is applicable to orebody modelling given geological boundaries, and 3D segmentation more generally, where there is a need to delineate spatial regions using mesh surfaces within a block model.

Submitted 2 September, 2020; v1 submitted 12 January, 2020; originally announced January 2020.

Comments: Keywords: Block merging algorithms, block model structure, spatial restructuring, mesh surfaces, subsurface modelling, geological structures, sub-blocking, boundary correction, domain identification, iterative refinement, geospatial information system. 27 page article, 26 figures, 6 tables, plus supplementary material (17 pages)

ACM Class: J.2; I.3.5; I.3.8

Journal ref: Journal of Spatial Information Science 21 (2020) 137-174
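Block-surface overlap detection of this kind can rest on the separating axis theorem. A common way to apply it to an axis-aligned block and a surface triangle is to test 13 candidate axes: the three box face normals, the triangle normal, and the nine cross products of box and triangle edges. The sketch below assumes that formulation; it is not the paper's implementation, and the function name and argument layout are hypothetical.

```python
import numpy as np


def triangle_box_overlap(center, half_size, tri):
    """Separating-axis test between an axis-aligned box and a triangle.

    center, half_size: box centre and half-extents, each of length 3.
    tri: three triangle vertices, shape (3, 3).
    Returns True if no separating axis is found (the shapes overlap).
    """
    c = np.asarray(center, dtype=float)
    h = np.asarray(half_size, dtype=float)
    v = np.asarray(tri, dtype=float) - c             # work in box-centred coordinates
    edges = [v[1] - v[0], v[2] - v[1], v[0] - v[2]]
    box_axes = np.eye(3)

    axes = list(box_axes)                             # 3 box face normals
    axes.append(np.cross(edges[0], edges[1]))         # triangle normal
    axes += [np.cross(e, a) for a in box_axes for e in edges]  # 9 edge-edge axes

    for axis in axes:
        if np.allclose(axis, 0.0):
            continue                                  # degenerate axis carries no information
        p = v @ axis                                  # triangle projection onto the axis
        r = np.dot(h, np.abs(axis))                   # box projection radius on the axis
        if p.min() > r or p.max() < -r:
            return False                              # found a separating axis
    return True
```

Each test is independent of the others, so a block model can evaluate many block/triangle pairs in parallel, which fits the abstract's remark that the approach is amenable to parallel processing.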

arXiv:1706.04970 [pdf, other]

doi 10.1016/j.jsb.2017.12.015

A convolutional autoencoder approach for mining features in cellular electron cryo-tomograms and weakly supervised coarse segmentation

Authors: Xiangrui Zeng, Miguel Ricardo Leung, Tzviya Zeev-Ben-Mordehai, Min Xu

Abstract: Cellular electron cryo-tomography enables the 3D visualization of cellular organization in the near-native state and at submolecular resolution. However, the contents of cellular tomograms are often complex, making it difficult to automatically isolate different in situ cellular components. In this paper, we propose a convolutional autoencoder-based unsupervised approach to provide a coarse grouping of 3D small subvolumes extracted from tomograms. We demonstrate that the autoencoder can be used for efficient and coarse characterization of features of macromolecular complexes and surfaces, such as membranes. In addition, the autoencoder can be used to detect non-cellular features related to sample preparation and data collection, such as carbon edges from the grid and tomogram boundaries. The autoencoder is also able to detect patterns that may indicate spatial interactions between cellular components. Furthermore, we demonstrate that our autoencoder can be used for weakly supervised semantic segmentation of cellular components, requiring a very small amount of manual annotation.

Submitted 28 December, 2017; v1 submitted 15 June, 2017; originally announced June 2017.

Comments: Accepted by Journal of Structural Biology

Showing 1–13 of 13 results for author: Leung, R