Explainable AI Integrated Feature Engineering for Wildfire Prediction

Authors: Di Fan, Ayan Biswas, James Paul Ahrens

Abstract: Wildfires present intricate challenges for prediction, necessitating the use of sophisticated machine learning techniques for effective modeling\cite{jain2020review}. In our research, we conducted a thorough assessment of various machine learning algorithms for both classification and regression tasks relevant to wildfire prediction. We found that for classifying different types or stages of wildfires, the XGBoost model outperformed the others in accuracy and robustness. Meanwhile, the Random Forest regression model showed superior results in predicting the extent of wildfire-affected areas, excelling in both prediction error and explained variance. Additionally, we developed a hybrid neural network model that integrates numerical data and image information for simultaneous classification and regression. To gain deeper insight into the decision-making processes of these models and identify key contributing features, we used eXplainable Artificial Intelligence (XAI) techniques, including TreeSHAP, LIME, Partial Dependence Plots (PDP), and Gradient-weighted Class Activation Mapping (Grad-CAM). These interpretability tools shed light on the significance and interplay of various features, highlighting the complex factors influencing wildfire predictions. Our study not only demonstrates the effectiveness of specific machine learning models in wildfire-related tasks but also underscores the critical role of model transparency and interpretability in environmental science applications.

Submitted 1 April, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.09615 by other authors
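Partial dependence, one of the XAI tools named above, has a simple generic definition: fix one feature at a grid value for every row and average the model's predictions. The sketch below implements that textbook definition; the toy model and the temperature/humidity features are hypothetical and are not taken from the paper's data or pipeline.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """One-way partial dependence of `predict` on column index `feature`.

    For each grid value v, overwrite that feature with v in every row and
    average the model's predictions over all rows.
    """
    curve = []
    for v in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature] = v
            predictions.append(predict(modified))
        curve.append(mean(predictions))
    return curve


# Hypothetical toy model (not from the paper):
# predicted burned area = 2 * temperature - 0.5 * humidity.
def toy_model(x):
    temperature, humidity = x
    return 2.0 * temperature - 0.5 * humidity


rows = [(20.0, 30.0), (25.0, 50.0), (30.0, 10.0)]
pd_temperature = partial_dependence(toy_model, rows, feature=0, grid=[10.0, 20.0, 30.0])
print(pd_temperature)  # [5.0, 25.0, 45.0]
```

For a linear toy model the curve is a straight line whose slope equals the feature's coefficient (2 here), which makes a convenient sanity check. With a fitted tree ensemble, `predict` would wrap that model's prediction function.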

arXiv:2403.08317

From Channel Measurement to Training Data for PHY Layer AI Applications

Authors: Michael Zentarra, Julian Ahrens, Lia Ahrens

Abstract: Learning-based techniques such as artificial intelligence (AI) and machine learning (ML) play an increasingly important role in the development of future communication networks. The success of a learning algorithm depends on the quality and quantity of the available training data. In the physical layer (PHY), channel information data can be obtained either through measurement campaigns or through simulations based on predefined channel models. Measurements can be time-consuming while yielding information about only one specific position or scenario. Simulated data, on the other hand, are more general but in most cases reflect not a real environment but a statistical approximation based on a mathematical model. This paper presents a procedure for acquiring channel data by means of fast and flexible software defined radio (SDR) based channel measurements, along with a parameter extraction method that provides configuration input to the simulator. The procedure from measurement to simulated channel data is demonstrated in two exemplary propagation scenarios. It is shown that in both cases the simulated data are in good agreement with the measurements.

Submitted 13 March, 2024; originally announced March 2024.

Comments: This is a preprint, the full paper has been accepted by the Workshop on Next Generation Networks and Applications 2022 (NGNA)
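The abstract does not list which parameters are extracted for the simulator. As an assumed, illustrative example, the RMS delay spread is a standard quantity derived from a measured power delay profile and is a typical input to statistical channel models. The sketch below computes it; the two-tap profile is hypothetical.

```python
import math


def rms_delay_spread(delays_s, powers_lin):
    """RMS delay spread of a power delay profile (delays in seconds, linear powers)."""
    total = sum(powers_lin)
    mean_delay = sum(p * t for t, p in zip(delays_s, powers_lin)) / total
    second_moment = sum(p * t * t for t, p in zip(delays_s, powers_lin)) / total
    # Clamp tiny negative values caused by floating-point cancellation.
    return math.sqrt(max(0.0, second_moment - mean_delay ** 2))


# Hypothetical profile: two equal-power taps at 0 ns and 100 ns.
spread = rms_delay_spread([0.0, 100e-9], [1.0, 1.0])
print(f"{spread * 1e9:.1f} ns")  # 50.0 ns
```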

arXiv:2309.16980

Analyzing Impact of Data Reduction Techniques on Visualization for AMR Applications Using AMReX Framework

Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, James Ahrens, Dingwen Tao

Abstract: Today's scientific simulations generate exceptionally large volumes of data, challenging the capacities of available I/O bandwidth and storage space. This necessitates a substantial reduction in data volume, for which error-bounded lossy compression has emerged as a highly effective strategy. Visualization is a crucial means of assessing the efficacy of lossy compression. Despite extensive research on the impact of compression on visualization, there is a notable gap in the literature concerning the effects of compression on the visualization of Adaptive Mesh Refinement (AMR) data. AMR has proven to be a potent solution for addressing rising computational intensity and the explosive growth in data volume that requires storage and transmission. However, the hierarchical and multi-resolution characteristics of AMR data introduce unique challenges to its visualization, and these challenges are further compounded when data compression comes into play. This article examines how data compression affects the visualization of AMR data and the novel challenges it introduces.

Submitted 29 September, 2023; originally announced September 2023.
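One mechanism by which an error bound can still affect visualization: values close to an isovalue may cross it after compression even though every pointwise error stays within the bound. The sketch uses plain uniform quantization with bin width 2 x eb, a generic stand-in rather than any compressor evaluated in the paper, on a hypothetical 1D field.

```python
def quantize_with_error_bound(values, error_bound):
    """Uniform scalar quantization whose reconstruction error is at most error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) * step for v in values]


field = [0.10, 0.48, 0.52, 0.90, 0.49]   # hypothetical cell values
eb = 0.05
recon = quantize_with_error_bound(field, eb)

iso = 0.5
# Cells whose inside/outside classification w.r.t. the isovalue changed.
flips = sum((a >= iso) != (b >= iso) for a, b in zip(field, recon))
print(flips)  # 2
```

Both flipped cells (0.48 and 0.49) are reconstructed as 0.5, so an isosurface at 0.5 would move although the error bound holds everywhere.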

arXiv:2308.11724

MolSieve: A Progressive Visual Analytics System for Molecular Dynamics Simulations

Authors: Rostyslav Hnatyshyn, Jieqiong Zhao, Danny Perez, James Ahrens, Ross Maciejewski

Abstract: Molecular Dynamics (MD) simulations are ubiquitous in cutting-edge physico-chemical research. They provide critical insights into how a physical system evolves over time given a model of interatomic interactions. Understanding a system's evolution is key to selecting the best candidates for new drugs, materials for manufacturing, and countless other practical applications. With today's technology, these simulations can encompass millions of unit transitions between discrete molecular structures, spanning up to several milliseconds of real time. A brute-force analysis of datasets this size is not only computationally impractical but would also fail to shed light on the physically relevant features of the data. Moreover, simulation ensembles must be analyzed in order to compare similar processes in differing environments. These problems call for an approach that is analytically transparent, computationally efficient, and flexible enough to handle the variety found in materials-based research. To address these problems, we introduce MolSieve, a progressive visual analytics system that enables the comparison of multiple long-duration simulations. Using MolSieve, analysts can quickly identify and compare regions of interest within immense simulations through its combination of control charts, data-reduction techniques, and highly informative visual components. A simple programming interface allows experts to fit MolSieve to their needs.
To demonstrate the efficacy of our approach, we present two case studies of MolSieve and report on findings from domain collaborators.

Submitted 5 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: Updated references to GPCCA
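The abstract mentions control charts without specifying which kind. As an assumption, the sketch shows the classic Shewhart form: limits at the baseline mean plus or minus k standard deviations, with samples outside the limits flagged as candidate regions of interest. The observable and its values are hypothetical.

```python
from statistics import mean, pstdev


def control_limits(baseline, k=3.0):
    """Shewhart-style limits: baseline mean +/- k population standard deviations."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    return mu - k * sigma, mu + k * sigma


def out_of_control(series, lower, upper):
    """Indices of samples that fall outside [lower, upper]."""
    return [i for i, x in enumerate(series) if x < lower or x > upper]


# Hypothetical per-frame observable taken from a quiet stretch of a trajectory.
baseline = [1.0, 1.2, 0.8, 1.1, 0.9]
lower, upper = control_limits(baseline)
flagged = out_of_control([1.0, 1.05, 2.5, 0.95], lower, upper)
print(flagged)  # [2]
```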

arXiv:2307.09609

AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications

Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian **, Houjun Tang, Jean Sexton, Sheng Di, Zarija Lukić, Kai Zhao, Bo Fang, Franck Cappello, James Ahrens, Dingwen Tao

Abstract: As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission grows exponentially. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to tackle the latter issue. Despite their respective advantages, few attempts have been made to investigate how AMR and error-bounded lossy compression can function together. To this end, this study presents a novel in situ lossy compression framework that employs the HDF5 filter to both reduce I/O costs and improve compression quality for AMR applications. We implement our solution in the AMReX framework and evaluate it on two real-world AMR applications, Nyx and WarpX, on the Summit supercomputer. Experiments with 4096 CPU cores demonstrate that AMRIC improves the compression ratio by up to 81X and the I/O performance by up to 39X over AMReX's original compression solution.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 12 pages, 18 figures, 3 tables, accepted by ACM/IEEE SC '23

arXiv:2301.06027

Pluto's Surface Mapping using Unsupervised Learning from Near-Infrared Observations of LEISA/Ralph

Authors: A. Emran, C. M. Dalle Ore, C. J. Ahrens, M. K. H. Khan, V. F. Chevrier, D. P. Cruikshank

Abstract: We map the surface of Pluto by applying an unsupervised machine learning technique to near-infrared observations from the LEISA/Ralph instrument onboard NASA's New Horizons spacecraft. A principal-component-reduced Gaussian mixture model was implemented to investigate the geographic distribution of surface units across the dwarf planet. We also present the likelihood of each surface unit at the image pixel level. Average I/F spectra of each unit were analyzed (in terms of the position and strength of absorption bands of abundant volatiles such as N${}_{2}$, CH${}_{4}$, and CO, and of nonvolatile H${}_{2}$O) to connect each unit to surface composition, geology, and geographic location. The distribution of surface units shows a latitudinal pattern with distinct surface compositions of volatiles, consistent with the existing literature. However, previous mapping efforts were based primarily on compositional analysis using spectral indices (indicators) or on complex radiative transfer models, which require (prior) expert knowledge, labeled data, or optical constants of representative endmembers. We show that applying unsupervised learning in this instance yields a satisfactory map of the spatial distribution of ice compositions without any prior information or labeled data.
Thus, such an application is particularly advantageous for planetary surface mapping when label data are poorly constrained or completely unknown, because an understanding of surface material distribution is vital for volatile transport modeling at the planetary scale. We emphasize that the unsupervised learning used in this study has wide applicability and can be extended to other planetary bodies of the Solar System for mapping surface material distribution.

Submitted 15 January, 2023; originally announced January 2023.

Comments: Accepted for publication in The Planetary Science Journal. 50 pages, 17 figures including appendix

arXiv:2301.01901

doi 10.1109/TPDS.2023.3339474

TAC+: Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations

Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Sian **, Jiannan Tian, Kai Zhao, James Ahrens, Dingwen Tao

Abstract: Today's scientific simulations require significant data volume reduction because of the enormous amounts of data produced and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to this problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation data. Unlike previous work that leverages only 1D compression, in this work we propose an approach (TAC) that leverages high-dimensional SZ compression for each refinement level of AMR data. To remove data redundancy across different levels, we propose several preprocessing strategies and apply them adaptively based on the data features. We further optimize TAC into TAC+ by improving the lossless encoding stage of SZ compression so that it efficiently handles the many small AMR data blocks produced by preprocessing. Experiments on 10 AMR datasets from three real-world large-scale AMR simulations demonstrate that TAC+ can improve the compression ratio by up to 4.9$\times$ under the same data distortion, compared to the state-of-the-art method. In addition, we leverage the flexibility of our approach to tune the error bound for each level, which achieves much lower data distortion on two application-specific metrics.

Submitted 5 December, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: 18 pages, 30 figures, 5 tables, accepted by IEEE TPDS. arXiv admin note: substantial text overlap with arXiv:2204.00711
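The core of SZ-style compression, applied here to 2D data, is prediction followed by error-bounded quantization. Each value is predicted by the Lorenzo predictor from already reconstructed neighbours, so the decoder can repeat the prediction exactly, and the residual is quantized to an integer code with bin width 2 x eb. This is a simplified sketch under those assumptions. Production SZ additionally handles unpredictable values and entropy-codes the integer codes (the lossless stage that TAC+ reports optimizing), and both are omitted here. The test field is hypothetical.

```python
import math


def _lorenzo(recon, i, j):
    """2D Lorenzo prediction from already-reconstructed neighbours (0 outside the grid)."""
    left = recon[i][j - 1] if j > 0 else 0.0
    up = recon[i - 1][j] if i > 0 else 0.0
    diag = recon[i - 1][j - 1] if i > 0 and j > 0 else 0.0
    return left + up - diag


def compress(grid, eb):
    """Return integer quantization codes and the decoder-side reconstruction."""
    rows, cols = len(grid), len(grid[0])
    recon = [[0.0] * cols for _ in range(rows)]
    codes = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            pred = _lorenzo(recon, i, j)
            q = round((grid[i][j] - pred) / (2 * eb))
            codes[i][j] = q
            recon[i][j] = pred + q * 2 * eb  # |grid - recon| <= eb
    return codes, recon


def decompress(codes, eb):
    """Rebuild the data from the codes alone, repeating the same predictions."""
    rows, cols = len(codes), len(codes[0])
    recon = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            recon[i][j] = _lorenzo(recon, i, j) + codes[i][j] * 2 * eb
    return recon


eb = 1e-3
grid = [[math.sin(0.3 * i) + 0.1 * j for j in range(6)] for i in range(5)]
codes, recon = compress(grid, eb)
```

Predicting from reconstructed rather than original values is the design choice that keeps encoder and decoder in lockstep. Predicting from originals would let quantization errors accumulate beyond the bound on decode.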

arXiv:2211.00584

Ambisonic Encoding of Signals From Equatorial Microphone Arrays

Authors: Jens Ahrens

Abstract: The equatorial microphone array presented in (Ahrens et al., 2021) computes a spherical harmonic (SH) representation of a sound field based on pressure sensors along the equator of a rigid spherical baffle. The original formulation uses complex-valued SH basis functions. This is inconvenient if the SH representation of the captured sound field is to be stored in the time domain as real-valued audio signals, as is common in the spatial audio format of ambisonics. The present document summarizes the modifications that need to be applied to the mathematical formulation from (Ahrens et al., 2021) to produce an ambisonic representation of the captured sound field that is compatible with established ambisonic software tools such as SPARTA and the IEM Plugin Suite. An example MATLAB script that implements this formulation is provided.

Submitted 18 September, 2022; originally announced November 2022.
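As background for the complex-to-real switch described above, one widely used relation between complex and real SH basis functions is given below. It assumes that the complex $Y_n^m$ are orthonormal and include the Condon–Shortley phase, in which case the factor $(-1)^m$ removes that phase as ambisonic conventions expect. The document's own formulation may use different sign or normalization conventions, and N3D-normalized functions differ from these orthonormal ones by a factor of $\sqrt{4\pi}$.

```latex
Y_{n,m}(\theta,\phi) =
\begin{cases}
\sqrt{2}\,(-1)^{m}\,\operatorname{Im}\!\left[Y_n^{|m|}(\theta,\phi)\right], & m < 0,\\[4pt]
Y_n^{0}(\theta,\phi), & m = 0,\\[4pt]
\sqrt{2}\,(-1)^{m}\,\operatorname{Re}\!\left[Y_n^{m}(\theta,\phi)\right], & m > 0.
\end{cases}
```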

arXiv:2211.00583

Ambisonic Encoding of Signals From Spherical Microphone Arrays

Authors: Jens Ahrens

Abstract: This document illustrates how to process the signals from the microphones of a rigid-sphere higher-order ambisonic microphone array so that they are encoded with N3D normalization and ACN channel order and thereby can be used with the standard ambisonic software tools such as SPARTA and the IEM Plugin Suite. A MATLAB script is provided.

Submitted 18 September, 2022; originally announced November 2022.
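The two conventions named above are compact enough to state in code. ACN orders channels by order n and degree m as n(n+1)+m. N3D and SN3D normalization differ per order by a factor of sqrt(2n+1). The helper names below are illustrative, and the sketch is not the document's MATLAB script.

```python
import math


def acn_index(n, m):
    """Ambisonic Channel Number for order n and degree m (-n <= m <= n)."""
    if not -n <= m <= n:
        raise ValueError("degree m must satisfy -n <= m <= n")
    return n * n + n + m


def sn3d_to_n3d_gain(n):
    """Factor that converts an SN3D-normalized channel of order n to N3D."""
    return math.sqrt(2 * n + 1)


order = 2
channels = [(acn_index(n, m), n, m) for n in range(order + 1) for m in range(-n, n + 1)]
print(len(channels))  # 9, i.e. (order + 1) ** 2
```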

arXiv:2207.09733

doi 10.1109/TASLP.2023.3240657

Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses

Authors: Thomas Deppisch, Sebastià V. Amengual Garí, Paul Calamia, Jens Ahrens

Abstract: Psychoacoustic experiments have shown that directional properties of the direct sound, salient reflections, and the late reverberation of an acoustic room response can have a distinct influence on the auditory perception of a given room. Spatial room impulse responses (SRIRs) capture those properties and are thus used for direction-dependent room acoustic analysis and virtual acoustic rendering. This work proposes a subspace method that decomposes SRIRs into a direct part, which comprises the direct sound and the salient reflections, and a residual, to facilitate enhanced analysis and rendering methods by providing individual access to these components. The proposed method is based on the generalized singular value decomposition and interprets the residual as noise that is to be separated from the other components of the reverberation. Large generalized singular values are attributed to the direct part, which is then obtained as a low-rank approximation of the SRIR. By advancing from the end of the SRIR toward the beginning while iteratively updating the residual estimate, the method adapts to spatio-temporal variations of the residual. The method is evaluated using a spatio-spectral error measure and simulated SRIRs of different rooms, microphone arrays, and ratios of direct sound to residual energy. The proposed method yields lower errors than existing approaches in all tested scenarios, including a scenario with two simultaneous reflections. A case study with measured SRIRs shows the applicability of the method under real-world acoustic conditions.
A reference implementation is provided.

Submitted 31 January, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 927-942, 2023
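The low-rank idea can be illustrated on a single time window of a multichannel SRIR. Keep the components with the largest singular values as the "direct" part and treat the rest as residual. This sketch deliberately simplifies the described method. It uses an ordinary SVD instead of the generalized SVD, which amounts to assuming a spatially white residual, and it has no iterative residual update. The window contents and steering vector are hypothetical.

```python
import numpy as np


def direct_residual_split(X, rank):
    """Split a (samples x channels) window into a rank-`rank` part and a residual.

    Simplification: ordinary SVD, i.e. the residual is treated as spatially white.
    """
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    direct = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return direct, X - direct


# Hypothetical window: one plane wave seen by 4 microphones plus weak diffuse noise.
rng = np.random.default_rng(0)
source = rng.standard_normal(256)
steering = np.array([1.0, 0.8, -0.5, 0.3])
X = np.outer(source, steering) + 0.05 * rng.standard_normal((256, 4))
direct, residual = direct_residual_split(X, rank=1)
energy_fraction = np.linalg.norm(direct) ** 2 / np.linalg.norm(X) ** 2
```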

arXiv:2204.00711

doi 10.1145/3502181.3531458

TAC: Optimizing Error-Bounded Lossy Compression for Three-Dimensional Adaptive Mesh Refinement Simulations

Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Sian **, Jiannan Tian, James Ahrens, Dingwen Tao

Abstract: Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to this problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation data. Unlike previous work that leverages only 1D compression, in this work we propose to leverage high-dimensional (e.g., 3D) compression for each refinement level of AMR data. To remove data redundancy across different levels, we propose three preprocessing strategies and apply them adaptively based on the data characteristics. Experiments on seven AMR datasets from a real-world large-scale AMR simulation demonstrate that our proposed approach can improve the compression ratio by up to 3.3X under the same data distortion, compared to the state-of-the-art method. In addition, we leverage the flexibility of our approach to tune the error bound for each level, which achieves much lower data distortion on two application-specific metrics.

Submitted 6 May, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: 13 pages, 19 figures, 3 tables, published by ACM HPDC 2022

arXiv:2202.04393

Binaural Audio Rendering in the Spherical Harmonic Domain: A Summary of the Mathematics and its Pitfalls

Authors: Jens Ahrens

Abstract: The present document reviews the mathematics behind binaural rendering of sound fields that are available as spherical harmonic expansion coefficients. This process is also known as binaural ambisonic decoding. We highlight that the details entail some peculiarities, so one has to be well aware of the precise definitions chosen for some of the involved quantities to obtain a consistent formulation. We also discuss which sets of definitions produce ambisonic signals that are compatible with the most common available software tools.

Submitted 14 September, 2022; v1 submitted 9 February, 2022; originally announced February 2022.
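The convention sensitivity mentioned above is visible even in the basic relation. Let $D$ be the plane-wave amplitude density and $H$ the HRTF, both expanded up to order $N$. The ear signal is their integral over the sphere, and the SH-domain form depends on the basis. With real orthonormal functions it is a plain sum of products. With complex orthonormal functions satisfying $Y_n^{-m} = (-1)^m\,\overline{Y_n^{m}}$ (an assumed convention), a sign factor and an index flip appear. Other definitions, for example with conjugated expansions or different time-frequency sign conventions, change the formula again.

```latex
\begin{align}
S(\omega) &= \int_{\mathbb{S}^2} D(\Omega,\omega)\, H(\Omega,\omega)\, \mathrm{d}\Omega, \\
S(\omega) &\approx \sum_{n=0}^{N} \sum_{m=-n}^{n} d_{n,m}(\omega)\, h_{n,m}(\omega)
  && \text{(real, orthonormal } Y_{n,m}\text{)}, \\
S(\omega) &\approx \sum_{n=0}^{N} \sum_{m=-n}^{n} (-1)^{m}\, d_{n}^{m}(\omega)\, h_{n}^{-m}(\omega)
  && \text{(complex, orthonormal } Y_n^{m}\text{)}.
\end{align}
```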

arXiv:2201.02557

doi 10.1016/j.jocs.2022.101773

In Situ Data Summaries for Flexible Feature Analysis in Large-Scale Multiphase Flow Simulations

Authors: Soumya Dutta, Terece Turton, David Rogers, Jordan Musser, James Ahrens, Ann Almgren

Abstract: The study of multiphase flow is essential for understanding the complex interactions of various materials. In particular, when designing chemical reactors such as fluidized bed reactors (FBR), a detailed understanding of the hydrodynamics is critical for optimizing reactor performance and stability. An FBR allows experts to conduct different types of chemical reactions involving multiphase materials, especially interactions between gas and solids. During such complex chemical processes, the formation of void regions in the reactor, generally termed bubbles, is an important phenomenon. Studying these bubbles has deep implications for predicting the reactor's overall efficiency. However, the physical experiments needed to understand bubble dynamics are costly and non-trivial. Therefore, to study such chemical processes and bubble dynamics, a state-of-the-art massively parallel computational fluid dynamics discrete element model (CFD-DEM), MFIX-Exa, is being developed for simulating multiphase flows. Despite the proven accuracy of MFIX-Exa in modeling bubbling phenomena, the very large size of its output data makes traditional post hoc analysis prohibitive in both storage and I/O time. To address these issues and allow application scientists to explore bubble dynamics in an efficient and timely manner, we have developed an end-to-end visual analytics pipeline that enables in situ detection of bubbles using statistical techniques, followed by flexible and interactive visual exploration of bubble dynamics in the post hoc analysis phase.
Positive feedback from the experts indicates the efficacy of the proposed approach for exploring bubble dynamics in very large-scale multiphase flow simulations.

Submitted 7 January, 2022; originally announced January 2022.

Journal ref: Journal of Computational Science, 2022
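The abstract's bubble detection uses statistical techniques that it does not detail. As a much simpler stand-in, the sketch treats bubbles as 4-connected regions of a gridded void-fraction field above a threshold, labeled by breadth-first flood fill. The field, the threshold and the gridding itself are hypothetical (MFIX-Exa is a particle-based CFD-DEM code).

```python
from collections import deque


def label_bubbles(void_fraction, threshold):
    """Label 4-connected regions whose void fraction exceeds `threshold`.

    Returns (number_of_regions, label_grid); label 0 means "not a bubble".
    """
    rows, cols = len(void_fraction), len(void_fraction[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if void_fraction[i][j] > threshold and labels[i][j] == 0:
                count += 1
                labels[i][j] = count
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill of one region
                    r, c = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and labels[nr][nc] == 0
                                and void_fraction[nr][nc] > threshold):
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return count, labels


field = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.1, 0.1, 0.8],
    [0.1, 0.1, 0.7, 0.8],
]
n_bubbles, bubble_labels = label_bubbles(field, threshold=0.6)
print(n_bubbles)  # 2
```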

arXiv:2104.00178

doi 10.1145/3431379.3460653

Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling

Authors: Sian **, Jesus Pulido, Pascal Grosset, Jiannan Tian, Dingwen Tao, James Ahrens

Abstract: Extreme-scale cosmological simulations are widely run by today's researchers and scientists on leadership supercomputers. A new generation of error-bounded lossy compressors has been used in workflows to reduce storage requirements and minimize the impact of throughput limitations while saving large snapshots of high-fidelity data for post hoc analysis. In this paper, we propose to adaptively provide compression configurations to the compute partitions of cosmological simulations using newly designed, post-analysis-aware rate-quality modeling. Our contributions are fourfold: (1) We propose a novel adaptive approach to select feasible error bounds for different partitions, showing the possibility and efficiency of configuring lossy compression for each partition individually. (2) We build models that estimate, from the properties of each partition, the compression ratio and the overall loss in post-analysis results due to lossy compression. (3) We develop an efficient optimization guideline to determine the best-fit combination of error bounds that maximizes the compression ratio under acceptable post-analysis quality loss. (4) Our approach introduces negligible overhead for feature extraction and error-bound optimization for each partition, enabling post-analysis-aware in situ lossy compression for cosmological simulations. Experiments show that our proposed models are highly accurate and reliable.
Our fine-grained adaptive configuration approach improves the compression ratio by up to 73% on the tested datasets with the same post-analysis distortion and only 1% performance overhead.

Submitted 20 April, 2021; v1 submitted 31 March, 2021; originally announced April 2021.

Comments: 13 pages, 19 figures, 2 tables, published by HPDC'21
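The optimization problem in contributions (2) and (3) can be stated concretely: choose one error bound per partition so that the overall compression ratio (total raw bytes over total compressed bytes) is maximized while the estimated post-analysis loss stays within a budget. The sketch solves a tiny instance by exhaustive search. The candidate ratios and losses are hypothetical stand-ins for the paper's models, the additive-loss assumption is ours, and exhaustive search is feasible only for a handful of partitions, so it is not the paper's optimization guideline.

```python
from itertools import product


def best_configuration(raw_sizes, candidates, loss_budget):
    """Pick one (error_bound, est_ratio, est_loss) candidate per partition.

    Maximizes overall ratio = sum(raw) / sum(raw / ratio), subject to the sum of
    estimated losses (assumed additive) not exceeding loss_budget.
    """
    best = None
    for choice in product(*candidates):
        loss = sum(c[2] for c in choice)
        if loss > loss_budget:
            continue
        compressed = sum(raw / c[1] for raw, c in zip(raw_sizes, choice))
        ratio = sum(raw_sizes) / compressed
        if best is None or ratio > best[0]:
            best = (ratio, [c[0] for c in choice])
    return best


raw_sizes = [100, 100]  # hypothetical partition sizes (MB)
candidates = [
    [(1e-3, 5.0, 0.1), (1e-2, 10.0, 0.4)],   # partition A
    [(1e-3, 4.0, 0.1), (1e-2, 20.0, 0.6)],   # partition B
]
best = best_configuration(raw_sizes, candidates, loss_budget=0.75)
print(best)  # (8.0, [0.001, 0.01])
```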

arXiv:2010.03936

Cinema Darkroom: A Deferred Rendering Framework for Large-Scale Datasets

Authors: Jonas Lukasczyk, Christoph Garth, Matthew Larsen, Wito Engelke, Ingrid Hotz, David Rogers, James Ahrens, Ross Maciejewski

Abstract: This paper presents a framework that fully leverages the advantages of a deferred rendering approach for the interactive visualization of large-scale datasets. Geometry buffers (G-Buffers) are generated and stored in situ, and shading is performed post hoc in an interactive image-based rendering front end. This decoupled framework has two major advantages. First, the G-Buffers, whose generation is the most expensive part of the rendering pipeline, need to be computed and stored only once. Second, the stored G-Buffers can later be consumed by an image-based rendering front end that lets users interactively adjust various visualization parameters, such as the applied color map or the strength of ambient occlusion, for which suitable choices are often not known a priori. This paper demonstrates the use of Cinema Darkroom on several real-world datasets, highlighting its ability to effectively decouple the complexity and size of a dataset from its visualization.

Submitted 8 October, 2020; originally announced October 2020.
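The decoupling described above can be shown with a minimal deferred-shading pass. A stored per-pixel buffer of surface normals and scalar values is shaded after the fact with a Lambertian term and an interchangeable color map, so changing the color map reuses the same buffer without touching the geometry. The buffer layout, color maps and light direction are hypothetical, and ambient occlusion is omitted. This is not Cinema Darkroom's implementation.

```python
def shade(gbuffer, colormap, light_dir):
    """Deferred shading over a stored G-buffer.

    gbuffer: list of pixels, each (unit_normal, scalar in [0, 1]) or None for background.
    colormap: function mapping a scalar to an (r, g, b) tuple.
    """
    lx, ly, lz = light_dir
    image = []
    for px in gbuffer:
        if px is None:
            image.append((0.0, 0.0, 0.0))
            continue
        (nx, ny, nz), s = px
        lambert = max(0.0, nx * lx + ny * ly + nz * lz)  # diffuse term
        r, g, b = colormap(s)
        image.append((r * lambert, g * lambert, b * lambert))
    return image


def gray(s):
    return (s, s, s)


def blue_red(s):
    return (s, 0.0, 1.0 - s)


# Hypothetical 3-pixel G-buffer produced once "in situ".
gbuffer = [((0.0, 0.0, 1.0), 0.25), ((1.0, 0.0, 0.0), 0.75), None]
img_a = shade(gbuffer, gray, (0.0, 0.0, 1.0))
img_b = shade(gbuffer, blue_red, (0.0, 0.0, 1.0))  # new color map, same G-buffer
```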

arXiv:2004.00224

doi 10.1109/IPDPS47924.2020.00021

Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations

Authors: Sian **, Pascal Grosset, Christopher M. Biwer, Jesus Pulido, Jiannan Tian, Dingwen Tao, James Ahrens

Abstract: To help understand our universe better, researchers and scientists currently run extreme-scale cosmology simulations on leadership supercomputers. However, such simulations can generate large amounts of scientific data, which often incur high costs for data movement and storage. Lossy compression techniques have become attractive because they significantly reduce data size and can maintain high data fidelity for post-analysis. In this paper, we propose to use GPU-based lossy compression for extreme-scale cosmological simulations. Our contributions are threefold: (1) we integrate multiple GPU-based lossy compressors into our open-source compression benchmark and analysis framework, Foresight; (2) we use Foresight to comprehensively evaluate the practicality of GPU-based lossy compression on two real-world extreme-scale cosmology simulations, HACC and Nyx, based on a series of assessment metrics; and (3) we develop a general optimization guideline for determining the best-fit configurations for different lossy compressors and cosmological simulations. Experiments show that GPU-based lossy compression can provide the necessary post-analysis accuracy for cosmological simulations and high compression ratios of 5x to 15x on the tested datasets, as well as much higher compression and decompression throughput than CPU-based compressors.

Submitted 2 July, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

Comments: 11 pages, 10 figures, published by IEEE IPDPS '20
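To make "error-bounded lossy compression" and "compression ratio" concrete, here is a minimal CPU-only NumPy sketch. It is not any of the GPU compressors evaluated in the paper; it assumes a simple scheme (uniform quantization with bin width 2·error_bound, previous-value delta coding of the integer codes, then zlib), loosely in the spirit of prediction-based compressors. The function names and the synthetic sine field are illustrative assumptions.

```python
import zlib
import numpy as np

def compress(data: np.ndarray, error_bound: float) -> bytes:
    """Error-bounded lossy compression sketch: quantize, delta-encode, then zlib."""
    # Bins of width 2*error_bound; reconstructing at the bin centre keeps
    # every point within error_bound of its original value.
    codes = np.round(data.astype(np.float64) / (2.0 * error_bound)).astype(np.int64)
    # Previous-value prediction: store differences between neighbouring codes,
    # which are mostly zero for smooth fields and therefore compress well.
    deltas = np.diff(codes, prepend=0).astype(np.int32)
    return zlib.compress(deltas.tobytes(), 9)

def decompress(payload: bytes, error_bound: float) -> np.ndarray:
    """Undo zlib and the delta coding, then map codes back to bin centres."""
    deltas = np.frombuffer(zlib.decompress(payload), dtype=np.int32)
    codes = np.cumsum(deltas.astype(np.int64))
    return codes.astype(np.float64) * (2.0 * error_bound)

# Usage on a synthetic smooth field (an assumption, not HACC or Nyx data).
field = np.sin(np.linspace(0.0, 20.0, 100_000)).astype(np.float32)
eb = 1e-3
payload = compress(field, eb)
recon = decompress(payload, eb)
ratio = field.nbytes / len(payload)
max_err = np.max(np.abs(field.astype(np.float64) - recon))
print(f"compression ratio {ratio:.1f}x, max error {max_err:.2e}")
```

The compression ratio is simply original bytes divided by compressed bytes; tightening the error bound widens the range of codes and lowers the ratio, which is the accuracy-versus-size trade-off the paper's configuration guideline addresses.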

arXiv:1912.03587

Deep Learning-Based Feature-Aware Data Modeling for Complex Physics Simulations

Authors: Qun Liu, Subhashis Hazarika, John M. Patchett, James Paul Ahrens, Ayan Biswas

Abstract: Data modeling and reduction for in situ processing is important. Feature-driven methods for in situ data analysis and reduction are a priority for future exascale machines, as there are currently very few such methods. We investigate a deep-learning-based workflow that targets in situ data processing using autoencoders. We propose a Residual Autoencoder integrated with a Residual in Residual Dense Block (RRDB) to obtain better performance. Our proposed framework compressed our test data from 2.1 MB to 66 KB per 3D volume timestep.

Submitted 7 December, 2019; originally announced December 2019.

Comments: Accepted as a research poster at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC19)
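For a sense of scale, the reported per-timestep sizes imply a compression ratio of roughly 32×. The arithmetic below assumes decimal units (1 MB = 1000 KB); with binary units (1 MiB = 1024 KiB) the ratio comes out at about 32.6×.

$$
\mathrm{CR} = \frac{2.1\ \text{MB}}{66\ \text{KB}} = \frac{2100\ \text{KB}}{66\ \text{KB}} \approx 31.8
$$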

arXiv:1909.04824

A Machine Learning Method for Prediction of Multipath Channels

Authors: Julian Ahrens, Lia Ahrens, Hans D. Schotten

Abstract: In this paper, a machine learning method for predicting the evolution of a mobile communication channel based on a specific type of convolutional neural network is developed and evaluated in a simulated multipath transmission scenario. The simulation and channel estimation are designed to replicate real-world scenarios and the common measurements supported by reference signals in modern cellular networks. The predictor's capability meets the requirements posed by deploying the method in the radio resource scheduler of a base station. Possible applications of the method are discussed.

Submitted 1 March, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

Comments: 7 pages, 2 tables, 7 figures
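As an illustration of how channel prediction can be framed as a supervised learning problem, the sketch below simulates a narrowband multipath channel and cuts it into (past window, future sample) pairs. The channel model (a sum of Doppler-shifted paths with random complex gains), all parameter values, and the helper names are assumptions for illustration; neither the authors' simulator nor their CNN predictor is reproduced here.

```python
import numpy as np

def simulate_multipath(n_samples, n_paths, max_doppler_hz, sample_rate_hz, seed=0):
    """Narrowband channel h[t] = sum_k g_k * exp(j 2 pi f_k t) (assumed model)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / sample_rate_hz
    # Complex Gaussian path gains, scaled for unit average total power.
    gains = (rng.normal(size=n_paths) + 1j * rng.normal(size=n_paths)) / np.sqrt(2 * n_paths)
    # Each path arrives from a random angle, giving Doppler shift f_max * cos(angle).
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_paths)
    dopplers = max_doppler_hz * np.cos(angles)
    return (gains[None, :] * np.exp(2j * np.pi * dopplers[None, :] * t[:, None])).sum(axis=1)

def make_windows(h, history, horizon):
    """Pair each window of `history` past samples with the sample `horizon` steps ahead."""
    X, y = [], []
    for start in range(len(h) - history - horizon + 1):
        X.append(h[start:start + history])
        y.append(h[start + history + horizon - 1])
    return np.array(X), np.array(y)

h = simulate_multipath(1000, n_paths=8, max_doppler_hz=100.0, sample_rate_hz=10_000.0)
X, y = make_windows(h, history=32, horizon=4)
# A CNN typically takes real-valued input, so split I/Q into two channels.
X_cnn = np.stack([X.real, X.imag], axis=1)  # shape: (windows, 2, history)
```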

arXiv:1907.11762

doi 10.3390/e21070699

Multivariate Pointwise Information-Driven Data Sampling and Visualization

Authors: Soumya Dutta, Ayan Biswas, James Ahrens

Abstract: With the increasing computing capabilities of modern supercomputers, the size of the data generated by scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while preserving the important data properties, so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. While analyzing complex scientific events, domain experts often analyze and visualize two or more variables together to better understand the characteristics of the data features. Therefore, data summarization techniques are required to analyze multivariable relationships in detail and then perform data reduction such that the important features involving multiple variables are preserved in the reduced data. To achieve this, we propose a data sub-sampling algorithm for statistical data summarization that leverages pointwise information-theoretic measures to quantify the statistical association of data points across multiple variables, and generates sub-sampled data that preserves the statistical association among the variables. Using such reduced sampled data, we show that multivariate feature queries and analysis can be done effectively. The efficacy of the proposed multivariate association-driven sampling algorithm is demonstrated by applying it to several scientific data sets.

Submitted 26 July, 2019; originally announced July 2019.

Comments: 25 pages

Journal ref: Entropy, Volume 21, Issue 7, Year 2019
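One standard pointwise information-theoretic measure is pointwise mutual information, PMI(x, y) = log p(x, y) / (p(x) p(y)), which is large in magnitude when a particular pair of values co-occurs much more or less often than independence would predict. The sketch below estimates PMI per data point from a 2-D histogram and then draws a sample with probability proportional to |PMI|. The histogram estimator, the bin count, and the |PMI| weighting are assumptions for illustration, not necessarily the exact scheme in the paper.

```python
import numpy as np

def pointwise_mutual_information(x, y, bins=32):
    """Histogram estimate of PMI(x_i, y_i) = log p(x_i, y_i) / (p(x_i) p(y_i))."""
    joint, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    # Map each point to its histogram bin; clipping puts the maximum value
    # into the last bin, matching np.histogram2d's closed right edge.
    ix = np.clip(np.digitize(x, x_edges) - 1, 0, bins - 1)
    iy = np.clip(np.digitize(y, y_edges) - 1, 0, bins - 1)
    return np.log(p_xy[ix, iy] / (p_x[ix] * p_y[iy]))

def pmi_sample(x, y, fraction, bins=32, seed=0):
    """Draw sample indices with probability proportional to |PMI| (assumed weighting)."""
    pmi = pointwise_mutual_information(x, y, bins)
    weights = np.abs(pmi) + 1e-12  # keep every point selectable
    rng = np.random.default_rng(seed)
    n_keep = int(fraction * len(x))
    return rng.choice(len(x), size=n_keep, replace=False, p=weights / weights.sum())
```

Because every point falls in a non-empty bin, all three probabilities are positive and the PMI is finite. Points from rare joint events receive larger weights, so they are over-represented in the sample relative to uniform random sampling.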

arXiv:1811.12119

doi 10.1186/s13634-019-0619-3

A Machine-Learning Phase Classification Scheme for Anomaly Detection in Signals with Periodic Characteristics

Authors: Lia Ahrens, Julian Ahrens, Hans D. Schotten

Abstract: In this paper, we propose a novel machine-learning method for anomaly detection applicable to data with periodic characteristics, where randomly varying period lengths are explicitly allowed. A multidimensional time series analysis is conducted by training a data-adapted classifier consisting of deep convolutional neural networks that perform phase classification. The entire algorithm, including data pre-processing, period detection, segmentation, and even dynamic adjustment of the neural networks, is implemented for fully automatic execution. The proposed method is evaluated on three example datasets from the areas of cardiology, intrusion detection, and signal processing, showing reasonable performance.

Submitted 27 March, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

Comments: 25 pages, 15 figures
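Phase classification needs two things before any network is trained: an estimate of the period, and a phase label for each sample. The sketch below covers only those preparatory steps, using autocorrelation for period detection and equal-width phase bins as class labels. Both choices, and the helper names, are assumptions for illustration and not necessarily the authors' procedure; the CNN classifier itself is not shown.

```python
import numpy as np

def estimate_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Keep non-negative lags only: index k holds sum_t x[t + k] * x[t].
    ac = np.correlate(x, x, mode="full")[n - 1:]
    lags = np.arange(min_lag, max_lag + 1)
    return int(lags[np.argmax(ac[min_lag:max_lag + 1])])

def phase_labels(n_samples, period, n_classes):
    """Label each sample with the index of its phase bin within the period."""
    phase = (np.arange(n_samples) % period) / period  # fractional phase in [0, 1)
    return np.floor(phase * n_classes).astype(int)

t = np.arange(1000)
signal = np.sin(2 * np.pi * t / 50)
period = estimate_period(signal, min_lag=10, max_lag=200)
labels = phase_labels(len(signal), period, n_classes=10)
```

Setting `min_lag` above zero excludes the trivial peak at lag 0. With the randomly varying period lengths the abstract allows, a single global period would not suffice, and the signal would first need to be segmented into individual cycles.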

arXiv:1806.07382

In situ TensorView: In situ Visualization of Convolutional Neural Networks

Authors: Xinyu Chen, Qiang Guan, Li-Ta Lo, Simon Su, James Ahrens, Trilce Estrada

Abstract: Convolutional Neural Networks (CNNs) are complex systems. They are trained to adapt their internal connections to recognize images, text, and more. It is both interesting and helpful to visualize the dynamics within such deep artificial neural networks so that people can understand how these networks learn and make predictions. In the field of scientific simulations, visualization tools like ParaView have long been used to provide insight and understanding. We present in situ TensorView to visualize the training and functioning of CNNs as if they were scientific simulation systems. In situ TensorView is a loosely coupled, open in situ visualization framework that provides multiple viewers to help users visualize and understand their networks. It leverages ParaView's co-processing capability to provide real-time visualization during the training and prediction phases. This avoids the heavy I/O overhead of visualizing large dynamic systems. Only a few lines of code are injected into the TensorFlow framework. The visualization can provide guidance for adjusting network architectures or compressing pre-trained networks. We showcase visualizing the training of LeNet-5 and VGG16 using in situ TensorView.

Submitted 16 June, 2018; originally announced June 2018.

arXiv:1804.05796

An AI-driven Malfunction Detection Concept for NFV Instances in 5G

Authors: Julian Ahrens, Mathias Strufe, Lia Ahrens, Hans D. Schotten

Abstract: Efficient network management is one of the key challenges of constantly growing and increasingly complex wide area networks (WAN). The paradigm shift towards virtualized (NFV) and software-defined networks (SDN) in the next generation of mobile networks (5G), as well as the latest scientific insights in the field of Artificial Intelligence (AI), enable the transition from today's manually managed networks to fully autonomic and dynamic self-organized networks (SON). This helps meet key performance indicators (KPIs) while reducing operational costs (OPEX). In this paper, an AI-driven concept is presented for malfunction detection in NFV applications with the help of semi-supervised learning. For this purpose, a profile of the application under test is created. This profile is then used as a reference to detect abnormal behaviour. For example, if there is a bug in an updated version of the app, it becomes possible to react autonomously and roll back the NFV app to a previous version in order to avoid network outages.

Submitted 16 April, 2018; originally announced April 2018.

Comments: Submitted to the 23rd VDE/ITG Fachtagung Mobilkommunikation (conference on mobile communications); 5G, SON, AI, ML, NFV, SDN, WAN
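The "profile as reference" idea can be illustrated with a deliberately simple stand-in: learn per-metric means and standard deviations from normal operation only (one common semi-supervised setting), flag samples whose z-score exceeds a threshold, and trigger a rollback only after several consecutive abnormal samples. The z-score model, the threshold, the patience rule, and every name below are assumptions for illustration; the paper's actual learning method is not reproduced.

```python
import numpy as np

class ApplicationProfile:
    """Per-metric mean/std profile learned from normal operation (assumed model)."""

    def __init__(self, threshold: float = 4.0):
        self.threshold = threshold
        self.mean = None
        self.std = None

    def fit(self, normal_metrics: np.ndarray) -> "ApplicationProfile":
        # Rows are time steps; columns are metrics such as CPU load or latency.
        self.mean = normal_metrics.mean(axis=0)
        self.std = normal_metrics.std(axis=0) + 1e-9  # avoid division by zero
        return self

    def is_abnormal(self, sample: np.ndarray) -> bool:
        # Abnormal if any metric deviates by more than `threshold` standard deviations.
        z = np.abs((sample - self.mean) / self.std)
        return bool(np.any(z > self.threshold))

def should_roll_back(profile: ApplicationProfile, recent: list, patience: int = 3) -> bool:
    """Hypothetical policy: roll back only after `patience` consecutive abnormal samples."""
    window = recent[-patience:]
    return len(window) == patience and all(profile.is_abnormal(s) for s in window)

rng = np.random.default_rng(0)
normal = rng.normal(loc=[50.0, 10.0], scale=[5.0, 1.0], size=(1000, 2))
profile = ApplicationProfile().fit(normal)
```

Requiring several consecutive abnormal samples trades detection latency for fewer spurious rollbacks caused by single outliers.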

arXiv:1712.06790

Build and Execution Environment (BEE): an Encapsulated Environment Enabling HPC Applications Running Everywhere

Authors: Jieyang Chen, Qiang Guan, Xin Liang, Louis James Vernon, Allen McPherson, Li-Ta Lo, Zizhong Chen, James Paul Ahrens

Abstract: Variations in High Performance Computing (HPC) system software configurations mean that applications are typically configured and built for specific HPC environments. Building applications can require a significant investment of time and effort from application users, as well as additional technical knowledge. Linux container technologies such as Docker and Charliecloud bring great benefits to the application development, build, and deployment processes. While cloud platforms already widely support containers, HPC systems still have non-uniform support for container technologies. In this work, we propose a unified runtime framework, the Build and Execution Environment (BEE), that spans both HPC and cloud platforms and allows users to run their containerized HPC applications across all supported platforms without modification. We design four BEE backends for four different classes of HPC or cloud platforms so that together they cover the majority of mainstream computing platforms for HPC users. Evaluations show that BEE provides an easy-to-use unified user interface, execution environment, and comparable performance.

Submitted 27 February, 2021; v1 submitted 19 December, 2017; originally announced December 2017.
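The architectural idea of one user-facing interface over several platform-specific backends can be sketched as an abstract base class plus a registry. Everything here (`Job`, `Backend`, the two stub backends, the platform keys, and the image name) is hypothetical and does not reflect BEE's real classes or its four actual backends; the stubs return a description instead of launching anything.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class Job:
    image: str                                   # container image reference (hypothetical)
    command: list = field(default_factory=list)  # command to run inside the container
    nodes: int = 1

class Backend(ABC):
    """One execution backend per platform class (hypothetical interface)."""

    @abstractmethod
    def launch(self, job: Job) -> str:
        ...

class HpcBackend(Backend):
    def launch(self, job: Job) -> str:
        # A real backend would request nodes from the batch scheduler here.
        return f"hpc: {job.nodes} node(s) running {job.image}"

class CloudBackend(Backend):
    def launch(self, job: Job) -> str:
        # A real backend would provision virtual machines here.
        return f"cloud: {job.nodes} VM(s) running {job.image}"

BACKENDS = {"hpc": HpcBackend(), "cloud": CloudBackend()}

def run(platform: str, job: Job) -> str:
    """The user submits the same Job regardless of the target platform."""
    return BACKENDS[platform].launch(job)
```

The user-facing `Job` stays identical across platforms; only the registry lookup changes, which is the property that lets a containerized application run "without modification".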

arXiv:1310.2289

doi 10.1109/SSIAI.2012.6202488

Subband coding for large-scale scientific simulation data using JPEG 2000

Authors: Christopher M. Brislawn, Jonathan L. Woodring, Susan M. Mniszewski, David E. DeMarle, James P. Ahrens

Abstract: The ISO/IEC JPEG 2000 image coding standard is a family of source coding algorithms targeting high-resolution image communications. JPEG 2000 offers highly scalable embedded coding that allows one to interactively zoom out to reduced-resolution thumbnails of enormous data sets or to zoom in on highly localized regions of interest with very economical communication and rendering requirements. While the standard is intended for fixed-precision input data, the irreversible version is often implemented internally in floating-point arithmetic. Moreover, the standard is designed to support high-bit-depth data. Part 2 of the standard also provides support for three-dimensional data sets such as multicomponent or volumetric imagery. These features make JPEG 2000 an appealing candidate for highly scalable communications coding and visualization of two- and three-dimensional data produced by scientific simulation software. We present results of initial experiments applying JPEG 2000 to scientific simulation data produced by the Parallel Ocean Program (POP) global ocean circulation model, highlighting both the promise and the many challenges this approach holds for scientific visualization applications.

Submitted 23 October, 2014; v1 submitted 8 October, 2013; originally announced October 2013.

Comments: 4 pages, 5 figures. Version 2: added BibTeX citation (BibTeX_citation.txt) as an ancillary file

Report number: LA-UR-12-1352

MSC Class: 94A29

ACM Class: E.4

Journal ref: Proceedings IEEE Southwest Symposium on Image Analysis and Interpretation, Santa Fe, NM: IEEE Computer Society, April 2012, pp. 201-204
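JPEG 2000 obtains its resolution scalability by splitting data into subbands with a discrete wavelet transform: each level produces a low-pass (thumbnail) band and a high-pass (detail) band. As an illustration, here is one level of the reversible LeGall 5/3 lifting transform from Part 1 of the standard, applied to an even-length 1-D integer signal in plain Python. This is a simplification: the irreversible path the abstract mentions uses the floating-point 9/7 wavelet instead, and real codecs apply the transform separably in 2-D or 3-D, followed by quantization and EBCOT entropy coding.

```python
def forward_53(x: list) -> tuple:
    """One level of the reversible LeGall 5/3 lifting transform (even-length input)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and >= 2")
    even, odd = x[0::2], x[1::2]
    m = len(odd)
    # Predict step: high-pass (detail) coefficients.
    high = []
    for i in range(m):
        right = even[i + 1] if i + 1 < m else even[i]  # symmetric extension
        high.append(odd[i] - (even[i] + right) // 2)
    # Update step: low-pass (approximation) coefficients.
    low = []
    for i in range(m):
        left = high[i - 1] if i > 0 else high[0]  # symmetric extension
        low.append(even[i] + (left + high[i] + 2) // 4)
    return low, high

def inverse_53(low: list, high: list) -> list:
    """Undo the lifting steps in reverse order; integer arithmetic makes it lossless."""
    m = len(low)
    even = []
    for i in range(m):
        left = high[i - 1] if i > 0 else high[0]
        even.append(low[i] - (left + high[i] + 2) // 4)
    odd = []
    for i in range(m):
        right = even[i + 1] if i + 1 < m else even[i]
        odd.append(high[i] + (even[i] + right) // 2)
    x = []
    for e, o in zip(even, odd):
        x.extend((e, o))
    return x

signal = [5, 7, 3, 2, 8, 9, 1, 0]
low, high = forward_53(signal)
```

Each lifting step only adds a function of the other half of the samples, so the inverse subtracts the same quantity and reconstruction is exact. For smooth input such as a linear ramp, the interior detail coefficients are zero, which is what makes the high-pass bands cheap to code.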

Showing 1–24 of 24 results for author: Ahrens, J