-
Explainable AI Integrated Feature Engineering for Wildfire Prediction
Authors:
Di Fan,
Ayan Biswas,
James Paul Ahrens
Abstract:
Wildfires present intricate challenges for prediction, necessitating the use of sophisticated machine learning techniques for effective modeling\cite{jain2020review}. In our research, we conducted a thorough assessment of various machine learning algorithms for both classification and regression tasks relevant to predicting wildfires. We found that for classifying different types or stages of wildfires, the XGBoost model outperformed others in terms of accuracy and robustness. Meanwhile, the Random Forest regression model showed superior results in predicting the extent of wildfire-affected areas, excelling in both prediction error and explained variance. Additionally, we developed a hybrid neural network model that integrates numerical data and image information for simultaneous classification and regression. To gain deeper insights into the decision-making processes of these models and identify key contributing features, we utilized eXplainable Artificial Intelligence (XAI) techniques, including TreeSHAP, LIME, Partial Dependence Plots (PDP), and Gradient-weighted Class Activation Mapping (Grad-CAM). These interpretability tools shed light on the significance and interplay of various features, highlighting the complex factors influencing wildfire predictions. Our study not only demonstrates the effectiveness of specific machine learning models in wildfire-related tasks but also underscores the critical role of model transparency and interpretability in environmental science applications.
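TreeSHAP computes Shapley values efficiently for tree ensembles such as XGBoost and Random Forest. As a minimal sketch of the quantity it computes (not the authors' pipeline and not the `shap` library API), the Python below evaluates exact Shapley values by brute force over all feature coalitions, assuming an interventional value function in which absent features take a baseline value. The toy model and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> list:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value. The cost
    grows as O(2^n) model calls, so this suits only a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Evaluate the model with non-coalition features replaced by baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: two additive features plus one interaction term.
def toy_risk(z):
    return 2.0 * z[0] + 0.5 * z[1] + z[0] * z[2]


print(exact_shapley(toy_risk, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

The attributions sum to model(x) minus model(baseline) (the efficiency property), and the interaction term is split equally between the two features that form it. TreeSHAP obtains the same kind of values for trees without enumerating every coalition.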
Submitted 1 April, 2024;
originally announced April 2024.
-
From Channel Measurement to Training Data for PHY Layer AI Applications
Authors:
Michael Zentarra,
Julian Ahrens,
Lia Ahrens
Abstract:
Learning-based techniques such as artificial intelligence (AI) and machine learning (ML) play an increasingly important role in the development of future communication networks. The success of a learning algorithm depends on the quality and quantity of the available training data. In the physical layer (PHY), channel information data can be obtained either through measurement campaigns or through simulations based on predefined channel models. Performing measurements can be time-consuming while yielding information about only one specific position or scenario. Simulated data, on the other hand, are more general, but in most cases they reflect not a real environment but a statistical approximation based on a mathematical model. This paper presents a procedure for acquiring channel data by means of fast and flexible software-defined radio (SDR) based channel measurements, along with a parameter extraction method that provides configuration input to the simulator. The procedure from measurement to simulated channel data is demonstrated in two exemplary propagation scenarios. It is shown that in both cases the simulated data are in good agreement with the measurements.
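The abstract does not say which parameters are extracted from the measurements. As an assumption, the sketch below computes two standard descriptors that channel simulators are often configured with, the mean excess delay and the RMS delay spread of a power delay profile; the two-path profile is hypothetical.

```python
from math import sqrt


def delay_statistics(delays_s, powers_lin):
    """Return (mean excess delay, RMS delay spread) of a power delay profile.

    delays_s:   path delays in seconds
    powers_lin: path powers on a linear scale (not dB)
    """
    if len(delays_s) != len(powers_lin) or not delays_s:
        raise ValueError("need equally sized, non-empty delay and power lists")
    total = sum(powers_lin)
    if total <= 0:
        raise ValueError("total power must be positive")
    t0 = min(delays_s)  # excess delay is measured from the first arriving path
    excess = [t - t0 for t in delays_s]
    # Power-weighted first and second moments of the excess delay.
    m1 = sum(p * t for t, p in zip(excess, powers_lin)) / total
    m2 = sum(p * t * t for t, p in zip(excess, powers_lin)) / total
    return m1, sqrt(max(m2 - m1 * m1, 0.0))


# Hypothetical two-path profile: equal power at 0 us and 1 us.
mean_delay, rms_spread = delay_statistics([0.0, 1e-6], [1.0, 1.0])
print(f"mean excess delay {mean_delay:.2e} s, RMS delay spread {rms_spread:.2e} s")
```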
Submitted 13 March, 2024;
originally announced March 2024.
-
Analyzing Impact of Data Reduction Techniques on Visualization for AMR Applications Using AMReX Framework
Authors:
Daoce Wang,
Jesus Pulido,
Pascal Grosset,
Jiannan Tian,
James Ahrens,
Dingwen Tao
Abstract:
Today's scientific simulations generate exceptionally large volumes of data, challenging the capacities of available I/O bandwidth and storage space. This necessitates a substantial reduction in data volume, for which error-bounded lossy compression has emerged as a highly effective strategy. A crucial metric for assessing the efficacy of lossy compression is visualization. Despite extensive research on the impact of compression on visualization, there is a notable gap in the literature concerning the effects of compression on the visualization of Adaptive Mesh Refinement (AMR) data. AMR has proven to be a potent solution for addressing the rising computational intensity and the explosive growth in data volume that requires storage and transmission. However, the hierarchical and multi-resolution characteristics of AMR data introduce unique challenges to its visualization, and these challenges are further compounded when data compression comes into play. This article examines in detail how data compression affects the visualization of AMR data and the new challenges it introduces.
Submitted 29 September, 2023;
originally announced September 2023.
-
MolSieve: A Progressive Visual Analytics System for Molecular Dynamics Simulations
Authors:
Rostyslav Hnatyshyn,
Jieqiong Zhao,
Danny Perez,
James Ahrens,
Ross Maciejewski
Abstract:
Molecular Dynamics (MD) simulations are ubiquitous in cutting-edge physico-chemical research. They provide critical insights into how a physical system evolves over time given a model of interatomic interactions. Understanding a system's evolution is key to selecting the best candidates for new drugs, materials for manufacturing, and countless other practical applications. With today's technology, these simulations can encompass millions of unit transitions between discrete molecular structures, spanning up to several milliseconds of real time. Attempting a brute-force analysis of datasets of this size is not only computationally impractical but would also fail to shed light on the physically relevant features of the data. Moreover, there is a need to analyze simulation ensembles in order to compare similar processes in differing environments. These problems call for an approach that is analytically transparent, computationally efficient, and flexible enough to handle the variety found in materials-based research. To address these problems, we introduce MolSieve, a progressive visual analytics system that enables the comparison of multiple long-duration simulations. Using MolSieve, analysts are able to quickly identify and compare regions of interest within immense simulations through its combination of control charts, data-reduction techniques, and highly informative visual components. A simple programming interface is provided that allows experts to fit MolSieve to their needs. To demonstrate the efficacy of our approach, we present two case studies of MolSieve and report on findings from domain collaborators.
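The abstract names control charts as one of MolSieve's building blocks but does not specify the chart type or the monitored quantity. As a hedged illustration only, the sketch below implements a generic Shewhart chart that flags timesteps whose value leaves the mean ± k·σ band estimated from a reference window; the per-timestep signal is hypothetical.

```python
from statistics import mean, stdev


def control_chart_flags(series, baseline, k=3.0):
    """Return indices whose value lies outside mean +/- k * sigma, where the
    centre line and sigma are estimated from the first `baseline` samples."""
    if not 2 <= baseline <= len(series):
        raise ValueError("baseline window needs at least two samples")
    reference = series[:baseline]
    centre, sigma = mean(reference), stdev(reference)
    lower, upper = centre - k * sigma, centre + k * sigma
    return [i for i, v in enumerate(series) if v < lower or v > upper]


# Hypothetical per-timestep scalar, e.g. a structural descriptor of the system.
signal = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 5.0, 1.0]
print(control_chart_flags(signal, baseline=6))  # prints [6]
```

Flagged indices mark candidate regions of interest; a progressive system would presumably refine such estimates as more of the trajectory is processed.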
Submitted 5 September, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications
Authors:
Daoce Wang,
Jesus Pulido,
Pascal Grosset,
Jiannan Tian,
Sian Jin,
Houjun Tang,
Jean Sexton,
Sheng Di,
Zarija Lukić,
Kai Zhao,
Bo Fang,
Franck Cappello,
James Ahrens,
Dingwen Tao
Abstract:
As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission grows exponentially. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to tackle the latter issue. Despite their respective advantages, few attempts have been made to investigate how AMR and error-bounded lossy compression can function together. To this end, this study presents a novel in situ lossy compression framework that employs the HDF5 filter to both reduce I/O costs and boost compression quality for AMR applications. We implement our solution in the AMReX framework and evaluate it on two real-world AMR applications, Nyx and WarpX, on the Summit supercomputer. Experiments with 4096 CPU cores demonstrate that AMRIC improves the compression ratio by up to 81X and the I/O performance by up to 39X over AMReX's original compression solution.
Submitted 13 July, 2023;
originally announced July 2023.
-
Pluto's Surface Mapping using Unsupervised Learning from Near-Infrared Observations of LEISA/Ralph
Authors:
A. Emran,
C. M. Dalle Ore,
C. J. Ahrens,
M. K. H. Khan,
V. F. Chevrier,
D. P. Cruikshank
Abstract:
We map the surface of Pluto with an unsupervised machine learning technique applied to near-infrared observations from the LEISA/Ralph instrument onboard NASA's New Horizons spacecraft. A principal-component-reduced Gaussian mixture model was used to investigate the geographic distribution of surface units across the dwarf planet. We also present the likelihood of each surface unit at the image-pixel level. The average I/F spectrum of each unit was analyzed in terms of the positions and strengths of the absorption bands of abundant volatiles such as N${}_{2}$, CH${}_{4}$, and CO and of nonvolatile H${}_{2}$O, to connect the unit to surface composition, geology, and geographic location. The distribution of surface units shows a latitudinal pattern with distinct volatile surface compositions, consistent with the existing literature. However, previous mapping efforts were based primarily on compositional analysis using spectral indices (indicators) or on complex radiative transfer models, which require prior expert knowledge, labeled data, or optical constants of representative endmembers. We show that applying unsupervised learning in this instance yields a satisfactory map of the spatial distribution of ice compositions without any prior information or labeled data. Such an approach is therefore particularly advantageous for planetary surface mapping when labeled data are poorly constrained or completely unknown, which matters because an understanding of surface material distribution is vital for volatile transport modeling at the planetary scale. We emphasize that the unsupervised learning used in this study has wide applicability and can be extended to map surface material distribution on other planetary bodies of the Solar System.
Submitted 15 January, 2023;
originally announced January 2023.
-
TAC+: Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations
Authors:
Daoce Wang,
Jesus Pulido,
Pascal Grosset,
Sian Jin,
Jiannan Tian,
Kai Zhao,
James Ahrens,
Dingwen Tao
Abstract:
Today's scientific simulations require significant data volume reduction because of the enormous amounts of data produced and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to this problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation data. Unlike previous work, which only leverages 1D compression, in this work we propose an approach (TAC) that leverages high-dimensional SZ compression for each refinement level of AMR data. To remove data redundancy across different levels, we propose several pre-processing strategies and apply them adaptively based on the data features. We further optimize TAC into TAC+ by improving the lossless encoding stage of SZ compression so that it efficiently handles the many small AMR data blocks produced by pre-processing. Experiments on 10 AMR datasets from three real-world large-scale AMR simulations demonstrate that TAC+ can improve the compression ratio by up to 4.9× under the same data distortion, compared to the state-of-the-art method. In addition, we leverage the flexibility of our approach to tune the error bound for each level, which achieves much lower data distortion on two application-specific metrics.
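SZ-style error-bounded compressors, as commonly described, predict each value from already-reconstructed neighbours, quantize the prediction error into bins of width twice the error bound, and then entropy-code and losslessly compress the integer codes (the stage TAC+ optimizes). The sketch below keeps only the prediction and quantization steps, uses the simplest 1D previous-value predictor instead of SZ's multidimensional predictors, and is not the TAC or TAC+ code.

```python
def compress(data, eb):
    """Encode `data` as its first value plus integer quantization codes.

    Each value is predicted from the previous *reconstructed* value, and the
    prediction error is quantized into bins of width 2 * eb, so every
    reconstructed value stays within eb of the original.
    """
    if eb <= 0 or not data:
        raise ValueError("need a positive error bound and non-empty data")
    prev = data[0]  # stored losslessly in this sketch
    codes = []
    for x in data[1:]:
        q = round((x - prev) / (2 * eb))
        codes.append(q)
        prev = prev + q * 2 * eb  # mirror the decoder so errors do not drift
    return data[0], codes


def decompress(first, codes, eb):
    """Rebuild the values by replaying the encoder's reconstruction."""
    out = [first]
    for q in codes:
        out.append(out[-1] + q * 2 * eb)
    return out


values = [0.0, 0.31, 0.72, 1.18, 1.13, 0.42, -0.5]
first, codes = compress(values, eb=0.05)
restored = decompress(first, codes, eb=0.05)
print(codes)
print(max(abs(a - b) for a, b in zip(values, restored)))  # at most 0.05
```

Predicting from reconstructed rather than original values is what keeps the pointwise bound intact; smooth data yield many small, repeated codes, which is what the subsequent lossless stage exploits.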
Submitted 5 December, 2023; v1 submitted 4 January, 2023;
originally announced January 2023.
-
Ambisonic Encoding of Signals From Equatorial Microphone Arrays
Authors:
Jens Ahrens
Abstract:
The equatorial microphone array presented in (Ahrens et al., 2021) computes a spherical harmonic (SH) representation of a sound field based on pressure sensors along the equator of a rigid spherical baffle. The original formulation uses complex-valued SH basis functions. This is inconvenient if the SH representation of the captured sound field is to be stored in the time domain as real-valued audio signals, as is common in the spatial audio format of ambisonics. The present document summarizes the modifications that need to be applied to the mathematical formulation from (Ahrens et al., 2021) to produce an ambisonic representation of the captured sound field that is compatible with established ambisonic software tools such as SPARTA and the IEM Plugin Suite. An example MATLAB script that implements this formulation is provided.
Submitted 18 September, 2022;
originally announced November 2022.
-
Ambisonic Encoding of Signals From Spherical Microphone Arrays
Authors:
Jens Ahrens
Abstract:
This document illustrates how to process the signals from the microphones of a rigid-sphere higher-order ambisonic microphone array so that they are encoded with N3D normalization and ACN channel order and thereby can be used with the standard ambisonic software tools such as SPARTA and the IEM Plugin Suite. A MATLAB script is provided.
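ACN channel order and N3D normalization are the conventions named above. Their standard definitions are ACN = n² + n + m for order n and degree m, and N3D = SN3D · √(2n + 1). The Python below only illustrates this indexing and gain bookkeeping; it is not the author's MATLAB script and omits the radial filtering that turns rigid-sphere microphone signals into SH coefficients.

```python
from math import sqrt


def acn(n, m):
    """Ambisonic Channel Number (ACN) for order n and degree m."""
    if n < 0 or not -n <= m <= n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * n + n + m


def sn3d_to_n3d(n):
    """Gain that converts an order-n SN3D-normalized channel to N3D."""
    return sqrt(2 * n + 1)


order = 2
for n in range(order + 1):
    for m in range(-n, n + 1):
        print(f"n={n} m={m:+d} -> ACN {acn(n, m)}, SN3D->N3D gain {sn3d_to_n3d(n):.4f}")
```

An order-N signal set therefore has (N + 1)² channels, numbered 0 to (N + 1)² − 1, and converting between SN3D and N3D only rescales each channel by a gain that depends on its order.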
Submitted 18 September, 2022;
originally announced November 2022.
-
Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses
Authors:
Thomas Deppisch,
Sebastià V. Amengual Garí,
Paul Calamia,
Jens Ahrens
Abstract:
Psychoacoustic experiments have shown that directional properties of the direct sound, salient reflections, and the late reverberation of an acoustic room response can have a distinct influence on the auditory perception of a given room. Spatial room impulse responses (SRIRs) capture those properties and thus are used for direction-dependent room acoustic analysis and virtual acoustic rendering. This work proposes a subspace method that decomposes SRIRs into a direct part, which comprises the direct sound and the salient reflections, and a residual, to facilitate enhanced analysis and rendering methods by providing individual access to these components. The proposed method is based on the generalized singular value decomposition and interprets the residual as noise that is to be separated from the other components of the reverberation. Large generalized singular values are attributed to the direct part, which is then obtained as a low-rank approximation of the SRIR. By advancing from the end of the SRIR toward the beginning while iteratively updating the residual estimate, the method adapts to spatio-temporal variations of the residual. The method is evaluated using a spatio-spectral error measure and simulated SRIRs of different rooms, microphone arrays, and ratios of direct sound to residual energy. The proposed method yields lower errors than existing approaches in all tested scenarios, including a scenario with two simultaneous reflections. A case study with measured SRIRs shows the applicability of the method under real-world acoustic conditions. A reference implementation is provided.
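The direct part above is obtained as a low-rank approximation driven by the generalized SVD, which weighs the SRIR against a residual estimate. As a simplified sketch, the Python below swaps in a plain truncated SVD (the GSVD's residual weighting is omitted) and computes the dominant rank-1 term of a hypothetical window, rows being time samples and columns microphone channels, by power iteration. It is not the authors' reference implementation.

```python
import random
from math import sqrt


def matvec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def rank1_part(a, iters=200, seed=0):
    """Dominant rank-1 component sigma_1 * u_1 * v_1^T of `a`, found by
    power iteration on A^T A."""
    rows, cols = len(a), len(a[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(cols)]  # random positive start
    for _ in range(iters):
        u = matvec(a, v)  # u = A v
        v = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = A^T u
        norm = sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return [[0.0] * cols for _ in range(rows)]
        v = [x / norm for x in v]
    av = matvec(a, v)  # equals sigma_1 * u_1 once v has converged to v_1
    return [[av[i] * v[j] for j in range(cols)] for i in range(rows)]


# Hypothetical window: 4 time samples x 3 channels.
window = [[1.0, 0.5, 0.2], [2.0, 1.0, 0.4], [0.1, -0.2, 0.3], [3.0, 1.5, 0.6]]
direct = rank1_part(window)
residual = [[x - d for x, d in zip(r, dr)] for r, dr in zip(window, direct)]
print(residual)
```

A coherent arrival shows up as a single strong spatial pattern across channels, so it concentrates in the leading singular component, while diffuse energy spreads over the remaining ones.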
Submitted 31 January, 2023; v1 submitted 20 July, 2022;
originally announced July 2022.
-
TAC: Optimizing Error-Bounded Lossy Compression for Three-Dimensional Adaptive Mesh Refinement Simulations
Authors:
Daoce Wang,
Jesus Pulido,
Pascal Grosset,
Sian Jin,
Jiannan Tian,
James Ahrens,
Dingwen Tao
Abstract:
Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to this problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation data. Unlike previous work, which only leverages 1D compression, in this work we propose to leverage high-dimensional (e.g., 3D) compression for each refinement level of AMR data. To remove data redundancy across different levels, we propose three pre-processing strategies and apply them adaptively based on the data characteristics. Experiments on seven AMR datasets from a real-world large-scale AMR simulation demonstrate that our proposed approach can improve the compression ratio by up to 3.3X under the same data distortion, compared to the state-of-the-art method. In addition, we leverage the flexibility of our approach to tune the error bound for each level, which achieves much lower data distortion on two application-specific metrics.
Submitted 6 May, 2022; v1 submitted 1 April, 2022;
originally announced April 2022.
-
Binaural Audio Rendering in the Spherical Harmonic Domain: A Summary of the Mathematics and its Pitfalls
Authors:
Jens Ahrens
Abstract:
The present document reviews the mathematics behind binaural rendering of sound fields that are available as spherical harmonic expansion coefficients. This process is also known as binaural ambisonic decoding. We highlight that the details entail some peculiarities, so one has to be well aware of the precise definitions chosen for some of the involved quantities to obtain a consistent formulation. We also discuss which sets of definitions produce ambisonic signals that are compatible with the most common available software tools.
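As a hedged illustration of where such pitfalls arise (standard orthonormality relations, not the paper's specific formulation), binaural SH rendering reduces an integral over the sphere to a sum over coefficients, and the form of that sum depends on the basis. Here $f$, $g$ are generic functions on the sphere, $Y_n^m$ is an orthonormal complex SH basis, and $R_n^m$ an orthonormal real SH basis; these symbols are introduced for illustration.

```latex
% Complex orthonormal basis: \int Y_n^m (Y_{n'}^{m'})^* d\Omega = \delta_{nn'} \delta_{mm'}
f(\Omega) = \sum_{n,m} f_{nm}\, Y_n^m(\Omega), \quad
g(\Omega) = \sum_{n,m} g_{nm}\, Y_n^m(\Omega)
\;\Longrightarrow\;
\int_{\mathbb{S}^2} f(\Omega)\, g^{*}(\Omega)\, d\Omega
  = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} f_{nm}\, g_{nm}^{*}

% Real orthonormal basis: \int R_n^m R_{n'}^{m'} d\Omega = \delta_{nn'} \delta_{mm'}
f(\Omega) = \sum_{n,m} \breve{f}_{nm}\, R_n^m(\Omega), \quad
g(\Omega) = \sum_{n,m} \breve{g}_{nm}\, R_n^m(\Omega)
\;\Longrightarrow\;
\int_{\mathbb{S}^2} f(\Omega)\, g(\Omega)\, d\Omega
  = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \breve{f}_{nm}\, \breve{g}_{nm}
```

Mixing the two forms, for example applying the real-basis sum to complex-basis coefficients, generally gives a wrong result; whether the sound field or the HRTF coefficients carry the conjugate depends on the chosen definitions, which is the kind of inconsistency the document warns about.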
Submitted 14 September, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
In Situ Data Summaries for Flexible Feature Analysis in Large-Scale Multiphase Flow Simulations
Authors:
Soumya Dutta,
Terece Turton,
David Rogers,
Jordan Musser,
James Ahrens,
Ann Almgren
Abstract:
The study of multiphase flow is essential for understanding the complex interactions of various materials. In particular, when designing chemical reactors such as fluidized bed reactors (FBRs), a detailed understanding of the hydrodynamics is critical for optimizing reactor performance and stability. An FBR allows experts to conduct different types of chemical reactions involving multiphase materials, especially interactions between gas and solids. During such complex chemical processes, the formation of void regions in the reactor, generally termed bubbles, is an important phenomenon. Studying these bubbles has deep implications for predicting the reactor's overall efficiency. However, the physical experiments needed to understand bubble dynamics are costly and non-trivial. Therefore, to study such chemical processes and bubble dynamics, MFIX-Exa, a state-of-the-art massively parallel computational fluid dynamics discrete element model (CFD-DEM), is being developed for simulating multiphase flows. Despite the proven accuracy of MFIX-Exa in modeling bubbling phenomena, the very large size of its output data makes traditional post hoc analysis prohibitive in both storage and I/O time. To address these issues and allow application scientists to explore bubble dynamics efficiently and in a timely manner, we have developed an end-to-end visual analytics pipeline that enables in situ detection of bubbles using statistical techniques, followed by flexible and interactive visual exploration of bubble dynamics in the post hoc analysis phase. Positive feedback from the experts indicates the efficacy of the proposed approach for exploring bubble dynamics in very-large-scale multiphase flow simulations.
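The abstract does not detail the statistical detection technique. As a stand-in, and explicitly not the authors' method, the sketch below thresholds a hypothetical 2D void-fraction grid and groups high-void cells into 4-connected regions. Storing per-bubble summaries such as cell counts in situ is far cheaper than writing out the full field.

```python
from collections import deque


def label_bubbles(void, threshold):
    """Group grid cells whose void fraction exceeds `threshold` into
    4-connected regions; each region is a list of (row, col) cells."""
    rows, cols = len(void), len(void[0]) if void else 0
    seen = [[False] * cols for _ in range(rows)]
    bubbles = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or void[r][c] <= threshold:
                continue
            # Breadth-first flood fill from an unvisited high-void cell.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                region.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not seen[ni][nj] and void[ni][nj] > threshold):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            bubbles.append(region)
    return bubbles


grid = [
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1, 0.8],
    [0.1, 0.1, 0.1, 0.8],
]
for k, cells in enumerate(label_bubbles(grid, threshold=0.5)):
    print(f"bubble {k}: {len(cells)} cells")
```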
Submitted 7 January, 2022;
originally announced January 2022.
-
Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling
Authors:
Sian Jin,
Jesus Pulido,
Pascal Grosset,
Jiannan Tian,
Dingwen Tao,
James Ahrens
Abstract:
Extreme-scale cosmological simulations are widely run by today's researchers and scientists on leadership supercomputers. A new generation of error-bounded lossy compressors has been used in workflows to reduce storage requirements and minimize the impact of throughput limitations while saving large snapshots of high-fidelity data for post hoc analysis. In this paper, we propose to adaptively provide compression configurations to the compute partitions of cosmological simulations using newly designed, post-analysis-aware rate-quality modeling. The contribution is fourfold: (1) We propose a novel adaptive approach to select feasible error bounds for different partitions, showing the possibility and efficiency of adaptively configuring lossy compression for each partition individually. (2) We build models, based on the properties of each partition, to estimate the compression ratio and the overall loss in post-analysis results caused by lossy compression. (3) We develop an efficient optimization guideline to determine the best-fit combination of error bounds that maximizes the compression ratio under an acceptable post-analysis quality loss. (4) Our approach introduces negligible overheads for feature extraction and error-bound optimization for each partition, enabling post-analysis-aware in situ lossy compression for cosmological simulations. Experiments show that our proposed models are highly accurate and reliable. Our fine-grained adaptive configuration approach improves the compression ratio by up to 73% on the tested datasets at the same post-analysis distortion, with only 1% performance overhead.
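The optimization in contribution (3) can be pictured as choosing one error bound per partition to maximize the overall compression ratio subject to a loss budget. The sketch below does this by exhaustive search for clarity only; the paper describes an efficient guideline instead. The ratio and loss models are hypothetical stand-ins for the fitted models of contribution (2), and additive losses across partitions are an assumption.

```python
from itertools import product


def choose_error_bounds(sizes, candidates, ratio_model, loss_model, max_loss):
    """Exhaustively choose one error bound per partition.

    sizes:       original size of each partition (e.g. in bytes)
    candidates:  candidate error bounds, shared by all partitions
    ratio_model: ratio_model(p, eb) -> predicted compression ratio of partition p
    loss_model:  loss_model(p, eb) -> predicted post-analysis loss of partition p
    Losses are assumed to add up across partitions (a simplifying assumption).
    Returns (bounds, overall_ratio), or None if no combination is feasible.
    """
    best = None
    for bounds in product(candidates, repeat=len(sizes)):
        loss = sum(loss_model(p, eb) for p, eb in enumerate(bounds))
        if loss > max_loss:
            continue
        # Overall ratio is total original size over total compressed size,
        # not the mean of per-partition ratios.
        compressed = sum(s / ratio_model(p, eb)
                         for p, (s, eb) in enumerate(zip(sizes, bounds)))
        overall = sum(sizes) / compressed
        if best is None or overall > best[1]:
            best = (bounds, overall)
    return best


def toy_ratio(p, eb):
    # Hypothetical: a looser bound compresses better.
    return 10.0 if eb == 1.0 else 4.0


def toy_loss(p, eb):
    # Hypothetical: partition 0 is twice as sensitive to error as partition 1.
    return eb * (2.0 if p == 0 else 1.0)


print(choose_error_bounds([100.0, 100.0], [0.1, 1.0], toy_ratio, toy_loss, max_loss=1.5))
```

With these toy models the search keeps the tight bound on the sensitive partition and spends the loss budget on the other one, which is the intuition behind configuring each partition individually.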
Submitted 20 April, 2021; v1 submitted 31 March, 2021;
originally announced April 2021.
-
Cinema Darkroom: A Deferred Rendering Framework for Large-Scale Datasets
Authors:
Jonas Lukasczyk,
Christoph Garth,
Matthew Larsen,
Wito Engelke,
Ingrid Hotz,
David Rogers,
James Ahrens,
Ross Maciejewski
Abstract:
This paper presents a framework that fully leverages the advantages of a deferred rendering approach for the interactive visualization of large-scale datasets. Geometry buffers (G-Buffers) are generated and stored in situ, and shading is performed post hoc in an interactive image-based rendering front end. This decoupled framework has two major advantages. First, the G-Buffers, whose generation is the most expensive part of the rendering pipeline, only need to be computed and stored once. Second, the stored G-Buffers can later be consumed in an image-based rendering front end that enables users to interactively adjust various visualization parameters, such as the applied color map or the strength of ambient occlusion, for which suitable choices are often not known a priori. This paper demonstrates the use of Cinema Darkroom on several real-world datasets, highlighting its ability to effectively decouple the complexity and size of the dataset from its visualization.
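To make the decoupling concrete, the sketch below shades a stored G-buffer post hoc. It assumes each pixel stores a unit surface normal and a scalar value, which is an assumption (the abstract does not give Cinema Darkroom's buffer layout), and it omits ambient occlusion. Changing the color map only re-runs this cheap per-pixel pass; the geometry is never re-rendered.

```python
def shade(gbuffer, colormap, light_dir, ambient=0.2):
    """Shade a stored G-buffer post hoc.

    gbuffer:   rows of pixels; each pixel is None (background) or a pair
               (normal, scalar) with a unit normal (nx, ny, nz) and a scalar
               in [0, 1]
    colormap:  callable mapping a scalar in [0, 1] to an (r, g, b) tuple
    light_dir: unit vector pointing towards the light
    Returns rows of (r, g, b) tuples; background pixels are black.
    """
    image = []
    for row in gbuffer:
        shaded_row = []
        for pixel in row:
            if pixel is None:
                shaded_row.append((0.0, 0.0, 0.0))
                continue
            normal, value = pixel
            # Lambertian diffuse term: clamped cosine between normal and light.
            diffuse = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
            k = ambient + (1.0 - ambient) * diffuse
            r, g, b = colormap(value)
            shaded_row.append((k * r, k * g, k * b))
        image.append(shaded_row)
    return image


def gray(v):
    return (v, v, v)


def heat(v):
    # Hypothetical two-colour map from blue (0) to red (1).
    return (v, 0.0, 1.0 - v)


# A hypothetical 1 x 3 G-buffer: facing the light, edge-on, and background.
gbuf = [[((0.0, 0.0, 1.0), 0.8), ((1.0, 0.0, 0.0), 0.3), None]]
light = (0.0, 0.0, 1.0)
print(shade(gbuf, gray, light))
print(shade(gbuf, heat, light))  # new colour map, same stored geometry
```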
Submitted 8 October, 2020;
originally announced October 2020.
-
Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations
Authors:
Sian Jin,
Pascal Grosset,
Christopher M. Biwer,
Jesus Pulido,
Jiannan Tian,
Dingwen Tao,
James Ahrens
Abstract:
To help understand our universe better, researchers and scientists currently run extreme-scale cosmology simulations on leadership supercomputers. However, such simulations can generate large amounts of scientific data, which often incur high costs for data movement and storage. Lossy compression techniques have become attractive because they significantly reduce data size while maintaining high data fidelity for post-analysis. In this paper, we propose to use GPU-based lossy compression for extreme-scale cosmological simulations. Our contributions are threefold: (1) we integrate multiple GPU-based lossy compressors into our open-source compression benchmark and analysis framework named Foresight; (2) we use Foresight to comprehensively evaluate the practicality of using GPU-based lossy compression on two real-world extreme-scale cosmology simulations, namely HACC and Nyx, based on a series of assessment metrics; and (3) we develop a general optimization guideline on how to determine the best-fit configurations for different lossy compressors and cosmological simulations. Experiments show that GPU-based lossy compression can provide the necessary post-analysis accuracy for cosmological simulations and high compression ratios of 5–15× on the tested datasets, as well as much higher compression and decompression throughput than CPU-based compressors.
Submitted 2 July, 2020; v1 submitted 1 April, 2020;
originally announced April 2020.
-
Deep Learning-Based Feature-Aware Data Modeling for Complex Physics Simulations
Authors:
Qun Liu,
Subhashis Hazarika,
John M. Patchett,
James Paul Ahrens,
Ayan Biswas
Abstract:
Data modeling and reduction for in situ processing are important. Feature-driven methods for in situ data analysis and reduction are a priority for future exascale machines, as there are currently very few such methods. We investigate a deep-learning-based workflow that targets in situ data processing using autoencoders. We propose a Residual Autoencoder integrated with a Residual in Residual Dense Block (RRDB) to obtain better performance. Our proposed framework compressed our test data from 2.1 MB to 66 KB per 3D volume timestep.
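The abstract reports sizes rather than a ratio. Converting them is simple arithmetic, but the result depends on the unit convention, which the abstract does not state:

```latex
\mathrm{CR} = \frac{2.1\ \mathrm{MB}}{66\ \mathrm{KB}} \approx
\begin{cases}
\dfrac{2100\ \mathrm{KB}}{66\ \mathrm{KB}} \approx 31.8, & 1\ \mathrm{MB} = 1000\ \mathrm{KB},\\[6pt]
\dfrac{2150.4\ \mathrm{KB}}{66\ \mathrm{KB}} \approx 32.6, & 1\ \mathrm{MB} = 1024\ \mathrm{KB}.
\end{cases}
```

Either way, each timestep shrinks by a factor of roughly 32.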
Submitted 7 December, 2019;
originally announced December 2019.
-
A Machine Learning Method for Prediction of Multipath Channels
Authors:
Julian Ahrens,
Lia Ahrens,
Hans D. Schotten
Abstract:
In this paper, a machine learning method for predicting the evolution of a mobile communication channel, based on a specific type of convolutional neural network, is developed and evaluated in a simulated multipath transmission scenario. The simulation and channel estimation are designed to replicate real-world scenarios and common measurements supported by reference signals in modern cellular networks. The predictor's capability meets the requirements posed by deploying the method in the radio resource scheduler of a base station. Possible applications of the method are discussed.
Submitted 1 March, 2020; v1 submitted 10 September, 2019;
originally announced September 2019.
-
Multivariate Pointwise Information-Driven Data Sampling and Visualization
Authors:
Soumya Dutta,
Ayan Biswas,
James Ahrens
Abstract:
With increasing computing capabilities of modern supercomputers, the size of the data generated by scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while preserving the important data properties, so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. While analyzing complex scientific events, domain experts often analyze and visualize two or more variables together to better understand the characteristics of the data features. Therefore, data summarization techniques are required to analyze multivariable relationships in detail and then perform data reduction such that the important features involving multiple variables are preserved in the reduced data. To achieve this, we propose a data sub-sampling algorithm for statistical data summarization that leverages pointwise information-theoretic measures to quantify the statistical association of data points across multiple variables, and generates sub-sampled data that preserves the statistical association among the variables. Using such reduced sampled data, we show that multivariate feature queries and analysis can be performed effectively. The efficacy of the proposed multivariate association-driven sampling algorithm is demonstrated by applying it to several scientific data sets.
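The core idea of pointwise information-driven sampling can be illustrated with a small Python sketch for two variables. It estimates the pointwise mutual information (PMI) of each point's value pair from a joint histogram, then draws a sub-sample with probabilities weighted by |PMI|. The bin count, the |PMI| weighting, and the function names `pointwise_mi` and `pmi_sample` are assumptions for illustration; this is not the authors' algorithm, which may use different information measures and selection rules.

```python
import numpy as np

def pointwise_mi(x, y, bins=32):
    """Per-point PMI log2(p(x,y) / (p(x) p(y))) estimated from a joint histogram."""
    joint, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1)  # marginal distribution of x
    p_y = p_xy.sum(axis=0)  # marginal distribution of y
    # Bin index of each point, consistent with histogram2d's binning.
    ix = np.clip(np.digitize(x, x_edges[1:-1]), 0, bins - 1)
    iy = np.clip(np.digitize(y, y_edges[1:-1]), 0, bins - 1)
    # Every point lies in a non-empty bin, so no probability here is zero.
    return np.log2(p_xy[ix, iy] / (p_x[ix] * p_y[iy]))

def pmi_sample(x, y, fraction=0.05, bins=32, seed=0):
    """Draw a sub-sample, favouring points with strong pointwise association."""
    pmi = pointwise_mi(x, y, bins)
    weights = np.abs(pmi) + 1e-12  # assumption: weight by |PMI|; keep all weights > 0
    prob = weights / weights.sum()
    n = max(1, int(fraction * len(x)))
    rng = np.random.default_rng(seed)
    return rng.choice(len(x), size=n, replace=False, p=prob)

# Example: two correlated variables, keep 5% of the points.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = x + 0.3 * rng.normal(size=10_000)
idx = pmi_sample(x, y, fraction=0.05)
print(len(idx))  # 500
```

Weighting by |PMI| keeps both strongly co-occurring and strongly avoided value pairs; a sampler that only wants positive association would weight by max(PMI, 0) instead.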
Submitted 26 July, 2019;
originally announced July 2019.
-
A Machine-Learning Phase Classification Scheme for Anomaly Detection in Signals with Periodic Characteristics
Authors:
Lia Ahrens,
Julian Ahrens,
Hans D. Schotten
Abstract:
In this paper, we propose a novel machine-learning method for anomaly detection applicable to data with periodic characteristics, where randomly varying period lengths are explicitly allowed. A multi-dimensional time series analysis is conducted by training a data-adapted classifier consisting of deep convolutional neural networks that perform phase classification. The entire algorithm, including data pre-processing, period detection, segmentation, and even dynamic adjustment of the neural networks, is implemented for fully automatic execution. The proposed method is evaluated on three example datasets from the areas of cardiology, intrusion detection, and signal processing, showing reasonable performance.
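Phase classification needs a phase label for every sample, which in turn requires segmenting the signal into periods of varying length. The Python sketch below shows one simple way such training labels could be produced: it assumes period boundaries at upward zero crossings of the mean-removed signal and splits each complete period into `n_phases` equal classes. The helper `phase_labels` and the crossing rule are hypothetical and are not the paper's period-detection procedure.

```python
import numpy as np

def phase_labels(signal, n_phases=8):
    """Label each sample with a discrete phase class within its period.

    Assumption: period boundaries are upward zero crossings of the
    mean-removed signal; consecutive periods may differ in length.
    """
    s = np.asarray(signal, dtype=float) - np.mean(signal)
    # Indices where the signal crosses zero going upward.
    starts = np.flatnonzero((s[:-1] < 0) & (s[1:] >= 0)) + 1
    labels = np.full(len(s), -1)  # -1 marks samples outside a complete period
    for a, b in zip(starts[:-1], starts[1:]):
        frac = (np.arange(a, b) - a) / (b - a)       # position within period, in [0, 1)
        labels[a:b] = (frac * n_phases).astype(int)  # discretize into phase classes
    return starts, labels

# Example: four sine cycles with different period lengths.
cycles = [np.sin(2 * np.pi * (np.arange(L) + 0.5) / L) for L in (50, 70, 60, 80)]
starts, labels = phase_labels(np.concatenate(cycles), n_phases=4)
print(starts.tolist())  # [50, 120, 180]
```

Because labels are relative positions within each detected period, a 70-sample cycle and a 60-sample cycle receive the same set of phase classes, which is what lets a classifier cope with randomly varying period lengths.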
Submitted 27 March, 2019; v1 submitted 29 November, 2018;
originally announced November 2018.
-
In situ TensorView: In situ Visualization of Convolutional Neural Networks
Authors:
Xinyu Chen,
Qiang Guan,
Li-Ta Lo,
Simon Su,
James Ahrens,
Trilce Estrada
Abstract:
Convolutional Neural Networks (CNNs) are complex systems. They are trained to adapt their internal connections so they can recognize images, text, and more. It is both interesting and helpful to visualize the dynamics within such deep artificial neural networks so that people can understand how these networks learn and make predictions. In the field of scientific simulations, visualization tools like ParaView have long been used to provide insight and understanding. We present in situ TensorView to visualize the training and functioning of CNNs as if they were scientific simulation systems. In situ TensorView is a loosely coupled, open in situ visualization framework that provides multiple viewers to help users visualize and understand their networks. It leverages ParaView's co-processing capability to provide real-time visualization during the training and prediction phases. This avoids heavy I/O overhead when visualizing large dynamic systems. Only a few lines of code need to be injected into the TensorFlow framework. The visualization can provide guidance for adjusting network architectures or compressing pre-trained networks. We showcase visualizing the training of LeNet-5 and VGG16 using in situ TensorView.
Submitted 16 June, 2018;
originally announced June 2018.
-
An AI-driven Malfunction Detection Concept for NFV Instances in 5G
Authors:
Julian Ahrens,
Mathias Strufe,
Lia Ahrens,
Hans D. Schotten
Abstract:
Efficient network management is one of the key challenges of the constantly growing and increasingly complex wide area networks (WAN). The paradigm shift towards network functions virtualization (NFV) and software-defined networking (SDN) in the next generation of mobile networks (5G), together with recent scientific advances in Artificial Intelligence (AI), enables the transition from today's manually managed networks to fully autonomic and dynamic self-organizing networks (SON). This helps meet key performance indicators (KPIs) while reducing operational expenditure (OPEX). In this paper, an AI-driven concept for malfunction detection in NFV applications based on semi-supervised learning is presented. For this purpose, a profile of the application under test is created. This profile is then used as a reference to detect abnormal behaviour. For example, if there is a bug in an updated version of the app, it becomes possible to react autonomously and roll the NFV app back to a previous version in order to avoid network outages.
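A reference profile for malfunction detection can be as simple as per-metric statistics learned from normal operation only. The Python sketch below is a deliberately minimal stand-in for that idea: it fits the mean and standard deviation of each monitored metric and flags a sample whose z-score exceeds a threshold. The class `BehaviourProfile`, the example metrics, and the threshold of 4.0 are hypothetical; the abstract does not specify the paper's semi-supervised model, which may differ substantially.

```python
import numpy as np

class BehaviourProfile:
    """Hypothetical reference profile built from normal-operation metrics only."""

    def __init__(self, threshold=4.0):
        self.threshold = threshold  # hypothetical z-score cut-off

    def fit(self, normal_metrics):
        m = np.asarray(normal_metrics, dtype=float)  # shape (samples, metrics)
        self.mean = m.mean(axis=0)
        self.std = m.std(axis=0) + 1e-9  # avoid division by zero for constant metrics
        return self

    def is_abnormal(self, metrics):
        # Abnormal if any metric deviates from the profile by more than the threshold.
        z = np.abs((np.asarray(metrics, dtype=float) - self.mean) / self.std)
        return bool(np.any(z > self.threshold))

# Example: profile two metrics (e.g. CPU %, latency in ms) from normal runs.
rng = np.random.default_rng(0)
normal_runs = rng.normal([20.0, 100.0], [2.0, 5.0], size=(1000, 2))
profile = BehaviourProfile().fit(normal_runs)
print(profile.is_abnormal([21.0, 104.0]))  # False
print(profile.is_abnormal([20.0, 400.0]))  # True
```

Only normal data is needed to fit the profile, which mirrors the semi-supervised setting: an updated app version whose metrics fall outside the profile would be the trigger for a roll-back.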
Submitted 16 April, 2018;
originally announced April 2018.
-
Build and Execution Environment (BEE): an Encapsulated Environment Enabling HPC Applications Running Everywhere
Authors:
Jieyang Chen,
Qiang Guan,
Xin Liang,
Louis James Vernon,
Allen McPherson,
Li-Ta Lo,
Zizhong Chen,
James Paul Ahrens
Abstract:
Variations in High Performance Computing (HPC) system software configurations mean that applications are typically configured and built for specific HPC environments. Building applications can require a significant investment of time and effort for application users and requires application users to have additional technical knowledge. Linux container technologies such as Docker and Charliecloud bring great benefits to the application development, build and deployment processes. While cloud platforms already widely support containers, HPC systems still have non-uniform support of container technologies. In this work, we propose a unified runtime framework -- Build and Execution Environment (BEE) across both HPC and cloud platforms that allows users to run their containerized HPC applications across all supported platforms without modification. We design four BEE backends for four different classes of HPC or cloud platform so that together they cover the majority of mainstream computing platforms for HPC users. Evaluations show that BEE provides an easy-to-use unified user interface, execution environment, and comparable performance.
Submitted 27 February, 2021; v1 submitted 19 December, 2017;
originally announced December 2017.
-
Subband coding for large-scale scientific simulation data using JPEG 2000
Authors:
Christopher M. Brislawn,
Jonathan L. Woodring,
Susan M. Mniszewski,
David E. DeMarle,
James P. Ahrens
Abstract:
The ISO/IEC JPEG 2000 image coding standard is a family of source coding algorithms targeting high-resolution image communications. JPEG 2000 provides highly scalable embedded coding that allows one to interactively zoom out to reduced-resolution thumbnails of enormous data sets or to zoom in on highly localized regions of interest with very economical communications and rendering requirements. While the standard is intended for fixed-precision input data, its irreversible version is often implemented internally in floating-point arithmetic. Moreover, the standard is designed to support high-bit-depth data. Part 2 of the standard also provides support for three-dimensional data sets such as multicomponent or volumetric imagery. These features make JPEG 2000 an appealing candidate for highly scalable communications coding and visualization of two- and three-dimensional data produced by scientific simulation software. We present results of initial experiments applying JPEG 2000 to scientific simulation data produced by the Parallel Ocean Program (POP) global ocean circulation model, highlighting both the promise and the many challenges this approach holds for scientific visualization applications.
Submitted 23 October, 2014; v1 submitted 8 October, 2013;
originally announced October 2013.