Search | arXiv e-print repository

The Monte Carlo Computational Summit -- October 25 & 26, 2023 -- Notre Dame, Indiana, USA

Authors: Joanna Piper Morgan, Alexander Mote, Samuel Lee Pasmann, Gavin Ridley, Todd Palmer, Kyle E. Niemeyer, Ryan McClarren

Abstract: The Monte Carlo Computational Summit was held on the campus of the University of Notre Dame in South Bend, Indiana, USA on 25--26 October 2023. The goals of the summit were to discuss algorithmic and software alterations required for successfully porting respective code bases to exascale-class computing hardware, compare software engineering techniques used by various code teams, and consider the adoption of industry-standard benchmark problems to better facilitate code-to-code performance comparisons. A large portion of the meeting included candid discussions of direct experiences with approaches that have and have not worked. Participants reported that identifying and implementing suitable Monte Carlo algorithms for GPUs continues to be a sticking point. They also reported significant difficulty porting existing algorithms between GPU APIs (specifically Nvidia CUDA to AMD ROCm). To better compare code-to-code performance, participants decided to design a C5G7-like benchmark problem with a defined figure of merit, with the expectation of adding more benchmarks in the future. Problem specifications and results will eventually be hosted in a public repository and will be open to submissions by all Monte Carlo transport codes capable of running the benchmark problem. The participants also identified the need to explore the intermediate and long-term future of the Monte Carlo neutron transport community and how best to modernize and contextualize Monte Carlo as a useful tool in modern industry.
Overall the summit was considered to be a success by the organizers and participants, and the group shared a strong desire for future, potentially larger, Monte Carlo summits.

Submitted 12 February, 2024; originally announced February 2024.

Comments: conference report

arXiv:2401.08874 [pdf, other]

Investigating a single-domain approach for modeling coupled porous solid-fluid systems: applications in buoyant reacting plume formation and ignition

Authors: Diba Behnoudfar, Kyle E. Niemeyer

Abstract: Many natural and industrial processes involve mixed porous-solid fluid domains where multiple physics of reactions, heat transfer, and fluid flow interact over disparate length scales, such as the combustion of multi-species solid fuels. This range of problems covers small-scale fuel burning to large-scale forest fires when plant canopies can be modeled as porous media. Although many modeling studies so far have concentrated on detailed physics within the single fluid or porous phase, few consider both phases, in part due to the challenge in determining suitable boundary conditions between the regions, particularly in turbulent flows where eddies might penetrate the porous region. In this work, we develop a single-domain approach that eliminates the need for boundary conditions at the interface, and numerically study scenarios involving porous-solids and a surrounding fluid. Similar to the methods used in large eddy simulation, the flow is averaged over a small spatial volume--but over the entire domain. We focus on the ignition and related interfacial phenomena, a problem that has rarely been studied in detail from a modeling standpoint. After verifying and validating the model, we examine the emission of buoyant reacting plumes from the surface of a heated solid and the near-field flow dynamics. We observed indications of flow instabilities similar to those seen in Rayleigh-Taylor and Kelvin-Helmholtz phenomena.
Our analysis highlighted that the inflectional velocity profile close to the interface triggers the generation of vorticity due to viscous torque, linked with Kelvin-Helmholtz instabilities. Gravitational and baroclinic torques play key roles in vortex growth in the surrounding fluid region. These flow characteristics could significantly influence the mixing of oxidizer and fuel, ignition processes, and fire propagation.

Submitted 16 January, 2024; originally announced January 2024.

Comments: Submitted to Proceedings of the Combustion Institute

arXiv:2306.07847 [pdf, other]

Hybrid-Delta Tracking on a Structured Mesh in MCATK

Authors: J. P. Morgan, Travis J. Trahan, Timothy P. Burke, Colin J. Josey, Kyle E. Niemeyer

Abstract: Monte Carlo Application Toolkit (MCATK) commonly uses surface tracking on a structured mesh to compute scalar fluxes. In this mode, higher fidelity requires more mesh cells and isotopes and thus more computational overhead -- since every time a particle changes cells, new cross-sections must be found for all materials in a given cell -- even if no collision occurs in that cell. We implement a hybrid version of Woodcock (delta) tracking on this imposed mesh to alleviate the number of cross-section lookups. In this algorithm, an energy-dependent microscopic majorant cross-section is computed for the problem. Each time a particle enters a new cell, rather than computing a true macroscopic cross-section over all isotopes in the cell, the microscopic majorant cross-section is simply multiplied by the total number density of the cell to obtain a macroscopic majorant cross-section for the cell. Delta tracking is then performed within that single cell. This increases performance with minimal code changes, speeding up the solve time by a factor of 1.5 -- 1.75 for k-eigenvalue simulations and 1.2 -- 1.6 for fixed source simulations in a series of materially complex criticality benchmarks.

Submitted 13 June, 2023; originally announced June 2023.

Comments: 8 pages, 4 figures, M&C 2023 ANS conference
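
The per-cell procedure described in the abstract above can be illustrated with a minimal Python sketch. It is not MCATK code: the one-dimensional cell, the function names, and the dictionary layout for cross sections (barns) and number densities (atoms per barn-cm) are all assumptions made for illustration, and the acceptance test is the standard Woodcock rule (accept a tentative collision with probability equal to the true cross section divided by the majorant).

```python
import math
import random


def microscopic_majorant(micro_xs):
    """Largest microscopic total cross section over all isotopes at one energy."""
    return max(micro_xs.values())


def macroscopic_true(micro_xs, number_densities):
    """True macroscopic cross section of a cell: sum of N_i * sigma_i."""
    return sum(n * micro_xs[iso] for iso, n in number_densities.items())


def macroscopic_majorant(sigma_maj_micro, number_densities):
    """Cheap upper bound: microscopic majorant times total number density."""
    return sigma_maj_micro * sum(number_densities.values())


def track_in_cell(x, mu, cell_lo, cell_hi, micro_xs, number_densities, rng):
    """Delta-track one particle inside a single 1D cell (hypothetical helper).

    Returns ("collision", position) or ("exit", boundary position).
    """
    sigma_maj = macroscopic_majorant(microscopic_majorant(micro_xs),
                                     number_densities)
    while True:
        # Sample the distance to a tentative collision from the majorant.
        distance = -math.log(1.0 - rng.random()) / sigma_maj
        x_new = x + mu * distance
        if x_new >= cell_hi:
            return ("exit", cell_hi)
        if x_new <= cell_lo:
            return ("exit", cell_lo)
        x = x_new
        # The full isotope loop is only needed at a tentative collision.
        sigma_true = macroscopic_true(micro_xs, number_densities)
        if rng.random() < sigma_true / sigma_maj:
            return ("collision", x)
        # Otherwise this was a virtual (delta) collision: keep flying.


if __name__ == "__main__":
    micro = {"U235": 10.0, "H1": 20.0}    # illustrative values only
    dens = {"U235": 0.01, "H1": 0.05}     # illustrative values only
    print(track_in_cell(0.5, 1.0, 0.0, 1.0, micro, dens, random.Random(1)))
```

Because the sum of N_i * sigma_i can never exceed the largest sigma_i times the total number density, the majorant is always a valid bound, and the expensive sum over isotopes is evaluated only when a tentative collision lands inside the cell.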

arXiv:2305.13555 [pdf, other]

Exploring One-Cell Inversion Method for Transient Transport on GPU

Authors: J. P. Morgan, Ilham Variansyah, Todd S. Palmer, Kyle E. Niemeyer

Abstract: To find deterministic solutions to the transient $S_N$ neutron transport equation, iterative schemes are typically used to treat the scattering (and fission) source terms. We explore the one-cell inversion iteration scheme to do this on the GPU and make comparisons to a source iteration scheme. We examine convergence behavior, through the analysis of spectral radii, of both one-cell inversion and source iterations. To further boost the GPU parallel efficiency, we derive a higher-order discretization method, simple corner balance (in space) and multiple balance (in time), to add more work to the threads and gain accuracy. Fourier analysis on this higher-order numerical method shows that it is unconditionally stable, but it can produce negative flux alterations that are critically damped through time. We explore a whole-problem (in all angle and all cell) sparse linear algebra framework, for both iterative schemes, to quickly produce performant code for GPUs. Despite one-cell inversion requiring additional iterations to convergence, those iterations can be done faster to provide a significant speedup over source iteration in quadrature sets at or below $S_{128}$. Going forward we will produce a two-dimensional implementation of this code to experiment with memory and performance impacts of a whole-problem framework including methods of synthetic acceleration and pre-conditioners for this scheme, then we will begin making direct comparisons to traditionally implemented source iteration in production code.

Submitted 9 August, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 11 pages, 4 figures, M&C 2023 ANS conference

arXiv:2305.07636 [pdf, other]

Development of MC/DC: a performant, scalable, and portable Python-based Monte Carlo neutron transport code

Authors: Ilham Variansyah, J. P. Morgan, Jordan Northrop, Kyle E. Niemeyer, Ryan G. McClarren

Abstract: We discuss the current development of MC/DC (Monte Carlo Dynamic Code). MC/DC is primarily designed to serve as an exploratory Python-based MC transport code. However, it seeks to offer improved performance, massive scalability, and backend portability by leveraging Python code-generation libraries and implementing an innovative abstraction strategy and compilation scheme. Here, we verify MC/DC capabilities and perform an initial performance assessment. We found that MC/DC can run hundreds of times faster than its pure Python mode and about 2.5 times slower, but with comparable parallel scaling, than the high-performance MC code Shift for simple problems. Finally, to further exercise MC/DC's time-dependent MC transport capabilities, we propose a challenge problem based on the C5G7-TD benchmark model.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 11 pages, 9 figures, M&C 2023 ANS conference

arXiv:2206.03836 [pdf, other]

doi 10.1080/13647830.2022.2071170

Smoldering combustion in cellulose and hemicellulose mixtures: Examining the roles of density, fuel composition, oxygen concentration, and moisture content

Authors: W. Jayani Jayasuriya, Tejas Chandrashekhar Mulky, Kyle E. Niemeyer

Abstract: Smoldering combustion plays a key role in wildfires in forests, grasslands, and peatlands due to its common occurrence in porous fuels like peat and duff. As a consequence, understanding smoldering behavior in these fuels is crucial. Such fuels are generally composed of cellulose, hemicellulose, and lignin. Here we present an updated computational model for simulating smoldering combustion in cellulose and hemicellulose mixtures. We used this model to examine changes in smoldering propagation speed and peak temperatures with varying fuel composition and density. For a given fuel composition, increases in density decrease the propagation speed and increase mean peak temperature; for a given density, increases in hemicellulose content increase both propagation speed and peak temperature. We also examined the role of natural fuel expansion with the addition of water. Without expansion, addition of moisture content reduces the propagation speed primarily due to increasing (wet) fuel density. However, with fuel expansion similar to that observed in peat, the propagation speed increases due to the overall drop in fuel density. Finally, we studied the influence of fuel composition on critical moisture content of ignition and extinction: mixtures dominated by hemicellulose have 10% higher critical moisture content due to the increase in peak temperature.

Submitted 10 May, 2022; originally announced June 2022.

Comments: 25 pages, 16 figures. Combustion Theory and Modelling (2022)

arXiv:2205.05681 [pdf, other]

doi 10.1016/j.cpc.2022.108409

Accelerating reactive-flow simulations using vectorized chemistry integration

Authors: Nicholas J. Curtis, Kyle E. Niemeyer, Chih-Jen Sung

Abstract: The high cost of chemistry integration is a significant computational bottleneck for realistic reactive-flow simulations using operator splitting. Here we present a methodology to accelerate the solution of the chemical kinetic ordinary differential equations using single-instruction, multiple-data vector processing on CPUs using the OpenCL framework. First, we compared several vectorized integration algorithms using chemical kinetic source terms and analytical Jacobians from the pyJac software against a widely used integration code, CVODEs. Next, we extended the OpenFOAM computational fluid dynamics library to incorporate the vectorized solvers, and we compared the accuracy of a fourth-order linearly implicit integrator -- both in vectorized form and a corresponding method native to OpenFOAM -- with the community standard chemical kinetics library Cantera. We then applied our methodology to a variety of chemical kinetic models, turbulent intensities, and simulation scales to examine a range of engineering and scientific scale problems, including (pseudo) steady-state as well as time-dependent Reynolds-averaged Navier--Stokes simulations of the Sandia flame D and the Volvo Flygmotor bluff-body stabilized, premixed flame. Subsequently, we compared the performance of the vectorized and native OpenFOAM integrators over the studied models and simulations and found that our vectorized approach performs up to 33--35x faster than the native OpenFOAM solver with high accuracy.

Submitted 10 May, 2022; originally announced May 2022.

Comments: 43 pages, 6 figures

arXiv:2108.08302 [pdf, other]

doi 10.1080/13647830.2022.2049882

Assessing diffusion model impacts on enstrophy and flame structure in turbulent lean premixed flames

Authors: Aaron J. Fillo, Peter E. Hamlington, Kyle E. Niemeyer

Abstract: Diffusive transport of mass occurs at small scales in turbulent premixed flames. As a result, multicomponent mass diffusion, which is often neglected in direct numerical simulations (DNS) of premixed combustion, has the potential to impact both turbulence and flame characteristics at small scales. In this study, we evaluate these impacts by examining enstrophy dynamics and the internal structure of the flame for lean premixed hydrogen-air combustion, neglecting secondary Soret and Dufour effects. We performed three-dimensional DNS of these flames by implementing the Stefan-Maxwell equations in the code NGA to represent multicomponent mass transport, and we simulated statistically planar lean premixed hydrogen-air flames using both mixture-averaged and multicomponent models. The mixture-averaged model underpredicts the peak enstrophy by up to 13% in the flame front. Comparing the enstrophy budgets of these flames, the multicomponent simulation yields larger peak magnitudes compared to the mixture-averaged simulation in the reaction zone, showing differences of 17% and 14% in the normalized stretching and viscous effects terms. In the super-adiabatic regions of the flame, the mixture-averaged model overpredicts the viscous effects by up to 13%. To assess the effect of these differences on flame structure, we reconstructed the average local internal structure of the turbulent flame through statistical analysis of the scalar gradient field.
Based on this analysis, we show that large differences in viscous effects contribute to significant differences in the average local flame structure between the two models.

Submitted 25 February, 2022; v1 submitted 18 August, 2021; originally announced August 2021.

Comments: 15 pages, 6 figures

arXiv:2105.10332 [pdf, other]

The Two-Dimensional Swept Rule Applied on Heterogeneous Architectures

Authors: Anthony S. Walker, Kyle E. Niemeyer

Abstract: The partial differential equations describing compressible fluid flows can be notoriously difficult to resolve on a pragmatic scale and often require the use of high performance computing systems and/or accelerators. However, these systems face scaling issues such as latency, the fixed cost of communicating information between devices in the system. The swept rule is a technique designed to minimize these costs by obtaining a solution to unsteady equations at as many possible spatial locations and times prior to communicating. In this study, we implemented and tested the swept rule for solving two-dimensional problems on heterogeneous computing systems across two distinct systems. Our solver showed a speedup range of 0.22-2.71 for the heat diffusion equation and 0.52-1.46 for the compressible Euler equations. We can conclude from this study that the swept rule offers both potential for speedups and slowdowns and that care should be taken when designing such a solver to maximize benefits. These results can help make decisions to maximize these benefits and inform designs.

Submitted 1 April, 2021; originally announced May 2021.

Comments: 18 pages, 11 figures

arXiv:2009.09840 [pdf, other]

doi 10.1016/j.combustflame.2020.09.013

Assessing the impact of multicomponent diffusion in direct numerical simulations of premixed, high-Karlovitz, turbulent flames

Authors: Aaron J. Fillo, Jason Schlup, Guillaume Blanquart, Kyle E. Niemeyer

Abstract: Implementing multicomponent diffusion models in numerical combustion studies is computationally expensive; to reduce cost, numerical simulations commonly use mixture-averaged diffusion treatments or simpler models. However, the accuracy and appropriateness of mixture-averaged diffusion has not been verified for three-dimensional, turbulent, premixed flames. In this study we evaluated the role of multicomponent mass diffusion in premixed, three-dimensional high Karlovitz-number hydrogen, n-heptane, and toluene flames, representing a range of fuel Lewis numbers. We also studied a premixed, unstable two-dimensional hydrogen flame due to the importance of diffusion effects in such cases. Our comparison of diffusion flux vectors revealed differences of 10-20% on average between the mixture-averaged and multicomponent diffusion models, and greater than 40% in regions of high flame curvature. Overall, however, the mixture-averaged model produces small differences in diffusion flux compared with global turbulent flame statistics. To evaluate the impact of these differences between the two models, we compared normalized turbulent flame speeds and conditional means of species mass fraction and source term. We found differences of 5-20% in the mean normalized turbulent flame speeds, which seem to correspond to differences of 5-10% in the peak fuel source terms. Our results motivate further study into whether the mixture-averaged diffusion model is always appropriate for DNS of premixed turbulent flames.

Submitted 14 October, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

Comments: 50 pages, 7 figures. arXiv admin note: text overlap with arXiv:1808.05463

MSC Class: 80A25; 80A19 ACM Class: I.6.4; J.2

Journal ref: Combust. Flame 223 (2021) 216-229

arXiv:1811.08282 [pdf, other]

Applying the swept rule for solving explicit partial differential equations on heterogeneous computing systems

Authors: Daniel J. Magee, Anthony S. Walker, Kyle E. Niemeyer

Abstract: Applications that exploit the architectural details of high-performance computing (HPC) systems have become increasingly invaluable in academia and industry over the past two decades. The most important hardware development of the last decade in HPC has been the General Purpose Graphics Processing Unit (GPGPU), a class of massively parallel devices that now contributes the majority of computational power in the top 500 supercomputers. As these systems grow, small costs such as latency---due to the fixed cost of memory accesses and communication---accumulate in a large simulation and become a significant barrier to performance. The swept time-space decomposition rule is a communication-avoiding technique for time-stepping stencil update formulas that attempts to reduce latency costs. This work extends the swept rule by targeting heterogeneous, CPU/GPU architectures representing current and future HPC systems. We compare our approach to a naive decomposition scheme with two test equations using an MPI+CUDA pattern on 40 processes over two nodes containing one GPU. The swept rule produces a factor of 1.9 to 23 speedup for the heat equation and a factor of 1.1 to 2.0 speedup for the Euler equations, using the same processors and work distribution, and with the best possible configurations. These results show the potential effectiveness of the swept rule for different equations and numerical schemes on massively parallel computing systems that incur substantial latency costs.

Submitted 13 May, 2020; v1 submitted 14 November, 2018; originally announced November 2018.

Comments: 24 pages, 9 figures. Accepted for publication by the Journal of Supercomputing

arXiv:1809.02509 [pdf, other]

doi 10.1029/2018MS001486

Effects of Langmuir Turbulence on Upper Ocean Carbonate Chemistry

Authors: Katherine M. Smith, Peter E. Hamlington, Kyle E. Niemeyer, Baylor Fox-Kemper, Nicole S. Lovenduski

Abstract: Effects of wave-driven Langmuir turbulence on the air-sea flux of carbon dioxide (CO$_2$) are examined using large eddy simulations featuring actively reacting carbonate chemistry in the ocean mixed layer at small scales. Four strengths of Langmuir turbulence are examined with three types of carbonate chemistry: time-dependent, instantaneous equilibrium chemistry, and no reactions. The time-dependent model is obtained by reducing a detailed eight-species chemical mechanism using computational singular perturbation analysis, resulting in a quasi-steady-state approximation for hydrogen ion (H$^+$), i.e., fixed pH. The reduced mechanism is then integrated in two half-time steps before and after the advection solve using a Runge--Kutta--Chebyshev scheme that is robust for stiff systems of differential equations. The simulations show that, as the strength of Langmuir turbulence increases, CO$_2$ fluxes are enhanced by rapid overturning of the near-surface layer, which rivals the removal rate of CO$_2$ by time-dependent reactions. Equilibrium chemistry and non-reactive models are found to bring more and less carbon, respectively, into the ocean as compared to the more realistic time-dependent model. These results have implications for Earth system models that either neglect Langmuir turbulence or use equilibrium, instead of time-dependent, chemical mechanisms.

Submitted 7 September, 2018; originally announced September 2018.

Comments: 25 pages, 9 figures
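
The "two half-time steps before and after the advection solve" pattern in the abstract above is a symmetric (Strang-type) operator splitting. The Python sketch below shows only that ordering on a toy problem and is not the authors' solver: the first-order decay reaction, the exact exponential update used in place of the Runge--Kutta--Chebyshev integrator, the periodic first-order upwind advection, and all function names are assumptions chosen to keep the example short.

```python
import math


def react(c, k, dt):
    """Advance a toy first-order decay reaction dc/dt = -k*c exactly over dt.

    Stand-in for the stiff chemistry integrator named in the abstract.
    """
    decay = math.exp(-k * dt)
    return [ci * decay for ci in c]


def advect_upwind(c, u, dx, dt):
    """One first-order upwind step on a periodic 1D grid, assuming u > 0."""
    nu = u * dt / dx  # Courant number; should not exceed 1 for stability
    return [c[i] - nu * (c[i] - c[i - 1]) for i in range(len(c))]


def strang_step(c, k, u, dx, dt):
    """Half step of reaction, full step of advection, half step of reaction."""
    c = react(c, k, 0.5 * dt)
    c = advect_upwind(c, u, dx, dt)
    c = react(c, k, 0.5 * dt)
    return c


if __name__ == "__main__":
    field = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
    for _ in range(10):
        field = strang_step(field, k=0.3, u=1.0, dx=0.1, dt=0.05)
    print(field)
```

Placing the reaction update symmetrically around the transport step is what makes the splitting error second order in the time step for smooth problems, at the cost of two chemistry calls per step.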

arXiv:1809.01029 [pdf, other]

doi 10.1016/j.combustflame.2018.09.008

Using SIMD and SIMT vectorization to evaluate sparse chemical kinetic Jacobian matrices and thermochemical source terms

Authors: Nicholas J. Curtis, Kyle E. Niemeyer, Chih-Jen Sung

Abstract: Accurately predicting key combustion phenomena in reactive-flow simulations, e.g., lean blow-out, extinction/ignition limits and pollutant formation, necessitates the use of detailed chemical kinetics. The large size and high levels of numerical stiffness typically present in chemical kinetic models relevant to transportation/power-generation applications make the efficient evaluation/factorization of the chemical kinetic Jacobian and thermochemical source-terms critical to the performance of reactive-flow codes. Here we investigate the performance of vectorized evaluation of constant-pressure/volume thermochemical source-term and sparse/dense chemical kinetic Jacobians using single-instruction, multiple-data (SIMD) and single-instruction, multiple-thread (SIMT) paradigms. These are implemented in pyJac, an open-source, reproducible code generation platform. A new formulation of the chemical kinetic governing equations was derived and verified, resulting in Jacobian sparsities of 28.6-92.0% for the tested models. Speedups of 3.40-4.08x were found for shallow-vectorized OpenCL source-rate evaluation compared with a parallel OpenMP code on an avx2 central processing unit (CPU), increasing to 6.63-9.44x and 3.03-4.23x for sparse and dense chemical kinetic Jacobian evaluation, respectively.
Furthermore, the effect of data-ordering was investigated and a storage pattern specifically formulated for vectorized evaluation was proposed; as well, the effect of the constant pressure/volume assumptions and varying vector widths were studied on source-term evaluation performance. Speedups reached up to 17.60x and 45.13x for dense and sparse evaluation on the GPU, and up to 55.11x and 245.63x on the CPU over a first-order finite-difference Jacobian approach. Further, dense Jacobian evaluation was up to 19.56x and 2.84x faster than a previous version of pyJac on a CPU and GPU, respectively.

Submitted 4 September, 2018; originally announced September 2018.

Comments: 53 pages, 13 figures

Journal ref: Combust. Flame 198 (2018) 186-204

arXiv:1808.05463 [pdf, other]

doi 10.1016/j.jcp.2019.109185

A fast, low-memory, and stable algorithm for implementing multicomponent transport in direct numerical simulations

Authors: Aaron J. Fillo, Jason Schlup, Guillaume Beardsell, Guillaume Blanquart, Kyle E. Niemeyer

Abstract: Implementing multicomponent diffusion models in reacting-flow simulations is computationally expensive due to the challenges involved in calculating diffusion coefficients. Instead, mixture-averaged diffusion treatments are typically used to avoid these costs. However, to our knowledge, the accuracy and appropriateness of the mixture-averaged diffusion models has not been verified for three-dimensional turbulent premixed flames. In this study we propose a fast, efficient, low-memory algorithm and use that to evaluate the role of multicomponent mass diffusion in reacting-flow simulations. Direct numerical simulation of these flames is performed by implementing the Stefan-Maxwell equations in NGA. A semi-implicit algorithm decreases the computational expense of inverting the full multicomponent ordinary diffusion array while maintaining accuracy and fidelity. We first verify the method by performing one-dimensional simulations of premixed hydrogen flames and compare with matching cases in Cantera. We demonstrate the algorithm to be stable, and its performance scales approximately with the number of species squared. Then, as an initial study of multicomponent diffusion, we simulate premixed, three-dimensional turbulent hydrogen flames, neglecting secondary Soret and Dufour effects. Simulation conditions are carefully selected to match previously published results and ensure valid comparison.
Our results show that using the mixture-averaged diffusion assumption leads to a 15% under-prediction of the normalized turbulent flame speed for a premixed hydrogen-air flame. This difference in the turbulent flame speed motivates further study into using the mixture-averaged diffusion assumption for DNS of moderate-to-high Karlovitz number flames.

Submitted 6 November, 2019; v1 submitted 18 July, 2018; originally announced August 2018.

Comments: 36 pages, 14 figures

Journal ref: J. Comput. Phys. 406 (2020) 109185

arXiv:1806.08396 [pdf, other]

doi 10.1016/j.proci.2018.06.164

Computational study of the effects of density, fuel content, and moisture content on smoldering propagation of cellulose and hemicellulose mixtures

Authors: Tejas Chandrashekhar Mulky, Kyle E. Niemeyer

Abstract: Smoldering combustion plays an important role in forest and wildland fires. Fires from smoldering combustion can last for long periods of time, emit more pollutants, and be difficult to extinguish. This makes the study of smoldering in woody fuels and forest duff important. Cellulose, hemicellulose, and lignin are the major constituents in these types of fuels, in different proportions for different fuels. In this paper, we developed a 1-D model using the open-source software Gpyro to study the smoldering combustion of cellulose and hemicellulose mixtures. We first validated our simulations against experimentally obtained values of propagation speed for mixtures with fuel compositions including 100%, 75%, 50%, and 25% cellulose, with the remaining proportion of hemicellulose. Then, we studied the effects of varying fuel composition, density, and moisture content on smoldering combustion. We find that propagation speed of smoldering increased with decreases in density and increases in hemicellulose content, which we attribute to the role of oxygen diffusion. Propagation speed increased with moisture content for pure cellulose up to a certain limiting value, after which the propagation speed dropped by up to 70%. The mean peak temperature of smoldering increased with increases in hemicellulose content and density, and decreased with increasing moisture content.

Submitted 21 June, 2018; originally announced June 2018.

Comments: 16 pages, 5 figures

Journal ref: Proc. Combust. Inst. 37 (2019) 4091-4098

arXiv:1806.06982 [pdf, other]

doi 10.1021/acs.energyfuels.8b01313

FACE gasoline surrogates formulated by an enhanced multivariate optimization framework

Authors: Shane R. Daly, Kyle E. Niemeyer, William J. Cannella, Christopher L. Hagen

Abstract: Design and optimization of higher efficiency, lower-emission internal combustion engines are highly dependent on fuel chemistry. Resolving chemistry for complex fuels, like gasoline, is challenging. A solution is to study a fuel surrogate: a blend of a small number of well-characterized hydrocarbons to represent real fuels by emulating their thermophysical and chemical kinetics properties. In the current study, an existing gasoline surrogate formulation algorithm is further enhanced by incorporating novel chemometric models. These models use infrared spectra of hydrocarbon fuels to predict octane numbers, and are valid for a wide array of neat hydrocarbons and mixtures of such. This work leverages 14 hydrocarbon species to form tailored surrogate palettes for the Fuels for Advanced Combustion Engine (FACE) gasolines, including candidate component species not previously considered: n-pentane, 2-methylpentane, 1-pentene, cyclohexane, and o-xylene. We evaluate the performance of "full" and "reduced" surrogates for the 10 FACE gasolines, containing between 8-12 and 4-7 components, respectively. These surrogates match the target properties of the real fuels, on average, within 5%. This close agreement demonstrates that the algorithm can design surrogates matching the wide array of target properties: octane numbers, density, hydrogen-to-carbon ratio, distillation characteristics, and proportions of carbon-carbon bond types. We also compare our surrogates to those available in the literature (FACE gasolines A, C, F, G, I and J). Overall, the approach demonstrated here offers a promising method to better design surrogates for gasoline-like fuels with a wide array of properties.

Submitted 18 June, 2018; originally announced June 2018.

Comments: 48 pages, 7 figures

Journal ref: Energy Fuels 32 (2018) 7916-7932

arXiv:1708.02232 [pdf, other]

doi 10.1016/j.combustflame.2017.11.018

Assessing impacts of discrepancies in model parameters on autoignition model performance: a case study using butanol

Authors: Sai Krishna Sirumalla, Morgan A. Mayer, Kyle E. Niemeyer, Richard H. West

Abstract: Side-by-side comparison of detailed kinetic models using a new tool to aid recognition of species structures reveals significant discrepancies in the published rates of many reactions and thermochemistry of many species. We present a first automated assessment of the impact of these varying parameters on observable quantities of interest (in this case, autoignition delay) using literature experimental data. A recent kinetic model for the isomers of butanol was imported into a common database. Individual reaction rate and thermodynamic parameters of species were varied using values encountered in combustion models from recent literature. The effects of over 1600 alternative parameters were considered. Separately, experimental data were collected from recent publications and converted into the standard YAML-based ChemKED format. The Cantera-based model validation tool, PyTeCK, was used to automatically simulate autoignition using the generated models and experimental data, to judge the performance of the models. Taken individually, most of the parameter substitutions have little effect on the overall model performance, although a handful have quite large effects, and are investigated more thoroughly. Additionally, models varying multiple parameters simultaneously were evolved using a genetic algorithm to give fastest and slowest autoignition delay times, showing that changes exceeding a factor of 10 in ignition delay time are possible by cherry-picking from only accepted, published parameters. All data and software used in this study are available openly.

Submitted 24 December, 2017; v1 submitted 7 August, 2017; originally announced August 2017.

Comments: 25 pages, 4 figures; More parameter sources found (Tables 8 and 10) and Supplementary Material expanded with list of models

Journal ref: Combust. Flame 190 (2018) 284-292

arXiv:1706.02043 [pdf, other]

doi 10.1021/acs.energyfuels.6b01857

Reduced chemistry for butanol isomers at engine-relevant conditions

Authors: Xin Hui, Kyle E. Niemeyer, Kyle B. Brady, Chih-Jen Sung

Abstract: Butanol has received significant research attention as a second-generation biofuel in the past few years. In the present study, skeletal mechanisms for four butanol isomers were generated from two widely accepted, well-validated detailed chemical kinetic models for the butanol isomers. The detailed models were reduced using a two-stage approach consisting of the directed relation graph with error propagation and sensitivity analysis. During the reduction process, issues were encountered with pressure-dependent reactions formulated using the logarithmic pressure interpolation approach; these issues are discussed and recommendations made to avoid ambiguity in its future implementation in mechanism development. The performance of the skeletal mechanisms generated here was compared with that of detailed mechanisms in simulations of autoignition delay times, laminar flame speeds, and perfectly stirred reactor temperature response curves and extinction residence times, over a wide range of pressures, temperatures, and equivalence ratios. The detailed and skeletal mechanisms agreed well, demonstrating the adequacy of the resulting reduced chemistry for all the butanol isomers in predicting global combustion phenomena. In addition, the skeletal mechanisms closely predicted the time-histories of fuel mass fractions in homogeneous compression-ignition engine simulations. The performance of each butanol isomer was additionally compared with that of a gasoline surrogate with an antiknock index of 87 in a homogeneous compression-ignition engine simulation. The gasoline surrogate was consumed faster than any of the butanol isomers, with tert-butanol exhibiting the slowest fuel consumption rate. While n-butanol and isobutanol displayed the most similar consumption profiles relative to the gasoline surrogate, the two literature chemical kinetic models predicted different orderings.

Submitted 7 June, 2017; originally announced June 2017.

Comments: 39 pages, 16 figures. Supporting information available via https://doi.org/10.1021/acs.energyfuels.6b01857

MSC Class: 80A30 (Primary); 80A25 (Secondary)

Journal ref: Energy Fuels 31 (2017) 867-881

arXiv:1706.01987 [pdf, other]

doi 10.1002/kin.21142

ChemKED: a human- and machine-readable data standard for chemical kinetics experiments

Authors: Bryan W. Weber, Kyle E. Niemeyer

Abstract: Fundamental experimental measurements of quantities such as ignition delay times, laminar flame speeds, and species profiles (among others) serve important roles in understanding fuel chemistry and validating chemical kinetic models. However, despite both the importance and abundance of such information in the literature, the community lacks a widely adopted standard format for this data. This impedes both sharing and wide use by the community. Here we introduce a new chemical kinetics experimental data format, ChemKED, and the related Python-based package for validating and working with ChemKED-formatted files called PyKED. We also review past and related efforts, and motivate the need for a new solution. ChemKED currently supports the representation of autoignition delay time measurements from shock tubes and rapid compression machines. ChemKED-formatted files contain all of the information needed to simulate experimental data points, including the uncertainty of the data. ChemKED is based on the YAML data serialization language, and is intended as a human- and machine-readable standard for easy creation and automated use. Development of ChemKED and PyKED occurs openly on GitHub under the BSD 3-clause license, and contributions from the community are welcome. Plans for future development include support for experimental data from laminar flame, jet stirred reactor, and speciation measurements.

Submitted 15 November, 2017; v1 submitted 6 June, 2017; originally announced June 2017.

Comments: 22 pages, accepted for publication in the International Journal of Chemical Kinetics

arXiv:1705.03162 [pdf, other]

doi 10.1016/j.jcp.2017.12.028

Accelerating solutions of one-dimensional unsteady PDEs with GPU-based swept time-space decomposition

Authors: Daniel J Magee, Kyle E Niemeyer

Abstract: The expedient design of precision components in aerospace and other high-tech industries requires simulations of physical phenomena often described by partial differential equations (PDEs) without exact solutions. Modern design problems require simulations with a level of resolution difficult to achieve in reasonable amounts of time, even in effectively parallelized solvers. Though the scale of the problem relative to available computing power is the greatest impediment to accelerating these applications, significant performance gains can be achieved through careful attention to the details of memory communication and access. The swept time-space decomposition rule reduces communication between sub-domains by exhausting the domain of influence before communicating boundary values. Here we present a GPU implementation of the swept rule, which modifies the algorithm for improved performance on this processing architecture by prioritizing use of private (shared) memory, avoiding interblock communication, and overwriting unnecessary values. It shows significant improvement in the execution time of finite-difference solvers for one-dimensional unsteady PDEs, producing speedups of 2--9$\times$ for a range of problem sizes compared with simple GPU versions and 7--300$\times$ compared with parallel CPU versions. However, for a more sophisticated one-dimensional system of equations discretized with a second-order finite-volume scheme, the swept rule performs 1.2--1.9$\times$ worse than a standard implementation for all problem sizes.

Submitted 10 November, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

Comments: 25 pages, 10 figures

MSC Class: 65M55; 65N55; 68W10; 35Q35

Journal ref: J. Comput. Phys. 357 (2018) 338-352

arXiv:1612.02495 [pdf, other]

An initial investigation of the performance of GPU-based swept time-space decomposition

Authors: Daniel Magee, Kyle E Niemeyer

Abstract: Simulations of physical phenomena are essential to the expedient design of precision components in aerospace and other high-tech industries. These phenomena are often described by mathematical models involving partial differential equations (PDEs) without exact solutions. Modern design problems require simulations with a level of resolution that is difficult to achieve in a reasonable amount of time, even in effectively parallelized solvers. Though the scale of the problem relative to available computing power is the greatest impediment to accelerating these applications, significant performance gains can be achieved through careful attention to the details of memory accesses. Parallelized PDE solvers are subject to a trade-off in memory management: store the solution for each timestep in abundant, global memory with high access costs or in a limited, private memory with low access costs that must be passed between nodes. The GPU implementation of swept time-space decomposition presented here mitigates this dilemma by using private (shared) memory, avoiding internode communication, and overwriting unnecessary values. It shows significant improvement in the execution time of the PDE solvers in one dimension, achieving speedups of 6-2x for large and small problem sizes, respectively, compared to naive GPU versions and 7-300x compared to parallel CPU versions.

Submitted 3 January, 2017; v1 submitted 7 December, 2016; originally announced December 2016.

Comments: 14 pages; submitted to 2017 AIAA SciTech Forum

MSC Class: 65M55 (Primary); 35Q35 (Secondary) ACM Class: G.1.8; G.4; J.2

arXiv:1611.02274 [pdf, other]

doi 10.1007/978-3-319-06548-9_8

GPU-Based Parallel Integration of Large Numbers of Independent ODE Systems

Authors: Kyle E Niemeyer, Chih-Jen Sung

Abstract: The task of integrating a large number of independent ODE systems arises in various scientific and engineering areas. For nonstiff systems, common explicit integration algorithms can be used on GPUs, where individual GPU threads concurrently integrate independent ODEs with different initial conditions or parameters. One example is the fifth-order adaptive Runge-Kutta-Cash-Karp (RKCK) algorithm. In the case of stiff ODEs, standard explicit algorithms require impractically small time-step sizes for stability reasons, and implicit algorithms are therefore commonly used instead to allow larger time steps and reduce the computational expense. However, typical high-order implicit algorithms based on backwards differentiation formulae (e.g., VODE, LSODE) involve complex logical flow that causes severe thread divergence when implemented on GPUs, limiting the performance. Therefore, alternate algorithms are needed. A GPU-based Runge-Kutta-Chebyshev (RKC) algorithm can handle moderate levels of stiffness and performs significantly faster than not only an equivalent CPU version but also a CPU-based implicit algorithm (VODE) based on results shown in the literature. In this chapter, we present the mathematical background, implementation details, and source code for the RKCK and RKC algorithms for use integrating large numbers of independent systems of ODEs on GPUs. In addition, brief performance comparisons are shown for each algorithm, demonstrating the potential benefit of moving to GPU-based ODE integrators.

Submitted 6 November, 2016; originally announced November 2016.

Comments: 21 pages, 2 figures

MSC Class: 80A32 (Primary) 80A30; 65L04; 65L06 (Secondary)

Journal ref: Numerical Computations with GPUs, Ch. 8 (2014) 159-182. V Kindratenko (Ed.)

arXiv:1608.05794 [pdf, other]

doi 10.1016/j.cpc.2018.01.015

Accelerating finite-rate chemical kinetics with coprocessors: comparing vectorization methods on GPUs, MICs, and CPUs

Authors: Christopher P. Stone, Andrew T. Alferman, Kyle E. Niemeyer

Abstract: Efficient ordinary differential equation solvers for chemical kinetics must take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and nonstiff Runge-Kutta solver are implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms with OpenCL. The performances of these parallel implementations were measured with three chemical kinetic models across several multicore and many-core platforms. Two runtime benchmarks were conducted to clearly determine any performance advantage offered by either method: evaluating the right-hand-side source terms in parallel, and integrating a series of constant-pressure homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded code. The SIMT model on the host and Phi was 13-35% slower than the baseline while the SIMT model on the GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased 2.5-2.7x with the SIMD implementations on the host CPU and 4.7-4.9x with the Xeon Phi coprocessor compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.4-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi.

Submitted 28 August, 2017; v1 submitted 20 August, 2016; originally announced August 2016.

Comments: 32 pages, 11 figures

MSC Class: 80A32 (Primary) 80A30; 65L04; 65L06 (Secondary)

Journal ref: Comput. Phys. Comm. 226 (2018) 18-29

arXiv:1607.05079 [pdf, other]

doi 10.1016/j.combustflame.2009.12.022

Skeletal mechanism generation for surrogate fuels using directed relation graph with error propagation and sensitivity analysis

Authors: Kyle E. Niemeyer, Chih-Jen Sung, Mandhapati P. Raju

Abstract: A novel implementation for the skeletal reduction of large detailed reaction mechanisms using the directed relation graph with error propagation and sensitivity analysis (DRGEPSA) is developed and presented with examples for three hydrocarbon components, n-heptane, iso-octane, and n-decane, relevant to surrogate fuel development. DRGEPSA integrates two previously developed methods, directed relation graph-aided sensitivity analysis (DRGASA) and directed relation graph with error propagation (DRGEP), by first applying DRGEP to efficiently remove many unimportant species prior to sensitivity analysis to further remove unimportant species, producing an optimally small skeletal mechanism for a given error limit. It is illustrated that the combination of the DRGEP and DRGASA methods allows the DRGEPSA approach to overcome the weaknesses of each, specifically that DRGEP cannot identify all unimportant species and that DRGASA shields unimportant species from removal. Skeletal mechanisms for n-heptane and iso-octane generated using the DRGEP, DRGASA, and DRGEPSA methods are presented and compared to illustrate the improvement of DRGEPSA. From a detailed reaction mechanism for n-alkanes covering n-octane to n-hexadecane with 2115 species and 8157 reactions, two skeletal mechanisms for n-decane generated using DRGEPSA, one covering a comprehensive range of temperature, pressure, and equivalence ratio conditions for autoignition and the other limited to high temperatures, are presented and validated. The comprehensive skeletal mechanism consists of 202 species and 846 reactions and the high-temperature skeletal mechanism consists of 51 species and 256 reactions. Both mechanisms are further demonstrated to well reproduce the results of the detailed mechanism in perfectly-stirred reactor and laminar flame simulations over a wide range of conditions.

Submitted 4 July, 2016; originally announced July 2016.

Comments: 47 pages, 10 figures

MSC Class: 80A30 (Primary) 68R10; 80A25 (Secondary) ACM Class: G.2.2; I.6.5

Journal ref: Combust. Flame 157 (2010) 1760-1770

arXiv:1607.03884 [pdf, other]

doi 10.1016/j.combustflame.2017.02.005

An investigation of GPU-based stiff chemical kinetics integration methods

Authors: Nicholas J. Curtis, Kyle E. Niemeyer, Chih-Jen Sung

Abstract: A fifth-order implicit Runge-Kutta method and two fourth-order exponential integration methods equipped with Krylov subspace approximations were implemented for the GPU and paired with the analytical chemical kinetic Jacobian software pyJac. The performance of each algorithm was evaluated by integrating thermochemical state data sampled from stochastic partially stirred reactor simulations and compared with the commonly used CPU-based implicit integrator CVODE. We estimated that the implicit Runge-Kutta method running on a single GPU is equivalent to CVODE running on 12-38 CPU cores for integration of a single global integration time step of 1e-6 s with hydrogen and methane models. In the stiffest case studied (the methane model with a global integration time step of 1e-4 s), thread divergence and higher memory traffic significantly decreased GPU performance to the equivalent of CVODE running on approximately three CPU cores. The exponential integration algorithms performed more slowly than the implicit integrators on both the CPU and GPU. Thread divergence and memory traffic were identified as the main limiters of GPU integrator performance, and techniques to mitigate these issues were discussed. Use of a finite-difference Jacobian on the GPU (in place of the analytical Jacobian provided by pyJac) greatly decreased integrator performance due to thread divergence, resulting in maximum slowdowns of 7.11-240.96 times; in comparison, the corresponding slowdowns on the CPU were just 1.39-2.61 times, underscoring the importance of use of an analytical Jacobian for efficient GPU integration. Finally, future research directions for working towards enabling realistic chemistry in reactive-flow simulations via GPU/SIMD accelerated stiff chemical kinetic integration were identified.
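As a rough illustration of the two Jacobian strategies contrasted in this abstract, the sketch below compares a forward-difference Jacobian with a hand-derived analytical one. The two-variable system `rhs`, the step-size rule, and all function names are assumptions made for illustration; this is not a real kinetic model and it does not use pyJac's API.

```python
def rhs(y):
    # Toy two-variable nonlinear system (hypothetical, not a kinetic model).
    return [-2.0 * y[0] + y[0] * y[1], y[0] ** 2 - 3.0 * y[1]]


def analytical_jacobian(y):
    # Partial derivatives of rhs, derived by hand.
    return [[-2.0 + y[1], y[0]], [2.0 * y[0], -3.0]]


def fd_jacobian(f, y, eps=1e-7):
    # Forward differences: one baseline call plus one perturbed call
    # per state variable. Returns the Jacobian and the call count.
    n = len(y)
    f0 = f(y)
    calls = 1
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        h = eps * max(abs(y[j]), 1.0)
        y_pert = list(y)
        y_pert[j] += h
        fj = f(y_pert)
        calls += 1
        for i in range(n):
            jac[i][j] = (fj[i] - f0[i]) / h
    return jac, calls


if __name__ == "__main__":
    state = [1.0, 2.0]
    approx, n_calls = fd_jacobian(rhs, state)
    print("analytical:", analytical_jacobian(state))
    print("finite difference:", approx, "using", n_calls, "rhs calls")
```

The finite-difference version needs an extra right-hand-side evaluation for every state variable, whereas the analytical version evaluates each entry directly.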

Submitted 14 February, 2017; v1 submitted 13 July, 2016; originally announced July 2016.

Comments: 34 pages, 6 figures; pdfLaTeX

MSC Class: 80A32 (Primary); 80A30; 65L04; 65L06 (Secondary)

Journal ref: Combust. Flame 179 (2017) 312-324

arXiv:1606.07802 [pdf, other]

doi 10.1016/j.combustflame.2010.12.010

On the importance of graph search algorithms for DRGEP-based mechanism reduction methods

Authors: Kyle E. Niemeyer, Chih-Jen Sung

Abstract: The importance of graph search algorithm choice to the directed relation graph with error propagation (DRGEP) method is studied by comparing basic and modified depth-first search, basic and R-value-based breadth-first search (RBFS), and Dijkstra's algorithm. By using each algorithm with DRGEP to produce skeletal mechanisms from a detailed mechanism for n-heptane with randomly-shuffled species order, it is demonstrated that only Dijkstra's algorithm and RBFS produce results independent of species order. In addition, each algorithm is used with DRGEP to generate skeletal mechanisms for n-heptane covering a comprehensive range of autoignition conditions for pressure, temperature, and equivalence ratio. Dijkstra's algorithm combined with a coefficient scaling approach is demonstrated to produce the most compact skeletal mechanism with a similar performance compared to larger skeletal mechanisms resulting from the other algorithms. The computational efficiency of each algorithm is also compared by applying the DRGEP method with each search algorithm on the large detailed mechanism for n-alkanes covering n-octane to n-hexadecane with 2115 species and 8157 reactions. Dijkstra's algorithm implemented with a binary heap priority queue is demonstrated as the most efficient method, with a CPU cost two orders of magnitude less than the other search algorithms.
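To make the heap-based search concrete, here is a minimal sketch of a Dijkstra-style traversal that, for each species, finds the largest product of edge weights along any path from a target species. It assumes edge weights (direct interaction coefficients) lie in [0, 1]; the graph, the species names, and the function names are hypothetical and are not taken from the paper's implementation.

```python
import heapq


def overall_coefficients(graph, target):
    # graph: dict mapping species -> {neighbor: direct coefficient in [0, 1]}.
    # Returns the maximum path product from target to each reachable species.
    best = {target: 1.0}
    heap = [(-1.0, target)]  # negate values so heapq acts as a max-heap
    finished = set()
    while heap:
        neg_value, species = heapq.heappop(heap)
        if species in finished:
            continue  # stale heap entry
        finished.add(species)
        for neighbor, direct in graph.get(species, {}).items():
            candidate = -neg_value * direct
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return best


def retained_species(graph, target, threshold):
    # Keep species whose overall coefficient meets the cutoff.
    coefficients = overall_coefficients(graph, target)
    return sorted(s for s, r in coefficients.items() if r >= threshold)


if __name__ == "__main__":
    toy_graph = {
        "A": {"B": 0.5, "C": 0.1},
        "B": {"C": 0.8},
        "C": {"D": 0.5},
    }
    print(overall_coefficients(toy_graph, "A"))
    print(retained_species(toy_graph, "A", 0.3))
```

Because every weight is at most 1, a path product can only shrink as the path grows, so the greedy largest-first ordering remains valid and the result does not depend on the order in which species are listed.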

Submitted 23 June, 2016; originally announced June 2016.

Comments: 17 pages, 2 figures

MSC Class: 80A30 (Primary); 80A25 (Secondary)

Journal ref: Combustion and Flame 158(8) (2011) 1439-1443

arXiv:1606.07122 [pdf, other]

doi 10.1016/j.fuel.2016.06.097

Predicting fuel research octane number using Fourier-transform infrared absorption spectra of neat hydrocarbons

Authors: Shane R. Daly, Kyle E. Niemeyer, William J. Cannella, Christopher L. Hagen

Abstract: Liquid transportation fuels require costly and time-consuming tests to characterize metrics, such as Research Octane Number (RON) for gasoline. If fuel sale restrictions requiring use of standard Cooperative Fuel Research testing procedures do not apply, these tests may be avoided by using multivariate statistical models to predict RON and other quantities. Here we show that an accurate statistical model for the RON of gasoline and gasoline-like fuels can be constructed by ensuring the representation of key functional groups in the spectroscopic data set used to train the model. We found that a principal component regression model for RON based on IR absorbance and informed using neat and 134 mixtures of n-heptane, isooctane, toluene, ethanol, methylcyclohexane, and 1-hexene could predict RON for the 10 Coordinating Research Council Fuels for Advanced Combustion Engine (FACE) gasolines and 12 FACE gasoline blends with ethanol within 34.8+/-36.1 on average and 51.2 in the worst case. We next studied the effect of adding 28 additional minor components found in the FACE gasolines to the statistical model, and determined that it was necessary to add additional representatives of the branched alkane and aromatics classes to reduce model error. For example, adding 2,3-dimethylpentane and xylene to the previous model allowed it to predict RON for the 22 target fuels within 0.3+/-4.4 on average and 7.9 in the worst case. However, we determined that the specific choice of fuel in those classes mattered less than ensuring the representation of the relevant functional group. This work builds upon previous efforts by creating models informed by neat and surrogate fuels (rather than complex real fuels) that could predict the performance of complex unknown fuels.

Submitted 22 June, 2016; originally announced June 2016.

Comments: Accepted for publication in Fuel

Journal ref: Fuel 183 (2016) 359-365

arXiv:1605.03262 [pdf, other]

doi 10.1016/j.cpc.2017.02.004

pyJac: analytical Jacobian generator for chemical kinetics

Authors: Kyle E. Niemeyer, Nicholas J. Curtis, Chih-Jen Sung

Abstract: Accurate simulations of combustion phenomena require the use of detailed chemical kinetics in order to capture limit phenomena such as ignition and extinction as well as predict pollutant formation. However, the chemical kinetic models for hydrocarbon fuels of practical interest typically have large numbers of species and reactions and exhibit high levels of mathematical stiffness in the governing differential equations, particularly for larger fuel molecules. In order to integrate the stiff equations governing chemical kinetics, generally reactive-flow simulations rely on implicit algorithms that require frequent Jacobian matrix evaluations. Some in situ and a posteriori computational diagnostics methods also require accurate Jacobian matrices, including computational singular perturbation and chemical explosive mode analysis. Typically, finite differences numerically approximate these, but for larger chemical kinetic models this poses significant computational demands since the number of chemical source term evaluations scales with the square of species count. Furthermore, existing analytical Jacobian tools do not optimize evaluations or support emerging SIMD processors such as GPUs. Here we introduce pyJac, a Python-based open-source program that generates analytical Jacobian matrices for use in chemical kinetics modeling and analysis. As a demonstration, we first establish the correctness of the Jacobian matrices for kinetic models of hydrogen, methane, ethylene, and isopentanol oxidation, then demonstrate the performance achievable on CPUs and GPUs using pyJac via matrix evaluation timing comparisons.

Submitted 19 February, 2017; v1 submitted 10 May, 2016; originally announced May 2016.

Comments: 42 pages, 7 figures

Journal ref: Comput. Phys. Comm. 215 (2017) 188-203

arXiv:1410.0401 [pdf, other]

doi 10.1021/ef5022126

Reduced chemistry for a gasoline surrogate valid at engine-relevant conditions

Authors: Kyle E. Niemeyer, Chih-Jen Sung

Abstract: A detailed mechanism for the four-component RD387 gasoline surrogate developed by Lawrence Livermore National Laboratory has shown good agreement with experiments in engine-relevant conditions. However, with 1388 species and 5933 reversible reactions, this detailed mechanism is far too large to use in practical engine simulations. Therefore, reduction of the detailed mechanism was performed using a multi-stage approach consisting of the DRGEPSA method, unimportant reaction elimination, isomer lumping, and analytic QSS reduction based on CSP analysis. A new greedy sensitivity analysis algorithm was developed and demonstrated to be capable of removing more species for the same error limit compared to the conventional sensitivity analysis used in DRG-based skeletal reduction methods. Using this new greedy algorithm, several skeletal and reduced mechanisms were developed at varying levels of complexity and for different target condition ranges. The final skeletal and reduced mechanisms consisted of 213 and 148 species, respectively, for a lean-to-stoichiometric, low-temperature HCCI-like range of conditions. For a lean-to-rich, high-temperature, SI/CI-like range of conditions, skeletal and reduced mechanisms were developed with 97 and 79 species, respectively. The skeletal and reduced mechanisms in this study were produced using an error limit of 10% and validated using homogeneous autoignition simulations over engine-relevant conditions; all showed good agreement in predicting ignition delay. Furthermore, extended validation was performed, including comparison of autoignition temperature profiles, PSR temperature response curves and extinction turning points, and laminar flame speed calculations. All the extended validation showed results within the 10% error limit, demonstrating the adequacy of the resulting reduced chemistry.

Submitted 14 January, 2015; v1 submitted 1 October, 2014; originally announced October 2014.

Comments: To appear in Energy & Fuels

MSC Class: 80A30 (Primary); 80A25 (Secondary)

Journal ref: Energy Fuels 29 (2015) 1172-1185

arXiv:1405.3745 [pdf, other]

doi 10.1016/j.combustflame.2014.05.001

Mechanism reduction for multicomponent surrogates: a case study using toluene reference fuels

Authors: Kyle E Niemeyer, Chih-Jen Sung

Abstract: Strategies and recommendations for performing skeletal reductions of multicomponent surrogate fuels are presented, through the generation and validation of skeletal mechanisms for a three-component toluene reference fuel. Using the directed relation graph with error propagation and sensitivity analysis method followed by a further unimportant reaction elimination stage, skeletal mechanisms valid over comprehensive and high-temperature ranges of conditions were developed at varying levels of detail. These skeletal mechanisms were generated based on autoignition simulations, and validation using ignition delay predictions showed good agreement with the detailed mechanism in the target range of conditions. When validated using phenomena other than autoignition, such as perfectly stirred reactor and laminar flame propagation, tight error control or more restrictions on the reduction during the sensitivity analysis stage were needed to ensure good agreement. In addition, tight error limits were needed for close prediction of ignition delay when varying the mixture composition away from that used for the reduction. In homogeneous compression-ignition engine simulations, the skeletal mechanisms closely matched the point of ignition and accurately predicted species profiles for lean to stoichiometric conditions. Furthermore, the efficacy of generating a multicomponent skeletal mechanism was compared to combining skeletal mechanisms produced separately for neat fuel components; using the same error limits, the latter resulted in a larger skeletal mechanism size that also lacked important cross reactions between fuel components. Based on the present results, general guidelines for reducing detailed mechanisms for multicomponent fuels are discussed.

Submitted 15 May, 2014; originally announced May 2014.

Comments: Accepted for publication in Combustion and Flame

MSC Class: 80A30 (Primary) 80A25 (Secondary)

Journal ref: Combust. Flame 161 (2014) 2752-2764

arXiv:1309.3018 [pdf, other]

doi 10.1007/s11227-013-1015-7

Recent progress and challenges in exploiting graphics processors in computational fluid dynamics

Authors: Kyle E Niemeyer, Chih-Jen Sung

Abstract: The progress made in accelerating simulations of fluid flow using GPUs, and the challenges that remain, are surveyed. The review first provides an introduction to GPU computing and programming, and discusses various considerations for improved performance. Case studies comparing the performance of CPU- and GPU-based solvers for the Laplace and incompressible Navier-Stokes equations are performed in order to demonstrate the potential improvement even with simple codes. Recent efforts to accelerate CFD simulations using GPUs are reviewed for laminar, turbulent, and reactive flow solvers. Also, GPU implementations of the lattice Boltzmann method are reviewed. Finally, recommendations for implementing CFD codes on GPUs are given and remaining challenges are discussed, such as the need to develop new strategies and redesign algorithms to enable GPU acceleration.

Submitted 11 September, 2013; originally announced September 2013.

Comments: In press in the Journal of Supercomputing

MSC Class: 76-04

Journal ref: J. Supercomput. 67 (2014) 528-564

arXiv:1309.2710 [pdf, other]

doi 10.1016/j.jcp.2013.09.025

Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

Authors: Kyle E Niemeyer, Chih-Jen Sung

Abstract: The chemical kinetics ODEs arising from operator-split reactive-flow simulations were solved on GPUs using explicit integration algorithms. Nonstiff chemical kinetics of a hydrogen oxidation mechanism (9 species and 38 irreversible reactions) were computed using the explicit fifth-order Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster than single- and six-core CPU versions by factors of 126 and 25, respectively, for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane (53 species and 634 irreversible reactions) oxidation, were computed using the stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The GPU-based RKC implementation demonstrated an increase in performance of nearly 59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than the single- and six-core CPU-based RKC algorithms using the hydrogen/carbon-monoxide mechanism. With the methane mechanism, RKC-GPU performed more than 65 and 11 times faster, for problem sizes consisting of 131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up to 57 times faster than the six-core CPU-based implicit VODE algorithm on 65,536 ODEs. In the presence of more severe stiffness, such as ethylene oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a larger time step size, RKC-GPU performed at best 2.5 times slower than six-core VODE for 8192 ODEs and larger. Therefore, the need for developing new strategies for integrating stiff chemistry on GPUs was discussed.

Submitted 4 November, 2013; v1 submitted 10 September, 2013; originally announced September 2013.

Comments: 27 pages, LaTeX; corrected typos in Appendix equations A.10 and A.11

MSC Class: 80A32 (Primary) 80A30; 65L04; 65L06 (Secondary)

Journal ref: J. Comput. Phys. 256 (2014) 854-871

Showing 1–32 of 32 results for author: Niemeyer, K E