Search | arXiv e-print repository

Predictive Limitations of Physics-Informed Neural Networks in Vortex Shedding

Authors: Pi-Yueh Chuang, Lorena A. Barba

Abstract: The recent surge of interest in physics-informed neural network (PINN) methods has led to a wave of studies that attest to their potential for solving partial differential equations (PDEs) and predicting the dynamics of physical systems. However, the predictive limitations of PINNs have not been thoroughly investigated. We look at the flow around a 2D cylinder and find that data-free PINNs are unable to predict vortex shedding. Data-driven PINN exhibits vortex shedding only while the training data (from a traditional CFD solver) is available, but reverts to the steady state solution when the data flow stops. We conducted dynamic mode decomposition and analyzed the Koopman modes in the solutions obtained with PINNs versus a traditional fluid solver (PetIBM). The distribution of the Koopman eigenvalues on the complex plane suggests that PINN is numerically dispersive and diffusive. The PINN method reverts to the steady solution possibly as a consequence of spectral bias. This case study raises concerns about the ability of PINNs to predict flows with instabilities, specifically vortex shedding. Our computational study supports the need for more theoretical work to analyze the numerical properties of PINN methods. The results in this paper are transparent and reproducible, with all data and code available in public repositories and persistent archives; links are provided in the paper repository at https://github.com/barbagroup/jcs_paper_pinn, and a Reproducibility Statement within the paper.

Submitted 31 May, 2023; originally announced June 2023.
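
The abstract above compares Koopman eigenvalues obtained by dynamic mode decomposition (DMD). As a rough illustration only, the sketch below computes DMD eigenvalues of a snapshot matrix with NumPy. It is not the authors' code: the function name `dmd_eigenvalues`, the synthetic decaying-oscillation data, and the chosen rank are all assumptions made for this example.

```python
import numpy as np

def dmd_eigenvalues(snapshots, rank):
    """Eigenvalues of the rank-reduced DMD operator (hypothetical helper).

    snapshots: array of shape (n_states, n_times), columns equally spaced in time.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]        # pairs (x_k, x_{k+1})
    U, s, Vh = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) Vh
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]       # truncate to `rank`
    # Reduced operator U^* Y V diag(s)^-1; dividing by s scales the columns.
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s
    return np.linalg.eigvals(A_tilde)

# Synthetic data (assumption): a damped rotation lifted to 50 observables.
rho, theta = 0.98, 0.3
A = rho * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
lift = np.random.default_rng(0).standard_normal((50, 2))
z, cols = np.array([1.0, 0.0]), []
for _ in range(40):
    cols.append(lift @ z)
    z = A @ z
snapshots = np.column_stack(cols)

eigs = dmd_eigenvalues(snapshots, rank=2)
growth = np.abs(eigs)        # modulus below 1 means the mode decays per step
phase = np.angle(eigs)       # argument gives the oscillation phase per step
```

In this reading, an eigenvalue modulus that falls inside the unit circle indicates decay (a diffusive effect), and a shift in the argument indicates a frequency error (a dispersive effect).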

arXiv:2303.08394 [pdf, other]

doi 10.1109/MCSE.2023.3258288

PyExaFMM: an exercise in designing high-performance software with Python and Numba

Authors: Srinath Kailasa, Tingyu Wang, Lorena A. Barba, Timo Betcke

Abstract: Numba is a game-changing compiler for high-performance computing with Python. It produces machine code that runs outside of the single-threaded Python interpreter and that fully utilizes the resources of modern CPUs. This means support for parallel multithreading and auto vectorization if available, as with compiled languages such as C++ or Fortran. In this article we document our experience developing PyExaFMM, a multithreaded Numba implementation of the Fast Multipole Method, an algorithm with a non-linear data structure and a large amount of data organization. We find that designing performant Numba code for complex algorithms can be as challenging as writing in a compiled language.

Submitted 13 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 10 pages, 3 figures

MSC Class: 68-04 ACM Class: D.2.2

Journal ref: Computing in Science & Engineering, vol. 24, no. 05, pp. 77-84, 2022

arXiv:2205.14249 [pdf, other]

Experience report of physics-informed neural networks in fluid simulations: pitfalls and frustration

Authors: Pi-Yueh Chuang, Lorena A. Barba

Abstract: Though PINNs (physics-informed neural networks) are now deemed as a complement to traditional CFD (computational fluid dynamics) solvers rather than a replacement, their ability to solve the Navier-Stokes equations without given data is still of great interest. This report presents our not-so-successful experiments of solving the Navier-Stokes equations with PINN as a replacement for traditional solvers. We aim to, with our experiments, prepare readers for the challenges they may face if they are interested in data-free PINN. In this work, we used two standard flow problems: 2D Taylor-Green vortex at Re=100 and 2D cylinder flow at Re=200. The PINN method solved the 2D Taylor-Green vortex problem with acceptable results, and we used this flow as an accuracy and performance benchmark. About 32 hours of training were required for the PINN method's accuracy to match the accuracy of a 16x16 finite-difference simulation, which took less than 20 seconds. The 2D cylinder flow, on the other hand, did not produce a physical solution. The PINN method behaved like a steady-flow solver and did not capture the vortex shedding phenomenon. By sharing our experience, we would like to emphasize that the PINN method is still a work-in-progress, especially in terms of solving flow problems without any given data. More work is needed to make PINN feasible for real-world problems in such applications.

Submitted 22 July, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: 8 pages, 9 figures

arXiv:2204.12564 [pdf, ps, other]

Defining the role of open source software in research reproducibility

Authors: Lorena A. Barba

Abstract: Reproducibility is inseparable from transparency, as sharing data, code and computational environment is a pre-requisite for being able to retrace the steps of producing the research results. Others have made the case that this artifact sharing should adopt appropriate licensing schemes that permit reuse, modification and redistribution. I make a new proposal for the role of open source software, stemming from the lessons it teaches about distributed collaboration and a commitment-based culture. Reviewing the defining features of open source software (licensing, development, communities), I look for explanation of its success from the perspectives of connectivism -- a learning theory for the digital age -- and the language-action framework of Winograd and Flores. I contend that reproducibility engenders trust, which we routinely build in community via conversations, and the practices of open source software help us to learn how to be more effective learning (discovering) together, contributing to the same goal.

Submitted 17 May, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: 10 pages. Accepted for publication in IEEE Computer

arXiv:2105.00775 [pdf, other]

[Re] Three-dimensional wake topology and propulsive performance of low-aspect-ratio pitching-rolling plates

Authors: Olivier Mesnard, Lorena A. Barba

Abstract: This article reports on a full replication study in computational fluid dynamics, using an immersed boundary method to obtain the flow around a pitching and rolling elliptical wing. As in the original study, the computational experiments investigate the wake topology and aerodynamic forces, looking at the effect of: Reynolds number (100--400), Strouhal number (0.4--1.2), aspect ratio, and rolling/pitching phase difference. We also include a grid-independence study (from 5 to 72 million grid cells). The trends in aerodynamic performance and the characteristics of the wake topology were replicated, despite some differences in results. We declare the replication successful, and make fully available all the digital artifacts and workflow definitions, including software build recipes and container images, as well as secondary data and post-processing code. Run times for each computational experiment were between 8.1 and 13.8 hours to complete 5 flapping cycles, using two compute nodes with dual 20-core 3.7GHz Intel Xeon Gold 6148 CPUs and two NVIDIA V100 GPU devices each.

Submitted 3 May, 2021; originally announced May 2021.

Comments: 22 pages, 19 figures

arXiv:2103.01048 [pdf, other]

High-productivity, high-performance workflow for virus-scale electrostatic simulations with Bempp-Exafmm

Authors: Tingyu Wang, Christopher D. Cooper, Timo Betcke, Lorena A. Barba

Abstract: Biomolecular electrostatics is key in protein function and the chemical processes affecting it. Implicit-solvent models via the Poisson-Boltzmann (PB) equation provide insights with less computational cost than atomistic models, making large-system studies -- at the scale of viruses -- accessible to more researchers. Here we present a high-productivity and high-performance linear PB solver based on Exafmm, a fast multipole method library, and Bempp, a Galerkin boundary element method package. The workflow integrates an easy-to-use Python interface with optimized computational kernels, and can be run interactively via Jupyter notebooks, for faster prototyping. Our results show the capability of the software, confirm code correctness, and assess performance with between 8,000 and 2 million elements. Showcasing the power of this interactive computing platform, we study the conditioning of two variants of the boundary integral formulation with just a few lines of code. Mesh-refinement studies confirm convergence as $1/N$, for $N$ boundary elements, and a comparison with results from the trusted APBS code using various proteins shows agreement. Our binding energy calculations using 9 various complexes align with the results from using five other grid-based PB solvers. Performance results include timings, breakdowns, and computational complexity. Exafmm offers evaluation speeds of just a few seconds for tens of millions of points, and $\mathcal{O}(N)$ scaling.
The trend observed in our performance comparison with APBS demonstrates the advantage of Bempp-Exafmm in applications involving larger structures or requiring higher accuracy. Computing the solvation free energy of a Zika virus, represented by 1.6 million atoms and 10 million boundary elements, took 80-min runtime on a single compute node (dual 20-core).

Submitted 25 December, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: 17 pages, 8 figures

arXiv:2008.05414 [pdf, other]

doi 10.1098/rsta.2020.0068

Reproducible Validation and Replication Studies in Nanoscale Physics

Authors: Natalia C. Clementi, Lorena A. Barba

Abstract: Credibility building activities in computational research include verification and validation, reproducibility and replication, and uncertainty quantification. Though orthogonal to each other, they are related. This paper presents validation and replication studies in electromagnetic excitations on nanoscale structures, where the quantity of interest is the wavelength at which resonance peaks occur. The study uses the open-source software PyGBe: a boundary element solver with treecode acceleration and GPU capability. We replicate a result by Rockstuhl et al. (2005, doi:10/dsxw9d) with a two-dimensional boundary element method on silicon carbide particles, despite differences in our method. The second replication case from Ellis et al. (2016, doi:10/f83zcb) looks at aspect ratio effects on high-order modes of localized surface phonon-polariton nanostructures. The results partially replicate: the wavenumber positions of some modes match, but for other modes they differ. With virtually no information about the original simulations, explaining the discrepancies is not possible. A comparison with experiments that measured polarized reflectance of silicon carbide nano pillars provides a validation case. The wavenumber of the dominant mode and two more do match, but differences remain in other minor modes. Results in this paper were produced with strict reproducibility practices, and we share reproducibility packages for all, including input files, execution scripts, secondary data, post-processing code and plotting scripts, and the figures (deposited in Zenodo).
In view of the many challenges faced, we propose that reproducible practices make replication and validation more feasible.

Submitted 12 August, 2020; originally announced August 2020.

Comments: 20 pages, 11 figures

arXiv:2001.00228 [pdf, other]

doi 10.1109/MCSE.2020.2976002

Engineers Code: reusable open learning modules for engineering computations

Authors: Lorena A. Barba

Abstract: Undergraduate programs in science and engineering include at least one course in basic programming, but seldom presented in a contextualized format, where computing is a tool for thinking and learning in the discipline. We have created a series of learning modules to embed computing in engineering education, and share this content under permissive public licenses. The modules are created as a set of lessons using Jupyter notebooks, and complemented by online courses in the Open edX platform, using new integrations we developed. Learning sequences in the online course pull content dynamically from public Jupyter notebooks and assessments are auto-graded on-the-fly, using our Jupyter Viewer and Jupyter Grader third-party extensions for Open edX (XBlocks). The learning content is modularized and designed for reuse in various formats. In one of these formats---short but intense workshops---our university library is leveraging the curriculum to offer extra-curricular training for all, at high demands.

Submitted 16 December, 2019; originally announced January 2020.

Comments: 7 pages, 1 figure

Journal ref: Computing in Science & Engineering 22(4): 26-35 (2020)

arXiv:1904.07981 [pdf, other]

doi 10.1109/MCSE.2019.2941702

Reproducible Workflow on a Public Cloud for Computational Fluid Dynamics

Authors: Olivier Mesnard, Lorena A. Barba

Abstract: In a new effort to make our research transparent and reproducible by others, we developed a workflow to run and share computational studies on the public cloud Microsoft Azure. It uses Docker containers to create an image of the application software stack. We also adopt several tools that facilitate creating and managing virtual machines on compute nodes and submitting jobs to these nodes. The configuration files for these tools are part of an expanded "reproducibility package" that includes workflow definitions for cloud computing, in addition to input files and instructions. This facilitates re-creating the cloud environment to re-run the computations under the same conditions. Although cloud providers have improved their offerings, many researchers using high-performance computing (HPC) are still skeptical about cloud computing. Thus, we ran benchmarks for tightly coupled applications to confirm that the latest HPC nodes of Microsoft Azure are indeed a viable alternative to traditional on-site HPC clusters. We also show that cloud offerings are now adequate to complete computational fluid dynamics studies with in-house research software that uses parallel computing with GPUs. Finally, we share with the community what we have learned from nearly two years of using Azure cloud to enhance transparency and reproducibility in our computational simulations.

Submitted 25 September, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

Comments: 11 pages, 8 figures, 5 tables

Journal ref: Computing in Science and Engineering, Vol. 22(1):102-116, 2019

arXiv:1812.10722 [pdf, other]

doi 10.1103/PhysRevE.100.063305

Computational nanoplasmonics in the quasistatic limit for biosensing applications

Authors: Natalia C. Clementi, Christopher D. Cooper, Lorena A. Barba

Abstract: This work uses the long-wavelength limit to compute LSPR response of biosensors, expanding the open-source PyGBe code to compute the extinction cross-section of metallic nanoparticles in the presence of any target for sensing. The target molecule is represented by a surface mesh, based on its crystal structure. PyGBe is research software for continuum electrostatics, written in Python with computationally expensive parts accelerated on GPU hardware, via PyCUDA. It is also accelerated algorithmically via a treecode that offers O(N log N) computational complexity. These features allow PyGBe to handle problems with half a million boundary elements or more. Using a model problem consisting of an isolated silver nanosphere in an electric field, our results show grid convergence as 1/N, and accurate computation of the extinction cross-section as a function of wavelength (compared with an analytical solution). For a model of a sensor-analyte system, consisting of a spherical silver nanoparticle and a set of bovine serum albumin (BSA) proteins, our results again obtain grid convergence as 1/N (with respect to the Richardson extrapolated value). Computing the LSPR response as a function of wavelength in the presence of BSA proteins captures a red-shift of 0.5 nm in the resonance frequency due to the presence of the analytes at 1-nm distance. The final result is a sensitivity study of the biosensor model, obtaining the shift in resonance frequency for various distances between the proteins and the nanoparticle.
All results in this paper are fully reproducible, and we have deposited in archival data repositories all the materials needed to run the computations again and re-create the figures. PyGBe is open source under a permissive license and openly developed. Documentation is available at http://barbagroup.github.io/pygbe/docs/.

Submitted 24 July, 2020; v1 submitted 27 December, 2018; originally announced December 2018.

Comments: 14 pages, 12 figures

Journal ref: Phys. Rev. E 100, 063305 (2019)

arXiv:1802.03311 [pdf, ps, other]

Terminologies for Reproducible Research

Authors: Lorena A. Barba

Abstract: Reproducible research---by its many names---has come to be regarded as a key concern across disciplines and stakeholder groups. Funding agencies and journals, professional societies and even mass media are paying attention, often focusing on the so-called "crisis" of reproducibility. One big problem keeps coming up among those seeking to tackle the issue: different groups are using terminologies in utter contradiction with each other. Looking at a broad sample of publications in different fields, we can classify their terminology via decision tree: they either, A---make no distinction between the words reproduce and replicate, or B---use them distinctly. If B, then they are commonly divided in two camps. In a spectrum of concerns that starts at a minimum standard of "same data+same methods=same results," to "new data and/or new methods in an independent study=same findings," group 1 calls the minimum standard reproduce, while group 2 calls it replicate. This direct swap of the two terms aggravates an already weighty issue. By attempting to inventory the terminologies across disciplines, I hope that some patterns will emerge to help us resolve the contradictions.

Submitted 9 February, 2018; originally announced February 2018.

arXiv:1707.02264 [pdf, other]

doi 10.7717/peerj-cs.147

Journal of Open Source Software (JOSS): design and first-year review

Authors: Arfon M Smith, Kyle E Niemeyer, Daniel S Katz, Lorena A Barba, George Githinji, Melissa Gymrek, Kathryn D Huff, Christopher R Madan, Abigail Cabunoc Mayes, Kevin M Moerman, Pjotr Prins, Karthik Ram, Ariel Rokem, Tracy K Teal, Roman Valls Guimera, Jacob T Vanderplas

Abstract: This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a DOI, deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services.
Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative.

Submitted 24 January, 2018; v1 submitted 7 July, 2017; originally announced July 2017.

Comments: 22 pages, 8 figures

Journal ref: PeerJ Computer Science 4 (2018) e147

arXiv:1605.04339 [pdf, other]

Reproducible and replicable CFD: it's harder than you think

Authors: Olivier Mesnard, Lorena A. Barba

Abstract: Completing a full replication study of our previously published findings on bluff-body aerodynamics was harder than we thought, despite the fact that we have good reproducible-research practices, sharing our code and data openly. Here's what we learned from three years, four CFD codes and hundreds of runs.

Submitted 14 October, 2016; v1 submitted 13 May, 2016; originally announced May 2016.

Comments: 12 pages, 11 figures; accepted in Computing in Science and Engineering. Supplementary materials in https://github.com/barbagroup/snake-repro

arXiv:1506.05957 [pdf, other]

Inexact Krylov iterations and relaxation strategies with fast-multipole boundary element method

Authors: Tingyu Wang, Simon K. Layton, Lorena A. Barba

Abstract: Boundary element methods produce dense linear systems that can be accelerated via multipole expansions. Solved with Krylov methods, this implies computing the matrix-vector products within each iteration with some error, at an accuracy controlled by the order of the expansion, $p$. We take advantage of a unique property of Krylov iterations that allows lower accuracy of the matrix-vector products as convergence proceeds, and propose a relaxation strategy based on progressively decreasing $p$. In extensive numerical tests of the relaxed Krylov iterations, we obtained speed-ups of between $2.1\times$ and $3.3\times$ for Laplace problems and between $1.7\times$ and $4.0\times$ for Stokes problems. We include an application to Stokes flow around red blood cells, computing with up to 64 cells and problem size up to 131k boundary elements and nearly 400k unknowns. The study was done with an in-house multi-threaded C++ code, on a hexa-core CPU. The code is available on its version-control repository, https://github.com/barbagroup/fmm-bem-relaxed.

Submitted 1 October, 2016; v1 submitted 19 June, 2015; originally announced June 2015.

Comments: 21 pages, 20 figures. Second version submitted for peer review in March 2016, with all results re-computed and revised author list. Rejected in October 2016. Currently undergoing revision for a third submission. See progress of open revision in https://github.com/barbagroup/inexact-gmres

MSC Class: 35Q35; 35Q99; 45B05; 76D07; 76Z99

arXiv:1506.03745 [pdf, other]

doi 10.1016/j.cpc.2015.12.019

Poisson-Boltzmann model for protein-surface electrostatic interactions and grid-convergence study using the PyGBe code

Authors: Christopher D. Cooper, Lorena A. Barba

Abstract: Interactions between surfaces and proteins occur in many vital processes and are crucial in biotechnology: the ability to control specific interactions is essential in fields like biomaterials, biomedical implants and biosensors. In the latter case, biosensor sensitivity hinges on ligand proteins adsorbing on bioactive surfaces with a favorable orientation, exposing reaction sites to target molecules. Protein adsorption, being a free-energy-driven process, is difficult to study experimentally. This paper develops and evaluates a computational model to study electrostatic interactions of proteins and charged nanosurfaces, via the Poisson-Boltzmann equation. We extended the implicit-solvent model used in the open-source code PyGBe to include surfaces of imposed charge or potential. This code solves the boundary integral formulation of the Poisson-Boltzmann equation, discretized with surface elements. PyGBe has at its core a treecode-accelerated Krylov iterative solver, resulting in O(N log N) scaling, with further acceleration on hardware via multi-threaded execution on GPUs. It computes solvation and surface free energies, providing a framework for studying the effect of electrostatics on adsorption. We then derived an analytical solution for a spherical charged surface interacting with a spherical molecule, then completed a grid-convergence study to build evidence on the correctness of our approach.
The study showed the error decaying with the average area of the boundary elements, i.e., the method is O(1/N), which is consistent with our previous verification studies using PyGBe. We also studied grid-convergence using a real molecular geometry (protein GB1D4'), in this case using Richardson extrapolation (in the absence of an analytical solution) and confirmed the O(1/N) scaling in this case.

Submitted 11 June, 2015; originally announced June 2015.

Comments: 11 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1503.08150
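
The abstract above mentions Richardson extrapolation as a way to check O(1/N) grid convergence when no analytical solution is available. A minimal sketch of the standard three-grid estimate follows, assuming a constant refinement ratio in the number of elements N. The function name `richardson` and the synthetic energy values are hypothetical and are not taken from the paper or from PyGBe.

```python
import math

def richardson(f_coarse, f_medium, f_fine, ratio):
    """Observed order and extrapolated value from three successively refined grids.

    `ratio` is the constant factor by which N grows between grids
    (an assumption of this sketch), so an error ~ C / N**p gives order p.
    """
    order = math.log((f_coarse - f_medium) / (f_medium - f_fine)) / math.log(ratio)
    extrapolated = f_fine + (f_fine - f_medium) / (ratio**order - 1.0)
    return order, extrapolated

# Synthetic values (assumption): f(N) = f_exact + C / N, i.e. O(1/N) convergence.
f_exact, C = -100.0, 50.0
values = [f_exact + C / n for n in (1000, 4000, 16000)]  # N grows by 4x each time

order, extrapolated = richardson(*values, ratio=4)
```

With data that converges as 1/N, the observed order comes out close to 1 and the extrapolated value approximates the grid-independent result, which can then serve as the reference for computing errors on each mesh.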

arXiv:1503.08150 [pdf, other]

doi 10.1063/1.4931113

Probing protein orientation near charged nanosurfaces for simulation-assisted biosensor design

Authors: Christopher D. Cooper, Natalia C. Clementi, Lorena A. Barba

Abstract: Protein-surface interactions are ubiquitous in biological processes and bioengineering, yet are not fully understood. In biosensors, a key factor determining the sensitivity and thus the performance of the device is the orientation of the ligand molecules on the bioactive device surface. Adsorption studies thus seek to determine how orientation can be influenced by surface preparation. In this work, protein orientation near charged nanosurfaces is obtained under electrostatic effects using the Poisson-Boltzmann equation, in an implicit-solvent model. Sampling the free energy for protein GB1D4' at a range of tilt and rotation angles with respect to the charged surface, we calculated the probability of the protein orientations and observed a dipolar behavior. This result is consistent with published experimental studies and combined Monte Carlo and molecular dynamics simulations using this small protein, validating our method. More relevant to biosensor technology, antibodies such as immunoglobulin G are still a formidable challenge to molecular simulation, due to their large size. We obtained the probability distribution of orientations for the iso-type IgG2a at varying surface charge and salt concentration. This iso-type was not found to have a preferred orientation in previous studies, unlike the iso-type IgG1 whose larger dipole moment was assumed to make it easier to control. We find that the preferred orientation of IgG2a can be favorable for biosensing with positive surface charge of 0.05C/m$^{2}$ or higher and 37mM salt concentration.
The results also show that local interactions dominate over dipole moment for this protein. Improving immunoassay sensitivity may thus be assisted by numerical studies using our method (and open-source code), guiding changes to fabrication protocols or protein engineering of ligand molecules to obtain more favorable orientations.

Submitted 20 August, 2015; v1 submitted 25 March, 2015; originally announced March 2015.

Comments: 14 pages, 10 figures -- This version is revised post peer review, and supersedes all previous ones. Note that v3 was reduced considerably from the previous ones, due to the material being split in two papers. Another preprint was submitted (arXiv:1506.03745) with the material that was cut from this paper, corresponding to how the papers were submitted to peer-reviewed journals.

Journal ref: J. Chem. Phys. 143, 124709 (2015)

arXiv:1412.5557 [pdf]

Standing Together for Reproducibility in Large-Scale Computing: Report on reproducibility@XSEDE

Authors: Doug James, Nancy Wilkins-Diehr, Victoria Stodden, Dirk Colbry, Carlos Rosales, Mark Fahey, Justin Shi, Rafael F. Silva, Kyo Lee, Ralph Roskies, Laurence Loewe, Susan Lindsey, Rob Kooper, Lorena Barba, David Bailey, Jonathan Borwein, Oscar Corcho, Ewa Deelman, Michael Dietze, Benjamin Gilbert, Jan Harkes, Seth Keele, Praveen Kumar, Jong Lee, Erika Linke , et al. (30 additional authors not shown)

Abstract: This is the final report on reproducibility@xsede, a one-day workshop held in conjunction with XSEDE14, the annual conference of the Extreme Science and Engineering Discovery Environment (XSEDE). The workshop's discussion-oriented agenda focused on reproducibility in large-scale computational research. Two important themes capture the spirit of the workshop submissions and discussions: (1) organizational stakeholders, especially supercomputer centers, are in a unique position to promote, enable, and support reproducible research; and (2) individual researchers should conduct each experiment as though someone will replicate that experiment. Participants documented numerous issues, questions, technologies, practices, and potentially promising initiatives emerging from the discussion, but also highlighted four areas of particular interest to XSEDE: (1) documentation and training that promotes reproducible research; (2) system-level tools that provide build- and run-time information at the level of the individual job; (3) the need to model best practices in research collaborations involving XSEDE staff; and (4) continued work on gateways and related technologies. In addition, an intriguing question emerged from the day's interactions: would there be value in establishing an annual award for excellence in reproducible research?

Submitted 2 January, 2015; v1 submitted 17 December, 2014; originally announced December 2014.

MSC Class: 68N01 ACM Class: D.2.9

arXiv:1312.3691 [pdf, other]

Finding the Force -- Consistent Particle Seeding for Satellite Aerodynamics

Authors: J. Brent Parham, L. A. Barba

Abstract: When calculating satellite trajectories in low-earth orbit, engineers need to adequately estimate aerodynamic forces. But to this day, obtaining the drag acting on the complicated shapes of modern spacecraft suffers from many sources of error. While part of the problem is the uncertain density in the upper atmosphere, this work focuses on improving the modeling of interacting rarefied gases and satellite surfaces. The only numerical approach that currently captures effects in this flow regime---like self-shadowing and multiple molecular reflections---is known as test-particle Monte Carlo. This method executes a ray-tracing algorithm to follow particles that pass through a control volume containing the spacecraft and accumulates the momentum transfer to the body surfaces. Statistical fluctuations inherent in the approach demand particle numbers in the order of millions, often making this scheme too costly to be practical. This work presents a parallel test-particle Monte Carlo method that takes advantage of both GPUs and multi-core CPUs. The speed at which this model can run with millions of particles allowed exploring a regime where a flaw in the model's initial particle seeding was revealed. Our new model introduces an analytical fix based on seeding the calculation with an initial distribution of particles at the boundary of a spherical control volume and computing the integral for the correct number flux. This work includes verification of the proposed model using analytical solutions for several simple geometries and demonstrates uses for studying aero-stabilization of the Phobos-Grunt Martian probe and pose-estimation for the ICESat mission.

Submitted 12 December, 2013; originally announced December 2013.

Comments: 14 pages, 11 figures. Presented at the AIAA Science and Technology Forum and Exposition 2014: AIAA Modeling and Simulation Technologies Conference

arXiv:1309.4018 [pdf, other]

doi 10.1016/j.cpc.2013.10.028

A biomolecular electrostatics solver using Python, GPUs and boundary elements that can handle solvent-filled cavities and Stern layers

Authors: Christopher D. Cooper, Jaydeep P. Bardhan, L. A. Barba

Abstract: The continuum theory applied to biomolecular electrostatics leads to an implicit-solvent model governed by the Poisson-Boltzmann equation. Solvers relying on a boundary integral representation typically do not consider features like solvent-filled cavities or ion-exclusion (Stern) layers, due to the added difficulty of treating multiple boundary surfaces. This has hindered meaningful comparisons with volume-based methods, and the effects on accuracy of including these features have remained unknown. This work presents a solver called PyGBe that uses a boundary-element formulation and can handle multiple interacting surfaces. It was used to study the effects of solvent-filled cavities and Stern layers on the accuracy of calculating solvation energy and binding energy of proteins, using the well-known APBS finite-difference code for comparison. The results suggest that if required accuracy for an application allows errors larger than about 2%, then the simpler, single-surface model can be used. When calculating binding energies, the need for a multi-surface model is problem-dependent, becoming more critical when ligand and receptor are of comparable size. Comparing with the APBS solver, the boundary-element solver is faster when the accuracy requirements are higher. The cross-over point for the PyGBe code is in the order of 1-2% error, when running on one GPU card (NVIDIA Tesla C2075), compared with APBS running on six Intel Xeon CPU cores. PyGBe achieves algorithmic acceleration of the boundary element method using a treecode, and hardware acceleration using GPUs via PyCuda from a user-visible code that is all Python. The code is open-source under MIT license.

Submitted 16 September, 2013; originally announced September 2013.

Comments: 12 pages, 11 figures

arXiv:1309.2969 [pdf, other]

doi 10.1063/1.4866444

Lift and wakes of flying snakes

Authors: Anush Krishnan, John J. Socha, Pavlos P. Vlachos, L. A. Barba

Abstract: Flying snakes use a unique method of aerial locomotion: they jump from tree branches, flatten their bodies and undulate through the air to produce a glide. The shape of their body cross-section during the glide plays an important role in generating lift. This paper presents a computational investigation of the aerodynamics of the cross-sectional shape. Two-dimensional simulations of incompressible flow past the anatomically correct cross-section of the species Chrysopelea paradisi show that a significant enhancement in lift appears at a 35-degree angle of attack, above Reynolds numbers 2000. Previous experiments on physical models also obtained an increased lift, at the same angle of attack. The flow is inherently three-dimensional in physical experiments, due to fluid instabilities, and it is thus intriguing that the enhanced lift also appears in the two-dimensional simulations. The simulations point to the lift enhancement arising from the early separation of the boundary layer on the dorsal surface of the snake profile, without stall. The separated shear layer rolls up and interacts with secondary vorticity in the near-wake, inducing the primary vortex to remain closer to the body and thus cause enhanced suction, resulting in higher lift.

Submitted 7 February, 2014; v1 submitted 11 September, 2013; originally announced September 2013.

Comments: 19 pages, 16 figures

Journal ref: Phys. Fluids, Vol. 26, 031901 (2014)

arXiv:1110.2921 [pdf, other]

doi 10.1016/j.compfluid.2012.08.002

FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method

Authors: Rio Yokota, L. A. Barba

Abstract: The Lagrangian vortex method offers an alternative numerical approach for direct numerical simulation of turbulence. The fact that it uses the fast multipole method (FMM)--a hierarchical algorithm for N-body problems with highly scalable parallel implementations--as numerical engine makes it a potentially good candidate for exascale systems. However, there have been few validation studies of Lagrangian vortex simulations and the insufficient comparisons against standard DNS codes have left ample room for skepticism. This paper presents a comparison between a Lagrangian vortex method and a pseudo-spectral method for the simulation of decaying homogeneous isotropic turbulence. This flow field is chosen despite the fact that it is not the most favorable flow problem for particle methods (which shine in wake flows or where vorticity is compact), due to the fact that it is ideal for the quantitative validation of DNS codes. We use a 256^3 grid with Re_lambda=50 and 100 and look at the turbulence statistics, including high-order moments. The focus is on the effect of the various parameters in the vortex method, e.g., order of FMM series expansion, frequency of reinitialization, overlap ratio and time step. The vortex method uses an FMM code (exaFMM) that runs on GPU hardware using CUDA, while the spectral code (hit3d) runs on CPU only. Results indicate that, for this application (and with the current code implementations), the spectral method is an order of magnitude faster than the vortex method when using a single GPU for the FMM and six CPU cores for the FFT.

Submitted 20 August, 2012; v1 submitted 13 October, 2011; originally announced October 2011.

MSC Class: 76F05 ACM Class: G.1.2; G.1.9

arXiv:1109.3524 [pdf, other]

cuIBM -- A GPU-accelerated Immersed Boundary Method

Authors: Simon K Layton, Anush Krishnan, Lorena A. Barba

Abstract: A projection-based immersed boundary method is dominated by sparse linear algebra routines. Using the open-source Cusp library, we observe a speedup (with respect to a single CPU core) which reflects the constraints of a bandwidth-dominated problem on the GPU. Nevertheless, GPUs offer the capacity to solve large problems on commodity hardware. This work includes validation and a convergence study of the GPU-accelerated IBM, and various optimizations.

Submitted 8 April, 2016; v1 submitted 16 September, 2011; originally announced September 2011.

Comments: Extended paper post-conference, presented at the 23rd International Conference on Parallel Computational Fluid Dynamics (http://www.parcfd.org), ParCFD 2011, Barcelona (unpublished)

arXiv:1108.5815 [pdf, other]

doi 10.1109/MCSE.2012.1

Hierarchical N-body simulations with auto-tuning for heterogeneous systems

Authors: Rio Yokota, Lorena A. Barba

Abstract: With the current hybridization of treecodes and FMMs, combined with auto-tuning capabilities on heterogeneous architectures, the flexibility of fast N-body methods has been greatly enhanced. These features are a requirement for developing a black-box software library for fast N-body algorithms on heterogeneous systems, which is our immediate goal.

Submitted 10 December, 2011; v1 submitted 29 August, 2011; originally announced August 2011.

MSC Class: 70F10 ACM Class: D.1.2; D.1.3; G.1.0; G.1.2

Journal ref: Computing in Science and Engineering, May/June 2012 (vol. 14 no. 3), pp. 30-39

arXiv:1106.5273 [pdf, other]

doi 10.1016/j.cpc.2012.09.011

Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

Authors: R. Yokota, L. A. Barba, T. Narumi, K. Yasuoka

Abstract: This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 seconds for the vortex method and 154 seconds for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex method calculations to date.

Submitted 3 September, 2012; v1 submitted 26 June, 2011; originally announced June 2011.

MSC Class: 76F05 ACM Class: G.1.2; G.1.9

arXiv:1106.2176 [pdf, other]

doi 10.1177/1094342011429952

A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems

Authors: Rio Yokota, Lorena Barba

Abstract: Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our recent work showed scaling of an FMM on GPU clusters, with problem sizes in the order of billions of unknowns. That work led to an extremely parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This paper reports on a campaign of performance tuning and scalability studies using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were parallelized using OpenMP, and a test using 10^7 particles randomly distributed in a cube showed 78% efficiency on 8 threads. Tuning of the particle-to-particle kernel using SIMD instructions resulted in 4x speed-up of the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel scalability was studied in both strong and weak scaling. The strong scaling test used 10^8 particles and resulted in 93% parallel efficiency on 2048 processes for the non-SIMD code and 54% for the SIMD-optimized code (which was still 2x faster). The weak scaling test used 10^6 particles per process, and resulted in 72% efficiency on 32,768 processes, with the largest calculation taking about 40 seconds to evaluate more than 32 billion unknowns. This work builds up evidence for our view that FMM is poised to play a leading role in exascale computing, and we end the paper with a discussion of the features that make it a particularly favorable algorithm for the emerging heterogeneous and massively parallel architectural landscape.

Submitted 16 October, 2011; v1 submitted 10 June, 2011; originally announced June 2011.

MSC Class: 70F10 ACM Class: D.1.3; G.1.0; G.1.2

arXiv:1010.1482 [pdf, other]

Treecode and fast multipole method for N-body simulation with CUDA

Authors: Rio Yokota, Lorena Barba

Abstract: Due to the variety and importance of applications of treecodes and FMM, the combination of algorithmic acceleration with hardware acceleration can have tremendous impact. Alas, programming these algorithms efficiently is no piece of cake. In this contribution, we aim to present GPU kernels for treecode and FMM in, as much as possible, an uncomplicated, accessible way. The interested reader should consult some of the copious literature on the subject for a deeper understanding of the algorithms themselves. Here, we will offer the briefest of summaries. We will focus our attention on achieving a GPU implementation that is efficient in its utilization of the architecture, but without applying the most advanced techniques known in the field (which would complicate the presentation).

Submitted 7 October, 2010; originally announced October 2010.

Journal ref: GPU Computing Gems Emerald Edition, (Morgan Kaufmann/Elsevier, 2011) pp. 113-132

arXiv:1009.3457 [pdf, other]

doi 10.1016/j.cpc.2011.05.002

How to obtain efficient GPU kernels: an illustration using FMM & FGT algorithms

Authors: Felipe A. Cruz, Simon K. Layton, Lorena A. Barba

Abstract: Computing on graphics processors is maybe one of the most important developments in computational science to happen in decades. Not since the arrival of the Beowulf cluster, which combined open source software with commodity hardware to truly democratize high-performance computing, has the community been so electrified. Like then, the opportunity comes with challenges. The formulation of scientific algorithms to take advantage of the performance offered by the new architecture requires rethinking core methods. Here, we have tackled fast summation algorithms (fast multipole method and fast Gauss transform), and applied algorithmic redesign for attaining performance on GPUs. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the GPU. The end result has been GPU kernels that run at over 500 Gigaflops on one NVIDIA Tesla C1060 card, thereby reaching close to practical peak. We can confidently say that GPU computing is not just a vogue, it is truly an irresistible trend in high-performance computing.

Submitted 1 March, 2011; v1 submitted 17 September, 2010; originally announced September 2010.

Journal ref: Comput. Phys. Commun., 182(10):2084-2098 (2011)

arXiv:1007.4591 [pdf, other]

doi 10.1016/j.cpc.2011.02.013

Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns

Authors: Rio Yokota, Jaydeep P. Bardhan, Matthew G. Knepley, L. A. Barba, Tsuyoshi Hamada

Abstract: We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to BEM. The hardware acceleration is achieved through graphics processors, GPUs. We demonstrate the power of our algorithms and software for the calculation of the electrostatic interactions between biological molecules in solution. The applications demonstrated include the electrostatics of protein--drug binding and several multi-million atom systems consisting of hundreds to thousands of copies of lysozyme molecules. The parallel scalability of the software was studied in a cluster at the Nagasaki Advanced Computing Center, using 128 nodes, each with 4 GPUs. Delicate tuning has resulted in strong scaling with parallel efficiency of 0.8 for 256 and 0.5 for 512 GPUs. The largest application run, with over 20 million atoms and one billion unknowns, required only one minute on 512 GPUs. We are currently adapting our BEM software to solve the linearized Poisson-Boltzmann equation for dilute ionic solutions, and it is also designed to be flexible enough to be extended for a variety of integral equation problems, ranging from Poisson problems to Helmholtz problems in electromagnetics and acoustics to high Reynolds number flow.

Submitted 10 February, 2011; v1 submitted 26 July, 2010; originally announced July 2010.

Journal ref: Comput. Phys. Commun., 182(6):1271-1283 (2011)

arXiv:0909.5413 [pdf, ps, other]

doi 10.1016/j.cma.2010.02.008

PetRBF--A parallel O(N) algorithm for radial basis function interpolation

Authors: Rio Yokota, L. A. Barba, Matthew G. Knepley

Abstract: We have developed a parallel algorithm for radial basis function (RBF) interpolation that exhibits O(N) complexity, requires O(N) storage, and scales excellently up to a thousand processes. The algorithm uses a GMRES iterative solver with a restricted additive Schwarz method (RASM) as a preconditioner and a fast matrix-vector algorithm. Previous fast RBF methods, achieving at most O(N log N) complexity, were developed using multiquadric and polyharmonic basis functions. In contrast, the present method uses Gaussians with a small variance (a common choice in particle methods for fluid simulation, our main target application). The fast decay of the Gaussian basis function allows rapid convergence of the iterative solver even when the subdomains in the RASM are very small. The present method was implemented in parallel using the PETSc library (developer version). Numerical experiments demonstrate its capability in problems of RBF interpolation with more than 50 million data points, timing at 106 seconds (19 iterations for an error tolerance of 10^-15) on 1024 processors of a Blue Gene/L (700 MHz PowerPC processors). The parallel code is freely available in the open-source model.

Submitted 29 September, 2009; originally announced September 2009.

Comments: Submitted to Computer Methods in Applied Mechanics and Engineering

Journal ref: Computer Methods in Applied Mechanics and Engineering, 199(25-28), pp. 1793-1804, 2010

arXiv:0905.2637 [pdf, other]

doi 10.1002/nme.2972

PetFMM--A dynamically load-balancing parallel fast multipole library

Authors: Felipe A. Cruz, Matthew G. Knepley, L. A. Barba

Abstract: Fast algorithms for the computation of $N$-body problems can be broadly classified into mesh-based interpolation methods, and hierarchical or multiresolution methods. To this last class belongs the well-known fast multipole method (FMM), which offers O(N) complexity. This paper presents an extensible parallel library for $N$-body interactions utilizing the FMM algorithm, built on the framework of PETSc. A prominent feature of this library is that it is designed to be extensible, with a view to unifying efforts involving many algorithms based on the same principles as the FMM and enabling easy development of scientific application codes. The paper also details an exhaustive model for the computation of tree-based $N$-body algorithms in parallel, including both work estimates and communications estimates. With this model, we are able to implement a method to provide automatic, a priori load balancing of the parallel execution, achieving optimal distribution of the computational work among processors and minimal inter-processor communications. Using a client application that performs the calculation of velocity induced by $N$ vortex particles, ample verification and testing of the library was performed. Strong scaling results are presented with close to a million particles in up to 64 processors, including both speedup and parallel efficiency. The library is currently able to achieve over 85% parallel efficiency for 64 processors. The software library is open source under the PETSc license; this guarantees the maximum impact to the scientific community and encourages peer-based collaboration for the extensions and applications.

Submitted 15 May, 2009; originally announced May 2009.

Comments: 28 pages, 9 figures

Journal ref: Int. J. Num. Meth. Eng., 85(4): 403-428 (Jan. 2011)

arXiv:0809.1810 [pdf, other]

doi 10.1002/nme.2611

Characterization of the errors of the FMM in particle simulations

Authors: Felipe A. Cruz, L. A. Barba

Abstract: The Fast Multipole Method (FMM) offers an acceleration for pairwise interaction calculation, known as $N$-body problems, from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ with $N$ particles. This has brought dramatic increase in the capability of particle simulations in many application areas, such as electrostatics, particle formulations of fluid mechanics, and others. Although the literature on the subject provides theoretical error bounds for the FMM approximation, there are not many reports of the measured errors in a suite of computational experiments. We have performed such an experimental investigation, and summarized the results of about 1000 calculations using the FMM algorithm, to characterize the accuracy of the method in relation with the different parameters available to the user. In addition to the more standard diagnostic of the maximum error, we supply illustrations of the spatial distribution of the errors, which offers visual evidence of all the contributing factors to the overall approximation accuracy: multipole expansion, local expansion, hierarchical spatial decomposition (interaction lists, local domain, far domain). This presentation is a contribution to any researcher wishing to incorporate the FMM acceleration to their application code, as it aids in understanding where accuracy is gained or compromised.

Submitted 10 September, 2008; originally announced September 2008.

Comments: 34 pages, 38 images

Journal ref: Int. J. Num. Meth. Engrg., 79(13):1577-1604 (2009)
