-
Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN
Authors:
Massimiliano Lupo Pasini,
Jong Youl Choi,
Kshitij Mehta,
Pei Zhang,
David Rogers,
Jonghyun Bae,
Khaled Z. Ibrahim,
Ashwin M. Aji,
Karl W. Schulz,
Jorda Polo,
Prasanna Balaprakash
Abstract:
We present our work on developing and training scalable graph foundation models (GFMs) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural networks (GNNs) in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that define convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up the GFM training to tens of thousands of GPUs on datasets that consist of hundreds of millions of graphs. Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures, such as the total energy and atomic forces. Using over 150 million atomistic structures for training, we illustrate the performance of our approach along with the lessons learned on two United States Department of Energy (US-DOE) supercomputers, namely the Perlmutter petascale system at the National Energy Research Scientific Computing Center and the Frontier exascale system at Oak Ridge National Laboratory. The HydraGNN architecture enables the GFM to achieve near-linear strong scaling performance using more than 2,000 GPUs on Perlmutter and 16,000 GPUs on Frontier. Hyperparameter optimization (HPO) was performed on over 64,000 GPUs on Frontier to select GFM architectures with high accuracy. Early stopping was applied on each GFM architecture for energy awareness in performing such an extreme-scale task. The training of an ensemble of highest-ranked GFM architectures continued until convergence to establish uncertainty quantification (UQ) capabilities with ensemble learning. Our contribution opens the door for rapidly developing, training, and deploying GFMs using large-scale computational resources to enable AI-accelerated materials discovery and design.
Submitted 28 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
A cast of thousands: How the IDEAS Productivity project has advanced software productivity and sustainability
Authors:
Lois Curfman McInnes,
Michael Heroux,
David E. Bernholdt,
Anshu Dubey,
Elsa Gonsiorowski,
Rinku Gupta,
Osni Marques,
J. David Moulton,
Hai Ah Nam,
Boyana Norris,
Elaine M. Raybourn,
Jim Willenbring,
Ann Almgren,
Ross Bartlett,
Kita Cranfill,
Stephen Fickas,
Don Frederick,
William Godoy,
Patricia Grubel,
Rebecca Hartman-Baker,
Axel Huebl,
Rose Lynch,
Addi Malviya Thakur,
Reed Milewicz,
Mark C. Miller
, et al. (9 additional authors not shown)
Abstract:
Computational and data-enabled science and engineering are revolutionizing advances throughout science and society, at all scales of computing. For example, teams in the U.S. DOE Exascale Computing Project have been tackling new frontiers in modeling, simulation, and analysis by exploiting unprecedented exascale computing capabilities, building an advanced software ecosystem that supports next-generation applications and addresses disruptive changes in computer architectures. However, concerns are growing about the productivity of the developers of scientific software, its sustainability, and the trustworthiness of the results that it produces. Members of the IDEAS project serve as catalysts to address these challenges through fostering software communities, incubating and curating methodologies and resources, and disseminating knowledge to advance developer productivity and software sustainability. This paper discusses how these synergistic activities are advancing scientific discovery, mitigating technical risks by building a firmer foundation for reproducible, sustainable science at all scales of computing, from laptops to clusters to exascale and beyond.
Submitted 16 February, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
On Minimizing the Energy of a Spherical Graph Representation
Authors:
Matt DeVos,
Danielle Rogers,
Alexandra Wesolek
Abstract:
Graph representations are the generalization of geometric graph drawings from the plane to higher dimensions. A method introduced by Tutte to optimize properties of graph drawings is to minimize their energy. We explore this minimization for spherical graph representations, where the vertices lie on a unit sphere such that the origin is their barycentre. We present a primal and dual semidefinite program which can be used to find such a spherical graph representation minimizing the energy. We denote the optimal value of this program by $ρ(G)$ for a given graph $G$. The value turns out to be related to the second largest eigenvalue of the adjacency matrix of $G$, which we denote by $λ_2$. We show that for $G$ regular, $ρ(G) \leq \frac{λ_{2}}{2} \cdot v(G)$, and that equality holds if and only if the $λ_{2}$ eigenspace contains a spherical 1-design. Moreover, if $G$ is a random $d$-regular graph, $ρ(G)=\left(\sqrt{(d-1)} +o(1)\right)\cdot v(G)$, asymptotically almost surely.
Submitted 6 September, 2023;
originally announced September 2023.
-
In Situ Data Summaries for Flexible Feature Analysis in Large-Scale Multiphase Flow Simulations
Authors:
Soumya Dutta,
Terece Turton,
David Rogers,
Jordan Musser,
James Ahrens,
Ann Almgren
Abstract:
The study of multiphase flow is essential for understanding the complex interactions of various materials. In particular, when designing chemical reactors such as fluidized bed reactors (FBR), a detailed understanding of the hydrodynamics is critical for optimizing reactor performance and stability. An FBR allows experts to conduct different types of chemical reactions involving multiphase materials, especially interactions between gas and solids. During such complex chemical processes, the formation of void regions in the reactor, generally termed bubbles, is an important phenomenon. The study of these bubbles has deep implications for predicting the reactor's overall efficiency. But the physical experiments needed to understand bubble dynamics are costly and non-trivial. Therefore, to study such chemical processes and bubble dynamics, a state-of-the-art massively parallel computational fluid dynamics discrete element model (CFD-DEM), MFIX-Exa, is being developed for simulating multiphase flows. Despite the proven accuracy of MFIX-Exa in modeling bubbling phenomena, the very large size of the output data prohibits the use of traditional post hoc analysis capabilities in terms of both storage and I/O time. To address these issues and allow the application scientists to explore the bubble dynamics in an efficient and timely manner, we have developed an end-to-end visual analytics pipeline that enables in situ detection of bubbles using statistical techniques, followed by a flexible and interactive visual exploration of bubble dynamics in the post hoc analysis phase. Positive feedback from the experts has indicated the efficacy of the proposed approach for exploring bubble dynamics in very large scale multiphase flow simulations.
Submitted 7 January, 2022;
originally announced January 2022.
-
Three Practical Workflow Schedulers for Easy Maximum Parallelism
Authors:
David M. Rogers
Abstract:
Runtime scheduling and workflow systems are an increasingly popular algorithmic component in HPC because they allow full system utilization with relaxed synchronization requirements. There are so many special-purpose tools for task scheduling that one might wonder why more are needed. Use cases seen on the Summit supercomputer needed better integration with MPI and greater flexibility in job launch configurations. Preparation, execution, and analysis of computational chemistry simulations at the scale of tens of thousands of processors revealed three distinct workflow patterns. A separate job scheduler was implemented for each one using extremely simple and robust designs: file-based, task-list based, and bulk-synchronous. Comparison with existing methods shows unique benefits of this work, including simplicity of design, suitability for HPC centers, short startup time, and well-understood per-task overhead. All three new tools have been shown to scale to full utilization of Summit, and have been made publicly available with tests and documentation. This work presents a complete characterization of the minimum effective task granularity for efficient scheduler usage scenarios. These schedulers have the same bottlenecks, and hence similar task granularities, as those reported for existing tools following comparable paradigms.
Submitted 21 October, 2021;
originally announced October 2021.
-
Deriving Disinformation Insights from Geolocalized Twitter Callouts
Authors:
David Tuxworth,
Dimosthenis Antypas,
Luis Espinosa-Anke,
Jose Camacho-Collados,
Alun Preece,
David Rogers
Abstract:
This paper demonstrates a two-stage method for deriving insights from social media data relating to disinformation by applying a combination of geospatial classification and embedding-based language modelling across multiple languages. In particular, the analysis is centered on Twitter and disinformation for three European languages: English, French and Spanish. Firstly, Twitter data is classified into European and non-European sets using BERT. Secondly, Word2vec is applied to the classified texts, resulting in Eurocentric, non-Eurocentric and global representations of the data for the three target languages. This comparative analysis not only demonstrates the efficacy of the classification method but also highlights geographic, temporal and linguistic differences in the disinformation-related media. Thus, the contributions of the work are threefold: (i) a novel language-independent transformer-based geolocation method; (ii) an analytical approach that exploits lexical specificity and word embeddings to interrogate user-generated content; and (iii) a dataset of 36 million disinformation-related tweets in English, French and Spanish.
Submitted 6 August, 2021;
originally announced August 2021.
-
Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports
Authors:
Aleksandra Edwards,
David Rogers,
Jose Camacho-Collados,
Hélène de Ribaupierre,
Alun Preece
Abstract:
The task of text and sentence classification is associated with the need for large amounts of labelled training data. The acquisition of high volumes of labelled datasets can be expensive or unfeasible, especially for highly-specialised domains for which documents are hard to obtain. Research on the application of supervised classification based on small amounts of training data is limited. In this paper, we address the combination of state-of-the-art deep learning and classification methods and provide an insight into which combination of methods fits the needs of small, domain-specific, and terminologically-rich corpora. We focus on a real-world scenario related to a collection of safeguarding reports comprising learning experiences and reflections on tackling serious incidents involving children and vulnerable adults. The relatively small volume of available reports and their use of highly domain-specific terminology make the application of automated approaches difficult. We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches. Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
Submitted 4 June, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Cinema Darkroom: A Deferred Rendering Framework for Large-Scale Datasets
Authors:
Jonas Lukasczyk,
Christoph Garth,
Matthew Larsen,
Wito Engelke,
Ingrid Hotz,
David Rogers,
James Ahrens,
Ross Maciejewski
Abstract:
This paper presents a framework that fully leverages the advantages of a deferred rendering approach for the interactive visualization of large-scale datasets. Geometry buffers (G-Buffers) are generated and stored in situ, and shading is performed post hoc in an interactive image-based rendering front end. This decoupled framework has two major advantages. First, the G-Buffers only need to be computed and stored once---which corresponds to the most expensive part of the rendering pipeline. Second, the stored G-Buffers can later be consumed in an image-based rendering front end that enables users to interactively adjust various visualization parameters---such as the applied color map or the strength of ambient occlusion---where suitable choices are often not known a priori. This paper demonstrates the use of Cinema Darkroom on several real-world datasets, highlighting CD's ability to effectively decouple the complexity and size of the dataset from its visualization.
Submitted 8 October, 2020;
originally announced October 2020.
-
Protein Conformational States: A First Principles Bayesian Method
Authors:
David M. Rogers
Abstract:
Automated identification of protein conformational states from simulation of an ensemble of structures is a hard problem because it requires teaching a computer to recognize shapes. We adapt the naive Bayes classifier from the machine learning community for use on atom-to-atom pairwise contacts. The result is an unsupervised learning algorithm that samples a `distribution' over potential classification schemes. We apply the classifier to a series of test structures and one real protein, showing that it identifies the conformational transition with > 95% accuracy in most cases. A nontrivial feature of our adaptation is a new connection to information entropy that allows us to vary the level of structural detail without spoiling the categorization. This is confirmed by comparing results as the number of atoms and time-samples are varied over 1.5 orders of magnitude. Further, the method's derivation from Bayesian analysis on the set of inter-atomic contacts makes it easy to understand and extend to more complex cases.
Submitted 8 September, 2020; v1 submitted 5 August, 2020;
originally announced August 2020.
-
Towards a Direct, By-Need Evaluator for Dependently Typed Languages
Authors:
David M. Rogers
Abstract:
We present a C-language implementation of the lambda-pi calculus by extending the (call-by-need) stack machine of Ariola, Chang and Felleisen to hold types, using a typeless- tagless- final interpreter strategy. It has the advantage of expressing all operations as folds over terms, including by-need evaluation, recovery of the initial syntax-tree encoding for any term, and eliminating most garbage-collection tasks. These are made possible by a disciplined approach to handling the spine of each term, along with a robust stack-based API. Type inference is not covered in this work, but also derives several advantages from the present stack transformation. Timing and maximum stack space usage results for executing benchmark problems are presented. We discuss how the design choices for this interpreter allow the language to be used as a high-level scripting language for automatic distributed parallel execution of common scientific computing workflows.
Submitted 23 September, 2015;
originally announced September 2015.
-
Complexity limitations on quantum computation
Authors:
Lance Fortnow,
John D. Rogers
Abstract:
We use the powerful tools of counting complexity and generic oracles to help understand the limitations of the complexity of quantum computation. We show several results for the probabilistic quantum class BQP.
1. BQP is low for PP, i.e., PP^BQP=PP.
2. There exists a relativized world where P=BQP and the polynomial-time hierarchy is infinite.
3. There exists a relativized world where BQP does not have complete sets.
4. There exists a relativized world where P=BQP but P is not equal to UP intersect coUP and one-way functions exist. This gives a relativized answer to an open question of Simon.
Submitted 12 November, 1998;
originally announced November 1998.