-
Report of the DOE/NSF Workshop on Correctness in Scientific Computing, June 2023, Orlando, FL
Authors:
Maya Gokhale,
Ganesh Gopalakrishnan,
Jackson Mayo,
Santosh Nagarakatte,
Cindy Rubio-González,
Stephen F. Siegel
Abstract:
This report is a digest of the DOE/NSF Workshop on Correctness in Scientific Computing (CSC'23) held on June 17, 2023, as part of the Federated Computing Research Conference (FCRC) 2023. CSC was conceived by DOE and NSF to address the growing concerns about correctness among those who employ computational methods to perform large-scale scientific simulations. These concerns have escalated, given the complexity, scale, and heterogeneity of today's HPC software and hardware. If correctness is not proactively addressed, there is the risk of producing flawed science on top of unacceptable productivity losses faced by computational scientists and engineers. HPC systems are beginning to include data-driven methods, including machine learning and surrogate models, and their impact on overall HPC system correctness was also seen as an urgent topic for discussion.
Stakeholders in correctness in this space were identified as belonging to several sub-disciplines of computer science: from computer architecture researchers who design special-purpose hardware that offers high energy efficiency, to numerical algorithm designers who develop efficient computational schemes based on reduced precision as well as reduced data movement, all the way to researchers in programming languages and formal methods who seek methodologies for correct compilation and verification. To include attendees with such a diverse set of backgrounds, CSC was held during the Federated Computing Research Conference (FCRC) 2023.
Submitted 27 December, 2023; v1 submitted 25 December, 2023;
originally announced December 2023.
-
DataRaceBench V1.4.1 and DataRaceBench-ML V0.1: Benchmark Suites for Data Race Detection
Authors:
Le Chen,
Wenhao Wu,
Stephen F. Siegel,
Pei-Hung Lin,
Chunhua Liao
Abstract:
Data races pose a significant threat in multi-threaded parallel applications due to their negative impact on program correctness. DataRaceBench, an open-source benchmark suite, is specifically crafted to assess data race detection tools in a systematic and measurable manner. Machine learning techniques have recently demonstrated considerable potential in high-performance computing (HPC) program analysis and optimization. However, these techniques require specialized data formats for training and refinement. This paper presents the latest update to DataRaceBench, incorporating new data race contributions from Wu et al. \cite{wu2023model}, and introduces a derived dataset named DataRaceBench-ML (DRB-ML) \cite{drbml}. DRB-ML aligns with the emerging trend of machine learning and large language models. Originating from DataRaceBench, this dataset includes detailed labels that denote the presence of a data race and provides comprehensive details of the associated variables, such as variable names, line numbers, and the operation (read/write). Unique to DRB-ML, we have also integrated a series of tailored prompt-response pairs specifically designed for LLM fine-tuning.
Submitted 16 August, 2023;
originally announced August 2023.
-
Model Checking Race-freedom When "Sequential Consistency for Data-race-free Programs" is Guaranteed
Authors:
Wenhao Wu,
Jan Hückelheim,
Paul D. Hovland,
Ziqing Luo,
Stephen F. Siegel
Abstract:
Many parallel programming models guarantee that if all sequentially consistent (SC) executions of a program are free of data races, then all executions of the program will appear to be sequentially consistent. This greatly simplifies reasoning about the program, but leaves open the question of how to verify that all SC executions are race-free. In this paper, we show that with a few simple modifications, model checking can be an effective tool for verifying race-freedom. We explore this technique on a suite of C programs parallelized with OpenMP.
Submitted 20 July, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Report of the HPC Correctness Summit, Jan 25--26, 2017, Washington, DC
Authors:
Ganesh Gopalakrishnan,
Paul D. Hovland,
Costin Iancu,
Sriram Krishnamoorthy,
Ignacio Laguna,
Richard A. Lethin,
Koushik Sen,
Stephen F. Siegel,
Armando Solar-Lezama
Abstract:
Maintaining leadership in HPC requires the ability to support simulations at large scale and high fidelity. In this study, we detail one of the most significant productivity challenges in achieving this goal, namely the increasing proclivity to bugs, especially in the face of growing hardware and software heterogeneity and sheer system scale. We identify key areas where timely new research must be proactively begun to address these challenges and to create new correctness tools, which ideally must play a significant role even while ramping up toward exascale. We close with a proposal for a two-day workshop in which the problems identified in this report can be more broadly discussed and specific plans to launch these new research thrusts can be identified.
Submitted 21 May, 2017;
originally announced May 2017.