-
LASSI: An LLM-based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes
Authors:
Matthew T. Dearing,
Yiheng Tao,
Xingfu Wu,
Zhiling Lan,
Valerie Taylor
Abstract:
This paper addresses the problem of providing a novel approach to sourcing significant training data for LLMs focused on science and engineering. In particular, a crucial challenge is sourcing parallel scientific codes in the ranges of millions to billions of codes. To tackle this problem, we propose an automated pipeline framework, called LASSI, designed to translate between parallel programming languages by bootstrapping existing closed- or open-source LLMs. LASSI incorporates autonomous enhancement through self-correcting loops where errors encountered during compilation and execution of generated code are fed back to the LLM through guided prompting for debugging and refactoring. We highlight the bi-directional translation of existing GPU benchmarks between OpenMP target offload and CUDA to validate LASSI.
The results of evaluating LASSI with different application codes across four LLMs demonstrate the effectiveness of LASSI for generating executable parallel codes, with 80% of OpenMP to CUDA translations and 85% of CUDA to OpenMP translations producing the expected output. We also observe approximately 78% of OpenMP to CUDA translations and 62% of CUDA to OpenMP translations execute within 10% of or at a faster runtime than the original benchmark code in the same language.
Submitted 30 June, 2024;
originally announced July 2024.
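The self-correcting loop described in the LASSI abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `self_correcting_translate`, `translate`, and `build_and_run`, and the `max_attempts` limit, are all hypothetical, and the LLM and compiler are replaced by toy stand-ins so the sketch runs on its own.

```python
from typing import Callable, Optional, Tuple


def self_correcting_translate(
    source: str,
    translate: Callable[[str, Optional[str]], str],
    build_and_run: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return a translation that builds and runs, or None if attempts run out.

    `translate` stands in for an LLM call (hypothetical interface): it takes
    the source code and the previous error text, if any.
    `build_and_run` stands in for compiling and executing the candidate.
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = translate(source, feedback)
        ok, log = build_and_run(candidate)
        if ok:
            return candidate
        feedback = log  # feed the error text into the next prompt
    return None


# Toy stand-ins so the sketch runs without an LLM or a compiler.
def fake_translate(source: str, feedback: Optional[str]) -> str:
    return source + " // fixed" if feedback else source


def fake_build_and_run(code: str) -> Tuple[bool, str]:
    if code.endswith("// fixed"):
        return True, ""
    return False, "error: missing fix"


result = self_correcting_translate("kernel()", fake_translate, fake_build_and_run)
```

With the toy stand-ins, the first attempt fails, its error text is passed back, and the second attempt succeeds. A real pipeline would also need to compare program output against the expected output, which the abstract mentions but this sketch omits.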
-
An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors
Authors:
Xingfu Wu,
Tupendra Oli,
Justin H. Qian,
Valerie Taylor,
Mark C. Hersam,
Vinod K. Sangwan
Abstract:
Support Vector Machine (SVM) is a state-of-the-art classification method widely used in science and engineering due to its high accuracy, its ability to deal with high dimensional data, and its flexibility in modeling diverse sources of data. In this paper, we propose an autotuning-based optimization framework to quantify the ranges of hyperparameters in SVMs to identify their optimal choices, and apply the framework to two SVMs with the mixed-kernel between Sigmoid and Gaussian kernels for smart pixel datasets in high energy physics (HEP) and mixed-kernel heterojunction transistors (MKH). Our experimental results show that the optimal selection of hyperparameters in the SVMs and the kernels varies greatly across applications and datasets, and choosing optimal values is critical for high classification accuracy of the mixed-kernel SVMs. Uninformed choices of hyperparameters C and coef0 in the mixed-kernel SVMs result in severely low accuracy, and the proposed framework effectively quantifies the proper ranges for the hyperparameters in the SVMs to identify their optimal choices, achieving the highest accuracy of 94.6% for the HEP application and the highest average accuracy of 97.2% with far less tuning time for the MKH application.
Submitted 26 June, 2024;
originally announced June 2024.
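To make the idea of a mixed kernel between Sigmoid and Gaussian kernels concrete, here is a small sketch. The abstract does not give the exact combination used in the paper, so the convex weighting below (parameter `weight`) is an assumption, as are the function names and default values; only `coef0` is named in the abstract.

```python
import math
from typing import Sequence


def sigmoid_kernel(x: Sequence[float], y: Sequence[float],
                   gamma: float, coef0: float) -> float:
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)


def gaussian_kernel(x: Sequence[float], y: Sequence[float],
                    gamma: float) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mixed_kernel(x: Sequence[float], y: Sequence[float],
                 gamma: float = 0.5, coef0: float = 0.0,
                 weight: float = 0.5) -> float:
    """Assumed convex mix: weight * sigmoid + (1 - weight) * Gaussian."""
    return (weight * sigmoid_kernel(x, y, gamma, coef0)
            + (1.0 - weight) * gaussian_kernel(x, y, gamma))


value = mixed_kernel([1.0, 2.0], [0.5, 1.5])
```

Even in this toy form, the kernel value depends jointly on `gamma`, `coef0`, and the mix weight, which is why a poorly chosen `coef0` can hurt accuracy and why tuning the ranges matters.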
-
Integrating ytopt and libEnsemble to Autotune OpenMC
Authors:
Xingfu Wu,
John R. Tramm,
Jeffrey Larson,
John-Luke Navarro,
Prasanna Balaprakash,
Brice Videau,
Michael Kruse,
Paul Hovland,
Valerie Taylor,
Mary Hall
Abstract:
ytopt is a Python machine-learning-based autotuning software package developed within the ECP PROTEAS-TUNE project. The ytopt software adopts an asynchronous search framework that consists of sampling a small number of input parameter configurations and progressively fitting a surrogate model over the input-output space until exhausting the user-defined maximum number of evaluations or the wall-clock time. libEnsemble is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations across massively parallel resources developed within the ECP PETSc/TAO project. libEnsemble helps users take advantage of massively parallel resources to solve design, decision, and inference problems and expands the class of problems that can benefit from increased parallelism. In this paper we present our methodology and framework to integrate ytopt and libEnsemble to take advantage of massively parallel resources to accelerate the autotuning process. Specifically, we focus on using the proposed framework to autotune the ECP ExaSMR application OpenMC, an open source Monte Carlo particle transport code. OpenMC has seven tunable parameters, some of which have large ranges, such as the number of particles in-flight, which ranges from 100,000 to 8 million with a default setting of 1 million. Setting the proper combination of these parameter values to achieve the best performance is extremely time-consuming. Therefore, we apply the proposed framework to autotune the MPI/OpenMP offload version of OpenMC based on a user-defined metric such as the figure of merit (FoM, in particles/s) or the energy-delay product (EDP), a measure of energy efficiency, on the OLCF Frontier TDS system Crusher. The experimental results show that we achieve improvements of up to 29.49% in FoM and up to 30.44% in EDP.
Submitted 14 February, 2024;
originally announced February 2024.
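For readers unfamiliar with the energy-delay product named in the abstract above, a minimal sketch follows. EDP is the product of energy consumed and runtime, so lower is better. How the paper computes its percentage improvements is not stated in the abstract, so the relative-reduction formula here is an assumption, and the numbers are made up purely for illustration.

```python
def energy_delay_product(energy_joules: float, runtime_seconds: float) -> float:
    """EDP = energy * delay; lower values mean better energy efficiency."""
    return energy_joules * runtime_seconds


def percent_reduction(baseline: float, tuned: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent (assumed formula)."""
    return (baseline - tuned) / baseline * 100.0


# Illustrative numbers only, not taken from the paper.
baseline_edp = energy_delay_product(energy_joules=1000.0, runtime_seconds=10.0)
tuned_edp = energy_delay_product(energy_joules=800.0, runtime_seconds=8.0)
improvement = percent_reduction(baseline_edp, tuned_edp)
```

Because EDP multiplies the two quantities, a configuration that cuts both energy and runtime by 20% improves EDP by more than 20%, which is one reason it is used as a combined tuning objective.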
-
Autotuning Apache TVM-based Scientific Applications Using Bayesian Optimization
Authors:
Xingfu Wu,
Praveen Paramasivam,
Valerie Taylor
Abstract:
Apache TVM (Tensor Virtual Machine), an open source machine learning compiler framework designed to optimize computations across various hardware platforms, provides an opportunity to improve the performance of dense matrix factorizations such as LU (Lower Upper) decomposition and Cholesky decomposition on GPUs and AI (Artificial Intelligence) accelerators. In this paper, we propose a new TVM autotuning framework using Bayesian Optimization and use the TVM tensor expression language to implement linear algebra kernels such as LU, Cholesky, and 3mm. We use these scientific computation kernels to evaluate the effectiveness of our methods on a GPU cluster, called Swing, at Argonne National Laboratory. We compare the proposed autotuning framework with the TVM autotuning framework AutoTVM with four tuners and find that our framework outperforms AutoTVM in most cases.
Submitted 13 September, 2023;
originally announced September 2023.
-
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms
Authors:
Anthony Francis,
Claudia Pérez-D'Arpino,
Chengshu Li,
Fei Xia,
Alexandre Alahi,
Rachid Alami,
Aniket Bera,
Abhijat Biswas,
Joydeep Biswas,
Rohan Chandra,
Hao-Tien Lewis Chiang,
Michael Everett,
Sehoon Ha,
Justin Hart,
Jonathan P. How,
Haresh Karnan,
Tsang-Wei Edward Lee,
Luis J. Manso,
Reuth Mirksy,
Sören Pirk,
Phani Teja Singamaneni,
Peter Stone,
Ada V. Taylor,
Peter Trautman,
Nathan Tsoi,
et al. (6 additional authors not shown)
Abstract:
A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.
Submitted 19 September, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales
Authors:
Xingfu Wu,
Prasanna Balaprakash,
Michael Kruse,
Jaehoon Koo,
Brice Videau,
Paul Hovland,
Valerie Taylor,
Brad Geltz,
Siddhartha Jana,
Mary Hall
Abstract:
As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints has become critical and challenging. We propose a low-overhead autotuning framework to autotune performance and energy for various hybrid MPI/OpenMP scientific applications at large scales and to explore the tradeoffs between application runtime and power/energy for energy efficient application execution, then use this framework to autotune four ECP proxy applications -- XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to effectively search parameter spaces with up to 6 million different configurations on two large-scale production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that our autotuning framework at large scales has low overhead and achieves good scalability. Using the proposed autotuning framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% EDP improvement on up to 4,096 nodes.
Submitted 28 March, 2023;
originally announced March 2023.
-
Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)
Authors:
Xingfu Wu,
Michael Kruse,
Prasanna Balaprakash,
Hal Finkel,
Paul Hovland,
Valerie Taylor,
Mary Hall
Abstract:
In this paper, we develop a ytopt autotuning framework that leverages Bayesian optimization to explore the parameter search space, and we compare four different supervised learning methods within Bayesian optimization and evaluate their effectiveness. We select six of the most complex PolyBench benchmarks and apply the newly developed LLVM Clang/Polly loop optimization pragmas to the benchmarks to optimize them. We then use the autotuning framework to optimize the pragma parameters to improve their performance. The experimental results show that our autotuning approach outperforms the other compiling methods to provide the smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and covariance with two large datasets in 200 code evaluations for effectively searching the parameter spaces with up to 170,368 different configurations. We find that the Floyd-Warshall benchmark did not benefit from autotuning because Polly uses heuristics to optimize the benchmark to make it run much slower. To cope with this issue, we provide some compiler option solutions to improve the performance. Then we present loop autotuning without a user's knowledge using a simple mctree autotuning framework to further improve the performance of the Floyd-Warshall benchmark. We also extend the ytopt autotuning framework to tune a deep learning application.
Submitted 27 April, 2021;
originally announced April 2021.
-
Performance and Power Modeling and Prediction Using MuMMI and Ten Machine Learning Methods
Authors:
Xingfu Wu,
Valerie Taylor,
Zhiling Lan
Abstract:
In this paper, we use the modeling and prediction tool MuMMI (Multiple Metrics Modeling Infrastructure) and ten machine learning methods to model and predict performance and power and compare their prediction error rates. We use a fault-tolerant linear algebra code and a fault-tolerant heat distribution code to conduct our modeling and prediction study on the Cray XC40 Theta and IBM BG/Q Mira at Argonne National Laboratory and the Intel Haswell cluster Shepard at Sandia National Laboratories. Our experimental results show that the prediction error rates in performance and power using MuMMI are less than 10% for most cases. Based on the models for runtime, node power, CPU power, and memory power, we identify the most significant performance counters for potential optimization efforts associated with the application characteristics and the target architectures, and we predict theoretical outcomes of the potential optimizations. When we compare the prediction accuracy using MuMMI with that using 10 machine learning methods, we observe that MuMMI not only results in more accurate prediction in both performance and power but also presents how performance counters impact the performance and power models. This provides some insights about how to fine-tune the applications and/or systems for energy efficiency.
Submitted 12 November, 2020;
originally announced November 2020.
-
Utilizing Ensemble Learning for Performance and Power Modeling and Improvement of Parallel Cancer Deep Learning CANDLE Benchmarks
Authors:
Xingfu Wu,
Valerie Taylor
Abstract:
Machine learning (ML) continues to grow in importance across nearly all domains and is a natural tool in modeling to learn from data. Often a tradeoff exists between a model's ability to minimize bias and variance. In this paper, we utilize ensemble learning to combine linear, nonlinear, and tree-/rule-based ML methods to cope with the bias-variance tradeoff and result in more accurate models. Hardware performance counter values are correlated with properties of applications that impact performance and power on the underlying system. We use the datasets collected for two parallel cancer deep learning CANDLE benchmarks, NT3 (weak scaling) and P1B2 (strong scaling), to build performance and power models based on hardware performance counters using single-object and multiple-objects ensemble learning to identify the most important counters for improvement. Based on the insights from these models, we improve the performance and energy of P1B2 and NT3 by optimizing the deep learning environments TensorFlow, Keras, Horovod, and Python under the huge page size of 8 MB on the Cray XC40 Theta at Argonne National Laboratory. Experimental results show that ensemble learning not only produces more accurate models but also provides more robust performance counter ranking. We achieve up to 61.15% performance improvement and up to 62.58% energy saving for P1B2 and up to 55.81% performance improvement and up to 52.60% energy saving for NT3 on up to 24,576 cores.
Submitted 12 November, 2020;
originally announced November 2020.
-
Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization
Authors:
Xingfu Wu,
Michael Kruse,
Prasanna Balaprakash,
Hal Finkel,
Paul Hovland,
Valerie Taylor,
Mary Hall
Abstract:
Autotuning is an approach that explores a search space of possible implementations/configurations of a kernel or an application by selecting and evaluating a subset of implementations/configurations on a target platform and/or using models to identify a high performance implementation/configuration. In this paper, we develop an autotuning framework that leverages Bayesian optimization to explore the parameter search space. We select six of the most complex benchmarks from the application domains of the PolyBench benchmarks (syr2k, 3mm, heat-3d, lu, covariance, and Floyd-Warshall) and apply the newly developed LLVM Clang/Polly loop optimization pragmas to the benchmarks to optimize them. We then use the autotuning framework to optimize the pragma parameters to improve their performance. The experimental results show that our autotuning approach outperforms the other compiling methods to provide the smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and covariance with two large datasets in 200 code evaluations for effectively searching the parameter spaces with up to 170,368 different configurations. We compare four different supervised learning methods within Bayesian optimization and evaluate their effectiveness. We find that the Floyd-Warshall benchmark did not benefit from autotuning because Polly uses heuristics to optimize the benchmark to make it run much slower. To cope with this issue, we provide some compiler option solutions to improve the performance.
Submitted 15 October, 2020;
originally announced October 2020.
-
Toward an End-to-End Auto-tuning Framework in HPC PowerStack
Authors:
Xingfu Wu,
Aniruddha Marathe,
Siddhartha Jana,
Ondrej Vysocky,
Jophin John,
Andrea Bartolini,
Lubomir Riha,
Michael Gerndt,
Valerie Taylor,
Sridutt Bhalachandra
Abstract:
Efficiently utilizing procured power and optimizing performance of scientific applications under power and energy constraints are challenging. The HPC PowerStack defines a software stack to manage power and energy of high-performance computing systems and standardizes the interfaces between different components of the stack. This survey paper presents the findings of a working group focused on the end-to-end tuning of the PowerStack. First, we provide a background on the PowerStack layer-specific tuning efforts in terms of their high-level objectives, the constraints and optimization goals, layer-specific telemetry, and control parameters, and we list the existing software solutions that address those challenges. Second, we propose the PowerStack end-to-end auto-tuning framework, identify the opportunities in co-tuning different layers in the PowerStack, and present specific use cases and solutions. Third, we discuss the research opportunities and challenges for collective auto-tuning of two or more management layers (or domains) in the PowerStack. This paper takes the first steps in identifying and aggregating the important R&D challenges in streamlining the optimization efforts across the layers of the PowerStack.
Submitted 14 August, 2020;
originally announced August 2020.
-
Intra-Library Collusion: A Potential Privacy Nightmare on Smartphones
Authors:
Vincent F. Taylor,
Alastair R. Beresford,
Ivan Martinovic
Abstract:
Smartphones contain a trove of sensitive personal data including our location, who we talk to, our habits, and our interests. Smartphone users trade access to this data by permitting apps to use it, and in return obtain functionality provided by the apps. In many cases, however, users fail to appreciate the scale or sensitivity of the data that they share with third-parties when they use apps. To this end, prior work has looked at the threat to privacy posed by apps and the third-party libraries that they embed. Prior work, however, fails to paint a realistic picture of the full threat to smartphone users, as it has typically examined apps and third-party libraries in isolation.
In this paper, we describe a novel and potentially devastating privilege escalation attack that can be performed by third-party libraries. This attack, which we call intra-library collusion, occurs when a single library embedded in more than one app on a device leverages the combined set of permissions available to it to pilfer sensitive user data. The possibility for intra-library collusion exists because libraries obtain the same privileges as their host app and popular libraries will likely be used by more than one app on a device.
Using a real-world dataset of over 30,000 smartphones, we find that many popular third-party libraries have the potential to aggregate significant sensitive data from devices by using intra-library collusion. We demonstrate that several popular libraries already collect enough data to facilitate this attack. Using historical data, we show that risks from intra-library collusion have increased significantly over the last two-and-a-half years. We conclude with recommendations for mitigating the aforementioned problems.
Submitted 11 August, 2017;
originally announced August 2017.
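The core of intra-library collusion, as described in the abstract above, is that a library embedded in several apps effectively holds the union of those apps' permissions. The sketch below illustrates that aggregation only. The data layout, the function name `combined_library_permissions`, and the app, library, and permission names are all hypothetical and are not drawn from the paper's dataset.

```python
from collections import defaultdict
from typing import Dict, Set


def combined_library_permissions(
    apps: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, Set[str]]:
    """Map each library to the union of permissions of every app embedding it."""
    combined: Dict[str, Set[str]] = defaultdict(set)
    for info in apps.values():
        for library in info["libraries"]:
            # A library runs with the privileges of its host app.
            combined[library] |= info["permissions"]
    return dict(combined)


# Hypothetical device with two apps that share one ad library.
device_apps = {
    "weather_app": {"permissions": {"LOCATION"}, "libraries": {"adlib"}},
    "chat_app": {"permissions": {"CONTACTS"}, "libraries": {"adlib", "pushlib"}},
}
reach = combined_library_permissions(device_apps)
```

Here neither app alone grants both location and contacts, yet the shared library can reach both, which is the escalation the abstract warns about.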
-
Robust Smartphone App Identification Via Encrypted Network Traffic Analysis
Authors:
Vincent F. Taylor,
Riccardo Spolaor,
Mauro Conti,
Ivan Martinovic
Abstract:
The apps installed on a smartphone can reveal much information about a user, such as their medical conditions, sexual orientation, or religious beliefs. Additionally, the presence or absence of particular apps on a smartphone can inform an adversary who is intent on attacking the device. In this paper, we show that a passive eavesdropper can feasibly identify smartphone apps by fingerprinting the network traffic that they send. Although SSL/TLS hides the payload of packets, side-channel data such as packet size and direction is still leaked from encrypted connections. We use machine learning techniques to identify smartphone apps from this side-channel data. In addition to merely fingerprinting and identifying smartphone apps, we investigate how app fingerprints change over time, across devices and across different versions of apps. Additionally, we introduce strategies that enable our app classification system to identify and mitigate the effect of ambiguous traffic, i.e., traffic in common among apps such as advertisement traffic. We fully implemented a framework to fingerprint apps and ran a thorough set of experiments to assess its performance. We fingerprinted 110 of the most popular apps in the Google Play Store and were able to identify them six months later with up to 96% accuracy. Additionally, we show that app fingerprints persist to varying extents across devices and app versions.
Submitted 20 April, 2017;
originally announced April 2017.
-
Quantifying Permission-Creep in the Google Play Store
Authors:
Vincent F. Taylor,
Ivan Martinovic
Abstract:
Although there are over 1,600,000 third-party Android apps in the Google Play Store, little has been conclusively shown about how their individual (and collective) permission usage has evolved over time. Recently, Android 6 overhauled the way permissions are granted by users, by switching to run-time permission requests instead of install-time permission requests. This is a welcome change, but recent research has shown that many users continue to accept run-time permissions blindly, leaving them at the mercy of third-party app developers and adversaries. Beyond intentionally invading privacy, highly privileged apps increase the attack surface of smartphones and are more attractive targets for adversaries. This work focuses exclusively on dangerous permissions, i.e., those permissions identified by Android as guarding access to sensitive user data. By taking snapshots of the Google Play Store over a 20-month period, we characterise changes in the number and type of dangerous permissions used by Android apps when they are updated, to gain a greater understanding of the evolution of permission usage. We found that approximately 25,000 apps asked for additional permissions every three months. Worryingly, we made statistically significant observations that free apps and highly popular apps were more likely to ask for additional permissions when they were updated. By looking at patterns in dangerous permission usage, we find evidence that suggests developers may still be failing to correctly specify the permissions their apps need.
Submitted 10 August, 2016; v1 submitted 6 June, 2016;
originally announced June 2016.