Search | arXiv e-print repository

Local Adjoints for Simultaneous Preaccumulations with Shared Inputs

Authors: Johannes Blühdorn, Nicolas R. Gauger

Abstract: In shared-memory parallel automatic differentiation, shared inputs among simultaneous thread-local preaccumulations lead to data races if Jacobians are accumulated with a single, shared vector of adjoint variables. In this work, we discuss the benefits and tradeoffs of re-enabling such preaccumulations by a transition to suitable local adjoint variables. In particular, we assess the performance of mapped local adjoints in discrete adjoint computations in the multiphysics simulation suite SU2.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures, 1 table

ACM Class: D.1.3; G.1.4; G.4; J.2

arXiv:2405.06056

Hybrid Parallel Discrete Adjoints in SU2

Authors: Johannes Blühdorn, Pedro Gomes, Max Aehle, Nicolas R. Gauger

Abstract: The open-source multiphysics suite SU2 features discrete adjoints by means of operator overloading automatic differentiation (AD). While both primal and discrete adjoint solvers support MPI parallelism, hybrid parallelism using both MPI and OpenMP has only been introduced for the primal solvers so far. In this work, we enable hybrid parallel discrete adjoint solvers. Coupling SU2 with OpDiLib, an add-on for operator overloading AD tools that extends AD to OpenMP parallelism, marks a key step in this endeavour. We identify the affected parts of SU2's advanced AD workflow and discuss the required changes and their tradeoffs. Detailed performance studies compare MPI parallel and hybrid parallel discrete adjoints in terms of memory and runtime and unveil key performance characteristics. We showcase the effectiveness of performance optimizations and highlight perspectives for future improvements. At the same time, this study demonstrates the applicability of OpDiLib in a large code base and its scalability on large test cases, providing valuable insights for future applications both within and beyond SU2.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 32 pages, 9 figures, 2 listings

ACM Class: D.1.3; G.1.4; G.4; J.2

arXiv:2212.13760

Reverse-Mode Automatic Differentiation of Compiled Programs

Authors: Max Aehle, Johannes Blühdorn, Max Sagebaum, Nicolas R. Gauger

Abstract: Tools for algorithmic differentiation (AD) provide accurate derivatives of computer-implemented functions for use in, e. g., optimization and machine learning (ML). However, they often require the source code of the function to be available in a restricted set of programming languages. As a step towards making AD accessible for code bases with cross-language or closed-source components, we recently presented the forward-mode AD tool Derivgrind. It inserts forward-mode AD logic into the machine code of a compiled program using the Valgrind dynamic binary instrumentation framework. This work extends Derivgrind, adding the capability to record the real-arithmetic evaluation tree, and thus enabling operator overloading style reverse-mode AD for compiled programs. We maintain the high level of correctness reported for Derivgrind's forward mode, failing the same few testcases in an extensive test suite for the same well-understood reasons. Runtime-wise, the recording slows down the execution of a compiled 64-bit benchmark program by a factor of about 180.

Submitted 28 December, 2022; originally announced December 2022.

Comments: 17 pages, 5 figures, 1 listing

arXiv:2209.01895

Forward-Mode Automatic Differentiation of Compiled Programs

Authors: Max Aehle, Johannes Blühdorn, Max Sagebaum, Nicolas R. Gauger

Abstract: Algorithmic differentiation (AD) is a set of techniques that provide partial derivatives of computer-implemented functions. Such a function can be supplied to state-of-the-art AD tools via its source code, or via an intermediate representation produced while compiling its source code. We present the novel AD tool Derivgrind, which augments the machine code of compiled programs with forward-mode AD logic. Derivgrind leverages the Valgrind instrumentation framework for a structured access to the machine code, and a shadow memory tool to store dot values. Access to the source code is required at most for the files in which input and output variables are defined. Derivgrind's versatility comes at the price of scaling the run-time by a factor between 30 and 75, measured on a benchmark based on a numerical solver for a partial differential equation. Results of our extensive regression test suite indicate that Derivgrind produces correct results on GCC- and Clang-compiled programs, including a Python interpreter, with a small number of exceptions. While we provide a list of scenarios that Derivgrind does not handle correctly, nearly all of them are academic counterexamples or originate from highly optimized math libraries. As long as differentiating those is avoided, Derivgrind can be applied to an unprecedentedly wide range of cross-language or partially closed-source software with little integration efforts.

Submitted 7 July, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: 21 pages, 3 figures, 3 tables, 5 listings

arXiv:2202.05551

Exploration of Differentiability in a Proton Computed Tomography Simulation Framework

Authors: Max Aehle, Johan Alme, Gergely Gábor Barnaföldi, Johannes Blühdorn, Tea Bodova, Vyacheslav Borshchov, Anthony van den Brink, Viljar Eikeland, Gregory Feofilov, Christoph Garth, Nicolas R. Gauger, Ola Grøttvik, Håvard Helstrup, Sergey Igolkin, Ralf Keidel, Chinorat Kobdaj, Tobias Kortus, Lisa Kusch, Viktor Leonhardt, Shruti Mehendale, Raju Ningappa Mulawade, Odd Harald Odland, George O'Neill, Gábor Papp, Thomas Peitzmann, et al. (25 additional authors not shown)

Abstract: Objective. Algorithmic differentiation (AD) can be a useful technique to numerically optimize design and algorithmic parameters by, and quantify uncertainties in, computer simulations. However, the effectiveness of AD depends on how "well-linearizable" the software is. In this study, we assess how promising derivative information of a typical proton computed tomography (pCT) scan computer simulation is for the aforementioned applications. Approach. This study is mainly based on numerical experiments, in which we repeatedly evaluate three representative computational steps with perturbed input values. We support our observations with a review of the algorithmic steps and arithmetic operations performed by the software, using debugging techniques. Main results. The model-based iterative reconstruction (MBIR) subprocedure (at the end of the software pipeline) and the Monte Carlo (MC) simulation (at the beginning) were piecewise differentiable. Jumps in the MBIR function arose from the discrete computation of the set of voxels intersected by a proton path. Jumps in the MC function likely arose from changes in the control flow that affect the amount of consumed random numbers. The tracking algorithm solves an inherently non-differentiable problem. Significance. The MC and MBIR codes are ready for the integration of AD, and further research on surrogate models for the tracking subprocedure is necessary.

Submitted 12 May, 2023; v1 submitted 11 February, 2022; originally announced February 2022.

Comments: 27 pages, 11 figures

arXiv:2102.11572

doi 10.1145/3570159

Event-Based Automatic Differentiation of OpenMP with OpDiLib

Authors: Johannes Blühdorn, Max Sagebaum, Nicolas R. Gauger

Abstract: We present the new software OpDiLib, a universal add-on for classical operator overloading AD tools that enables the automatic differentiation (AD) of OpenMP parallelized code. With it, we establish support for OpenMP features in a reverse mode operator overloading AD tool to an extent that was previously only reported on in source transformation tools. We achieve this with an event-based implementation ansatz that is unprecedented in AD. Combined with modern OpenMP features around OMPT, we demonstrate how it can be used to achieve differentiation without any additional modifications of the source code; neither do we impose a priori restrictions on the data access patterns, which makes OpDiLib highly applicable. For further performance optimizations, restrictions like atomic updates on adjoint variables can be lifted in a fine-grained manner. OpDiLib can also be applied in a semi-automatic fashion via a macro interface, which supports compilers that do not implement OMPT. We demonstrate the applicability of OpDiLib for a pure operator overloading approach in a hybrid parallel environment. We quantify the cost of atomic updates on adjoint variables and showcase the speedup and scaling that can be achieved with the different configurations of OpDiLib in both the forward and the reverse pass.

Submitted 30 June, 2022; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: 31 pages, 13 figures, 3 tables, 13 listings; new layout, additional references, refocused Section 3 (former Section 4), extended performance tests, overall polishing and shortening

ACM Class: D.1.3; D.2.13; G.1.4; G.4

arXiv:2006.12992

Assign optimization for algorithmic differentiation reuse index management strategies

Authors: Max Sagebaum, Johannes Blühdorn, Nicolas R. Gauger

Abstract: The identification of primal variables and adjoint variables is usually done via indices in operator overloading algorithmic differentiation tools. One approach is a linear management scheme, which is easy to implement and supports memory optimization for copy statements. An alternative approach performs a reuse of indices, which requires more implementation effort but results in much smaller adjoint vectors. Therefore, the vector mode of algorithmic differentiation scales better with the reuse management scheme. In this paper, we present a novel approach that reuses the indices and allows the copy optimization, thus combining the advantages of the two aforementioned schemes. The new approach is compared to the known approaches on a simple synthetic test case and a real-world example using the computational fluid dynamics solver SU2.

Submitted 15 August, 2023; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: 15 pages, 4 figures, 4 tables

MSC Class: 68N30

ACM Class: G.1.4; G.4; D.2.2

arXiv:2006.04391

doi 10.1007/s00466-021-02105-2

AutoMat -- Automatic Differentiation for Generalized Standard Materials on GPUs

Authors: Johannes Blühdorn, Nicolas R. Gauger, Matthias Kabel

Abstract: We propose a universal method for the evaluation of generalized standard materials that greatly simplifies the material law implementation process. By means of automatic differentiation and a numerical integration scheme, AutoMat reduces the implementation effort to two potential functions. By moving AutoMat to the GPU, we close the performance gap to conventional evaluation routines and demonstrate in detail that the expression level reverse mode of automatic differentiation as well as its extension to second order derivatives can be applied inside CUDA kernels. We underline the effectiveness and the applicability of AutoMat by integrating it into the FFT-based homogenization scheme of Moulinec and Suquet and discuss the benefits of using AutoMat with respect to runtime and solution accuracy for an elasto-viscoplastic example.

Submitted 6 October, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: 28 pages, 15 figures, 7 tables; new layout, more detailed proof of Theorem 1

ACM Class: G.1.4; G.1.7; G.4; J.2

Showing 1–8 of 8 results for author: Blühdorn, J