-
Efficient CHAD
Authors:
Tom Smeding,
Matthijs Vákár
Abstract:
We show how the basic Combinatory Homomorphic Automatic Differentiation (CHAD) algorithm can be optimised, using well-known methods, to yield a simple, composable, and generally applicable reverse-mode automatic differentiation (AD) technique that has the correct computational complexity that we would expect of reverse-mode AD. Specifically, we show that the standard optimisations of sparse vector…
▽ More
We show how the basic Combinatory Homomorphic Automatic Differentiation (CHAD) algorithm can be optimised, using well-known methods, to yield a simple, composable, and generally applicable reverse-mode automatic differentiation (AD) technique that has the correct computational complexity that we would expect of reverse-mode AD. Specifically, we show that the standard optimisations of sparse vectors and state-passing style code (as well as defunctionalisation/closure conversion, for higher-order languages) give us a purely functional algorithm that is most of the way to the correct complexity, with (functional) mutable updates taking care of the final log-factors. We provide an Agda formalisation of our complexity proof. Finally, we discuss how the techniques apply to differentiating parallel functional array programs: the key observations are 1) that all required mutability is (commutative, associative) accumulation, which lets us preserve task-parallelism and 2) that we can write down data-parallel derivatives for most data-parallel array primitives.
△ Less
Submitted 20 October, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
Parallel Dual-Numbers Reverse AD
Authors:
Tom Smeding,
Matthijs Vákár
Abstract:
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a c…
▽ More
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98. Where previous work on dual numbers reverse AD has required sequentialisation to construct the reverse pass, we demonstrate that we can apply our technique to task-parallel source programs and generate a task-parallel derivative computation.
△ Less
Submitted 7 May, 2024; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Dual-Numbers Reverse AD, Efficiently
Authors:
Tom Smeding,
Matthijs Vákár
Abstract:
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent derivative, dual-numbers /reverse-mode/ AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis w…
▽ More
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent derivative, dual-numbers /reverse-mode/ AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis was on a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of /linear factoring/ to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with resource-linear types, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. Furthermore, we observe a connection with classical imperative ta**-based reverse AD, as well as Kmett's 'ad' Haskell library, recently analysed by Krawiec et al. We demonstrate the practical use of our technique by providing a performant implementation that differentiates most of Haskell98.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.
-
CHAD: Combinatory Homomorphic Automatic Differentiation
Authors:
Matthijs Vákár,
Tom Smeding
Abstract:
We introduce Combinatory Homomorphic Automatic Differentiation (CHAD), a principled, pure, provably correct define-then-run method for performing forward and reverse mode automatic differentiation (AD) on programming languages with expressive features. It implements AD as a compositional, type-respecting source-code transformation that generates purely functional code. This code transformation is…
▽ More
We introduce Combinatory Homomorphic Automatic Differentiation (CHAD), a principled, pure, provably correct define-then-run method for performing forward and reverse mode automatic differentiation (AD) on programming languages with expressive features. It implements AD as a compositional, type-respecting source-code transformation that generates purely functional code. This code transformation is principled in the sense that it is the unique homomorphic (structure preserving) extension to expressive languages of Elliott's well-known and unambiguous definitions of AD for a first-order functional language. Correctness of the method follows by a (compositional) logical relations argument that shows that the semantics of the syntactic derivative is the usual calculus derivative of the semantics of the original program.
In their most elegant formulation, the transformations generate code with linear types. However, the code transformations can be implemented in a standard functional language lacking linear types: while the correctness proof requires tracking of linearity, the actual transformations do not. In fact, even in a standard functional language, we can get all of the type-safety that linear types give us: we can implement all linear types used to type the transformations as abstract types, by using a basic module system.
In this paper, we detail the method when applied to a simple higher-order language for manipulating statically sized arrays. However, we explain how the methodology applies, more generally, to functional languages with other expressive features. Finally, we discuss how the scope of CHAD extends beyond applications in AD to other dynamic program analyses that accumulate data in a commutative monoid.
△ Less
Submitted 8 June, 2022; v1 submitted 29 March, 2021;
originally announced March 2021.