Challenges with Differentiable Quantum Dynamics

Authors: Sri Hari Krishna Narayanan, Michael Perlin, Robert Lewis-Swan, Jeffrey Larson, Matt Menickelly, Jan Hückelheim, Paul Hovland

Abstract: Differentiable quantum dynamics require automatic differentiation of a complex-valued initial value problem, which numerically integrates a system of ordinary differential equations from a specified initial condition, as well as the eigendecomposition of a matrix. We explored several automatic differentiation frameworks for these tasks, finding that no framework natively supports our application requirements. We therefore demonstrate a need for broader support of complex-valued, differentiable numerical integration in scientific computing libraries.

Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.15590 [pdf, ps, other]

Profiling checkpointing schedules in adjoint ST-AD

Authors: Laurent Hascoët, Jean-Luc Bouchot, Shreyas Sunil Gaikwad, Sri Hari Krishna Narayanan, Jan Hückelheim

Abstract: Checkpointing is a cornerstone of data-flow reversal in adjoint algorithmic differentiation. Checkpointing is a storage/recomputation trade-off that can be applied at different levels, one of which is the call tree. We are looking for good placements of checkpoints onto the call tree of a given application, to reduce run time and memory footprint of its adjoint. There is no known optimal solution to this problem other than a combinatorial search on all placements. We propose a heuristic based on run-time profiling of the adjoint code. We describe implementation of this profiling tool in an existing source-transformation AD tool. We demonstrate the interest of this approach on test cases taken from the MITgcm ocean and atmospheric global circulation model. We discuss the limitations of our approach and propose directions to lift them.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2404.17039 [pdf, other]

Differentiating Through Linear Solvers

Authors: Paul Hovland, Jan Hückelheim

Abstract: Computer programs containing calls to linear solvers are a known challenge for automatic differentiation. Previous publications advise against differentiating through the low-level solver implementation, and instead advocate for high-level approaches that express the derivative in terms of a modified linear system that can be solved with a separate solver call. Despite this ubiquitous advice, we are not aware of prior work comparing the accuracy of both approaches. With this article we thus empirically study a simple question: What happens if we ignore common wisdom, and differentiate through linear solvers?

Submitted 6 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.
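
The "high-level approach" mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a dense 2x2 system A x = b solved by Cramer's rule, and the helper names `solve2` and `solve2_tangent` are hypothetical. Differentiating A x = b gives A dx = db - dA x, so the tangent dx is obtained from a second call to the same solver with a modified right-hand side.

```python
def solve2(a, b):
    """Solve the 2x2 system a @ x = b by Cramer's rule (toy stand-in for a solver)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def solve2_tangent(a, b, da, db):
    """Return x and its directional derivative dx for perturbations da, db.

    Uses the identity a @ dx = db - da @ x, i.e. one extra solver call
    instead of differentiating the solver's internals.
    """
    x = solve2(a, b)
    rhs = [db[i] - (da[i][0] * x[0] + da[i][1] * x[1]) for i in range(2)]
    return x, solve2(a, rhs)


# Illustrative data (assumed values, not from the paper).
A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
dA = [[1.0, 0.0], [0.5, -1.0]]
db = [0.3, -0.2]
x, dx = solve2_tangent(A, b, dA, db)
```

The alternative studied in the paper, differentiating through the solver implementation itself, would instead apply AD to every arithmetic operation inside `solve2`.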

arXiv:2404.09950 [pdf, other]

Parametric Sensitivities of a Wind-driven Baroclinic Ocean Using Neural Surrogates

Authors: Yixuan Sun, Elizabeth Cucuzzella, Steven Brus, Sri Hari Krishna Narayanan, Balasubramanya Nadiga, Luke Van Roekel, Jan Hückelheim, Sandeep Madireddy, Patrick Heimbach

Abstract: Numerical models of the ocean and ice sheets are crucial for understanding and simulating the impact of greenhouse gases on the global climate. Oceanic processes affect phenomena such as hurricanes, extreme precipitation, and droughts. Ocean models rely on subgrid-scale parameterizations that require calibration and often significantly affect model skill. When model sensitivities to parameters can be computed by using approaches such as automatic differentiation, they can be used for such calibration toward reducing the misfit between model output and data. Because the SOMA model code is challenging to differentiate, we have created neural network-based surrogates for estimating the sensitivity of the ocean model to model parameters. We first generated perturbed parameter ensemble data for an idealized ocean model and trained three surrogate neural network models. The neural surrogates accurately predicted the one-step forward ocean dynamics, of which we then computed the parametric sensitivity.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2401.11952 [pdf, other]

MITgcm-AD v2: Open source tangent linear and adjoint modeling framework for the oceans and atmosphere enabled by the Automatic Differentiation tool Tapenade

Authors: Shreyas Sunil Gaikwad, Sri Hari Krishna Narayanan, Laurent Hascoet, Jean-Michel Campin, Helen Pillar, An Nguyen, Jan Hückelheim, Paul Hovland, Patrick Heimbach

Abstract: The Massachusetts Institute of Technology General Circulation Model (MITgcm) is widely used by the climate science community to simulate planetary atmosphere and ocean circulations. A defining feature of the MITgcm is that it has been developed to be compatible with an algorithmic differentiation (AD) tool, TAF, enabling the generation of tangent-linear and adjoint models. These provide gradient information which enables dynamics-based sensitivity and attribution studies, state and parameter estimation, and rigorous uncertainty quantification. Importantly, gradient information is essential for computing comprehensive sensitivities and performing efficient large-scale data assimilation, ensuring that observations collected from satellites and in-situ measuring instruments can be effectively used to optimize a large uncertain control space. As a result, the MITgcm forms the dynamical core of a key data assimilation product employed by the physical oceanography research community: Estimating the Circulation and Climate of the Ocean (ECCO) state estimate. Although MITgcm and ECCO are used extensively within the research community, the AD tool TAF is proprietary and hence inaccessible to a large proportion of these users. The new version 2 (MITgcm-AD v2) framework introduced here is based on the source-to-source AD tool Tapenade, which has recently been open-sourced. Another feature of Tapenade is that it stores required variables by default (instead of recomputing them) which simplifies the implementation of efficient, AD-compatible code. The framework has been integrated with the MITgcm model main branch and is now freely available.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 12 pages, 2 figures, 4 tables, submitted to Joint Laboratory on Extreme Scale Computing Future Generation Computer Systems (JLESC-FGCS)

arXiv:2311.11876 [pdf, other]

Forward Gradients for Data-Driven CFD Wall Modeling

Authors: Jan Hückelheim, Tadbhagya Kumar, Krishnan Raghavan, Pinaki Pal

Abstract: Computational Fluid Dynamics (CFD) is used in the design and optimization of gas turbines and many other industrial/scientific applications. However, the practical use is often limited by the high computational cost, and the accurate resolution of near-wall flow is a significant contributor to this cost. Machine learning (ML) and other data-driven methods can complement existing wall models. Nevertheless, training these models is bottlenecked by the large computational effort and memory footprint demanded by back-propagation. Recent work has presented alternatives for computing gradients of neural networks where a separate forward and backward sweep is not needed and storage of intermediate results between sweeps is not required because an unbiased estimator for the gradient is computed in a single forward sweep. In this paper, we discuss the application of this approach for training a subgrid wall model that could potentially be used as a surrogate in wall-bounded flow CFD simulations to reduce the computational overhead while preserving predictive accuracy.

Submitted 28 November, 2023; v1 submitted 20 November, 2023; originally announced November 2023.
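
The single-sweep gradient estimator described in the abstract above can be sketched as follows. This is an illustrative toy, not the authors' implementation. It assumes the common forward-gradient form g = (∇f · v) v with v drawn from a standard normal distribution, uses a hypothetical dual-number class `Dual` for forward-mode differentiation, and uses a made-up objective `f` in place of a neural network loss.

```python
import random


class Dual:
    """A value together with its directional derivative (forward-mode AD)."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__


def f(x):
    # Hypothetical toy objective; its exact gradient is (2*x0 + 3*x1, 3*x0).
    return x[0] * x[0] + 3.0 * x[0] * x[1]


def forward_gradient(func, x, rng):
    """One sample of the estimator: a single forward sweep, no stored tape."""
    v = [rng.gauss(0.0, 1.0) for _ in x]              # random direction
    out = func([Dual(xi, vi) for xi, vi in zip(x, v)])
    return [out.dot * vi for vi in v]                 # (grad f . v) * v


def averaged_forward_gradient(func, x, samples, seed=0):
    """Average many samples to illustrate that the estimator is unbiased."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = forward_gradient(func, x, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]
```

Averaging is used here only to make the unbiasedness visible. In training, a single noisy sample per step would typically be passed to the optimizer, which is where the memory saving over back-propagation comes from.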

arXiv:2311.08421 [pdf, other]

Surrogate Neural Networks to Estimate Parametric Sensitivity of Ocean Models

Authors: Yixuan Sun, Elizabeth Cucuzzella, Steven Brus, Sri Hari Krishna Narayanan, Balu Nadiga, Luke Van Roekel, Jan Hückelheim, Sandeep Madireddy

Abstract: Modeling is crucial to understanding the effect of greenhouse gases, warming, and ice sheet melting on the ocean. At the same time, ocean processes affect phenomena such as hurricanes and droughts. Parameters in the models that cannot be physically measured have a significant effect on the model output. For an idealized ocean model, we generated perturbed parameter ensemble data and trained surrogate neural network models. The neural surrogates accurately predicted the one-step forward dynamics, of which we then computed the parametric sensitivity.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2305.18198 [pdf, ps, other]

Model Checking Race-freedom When "Sequential Consistency for Data-race-free Programs" is Guaranteed

Authors: Wenhao Wu, Jan Hückelheim, Paul D. Hovland, Ziqing Luo, Stephen F. Siegel

Abstract: Many parallel programming models guarantee that if all sequentially consistent (SC) executions of a program are free of data races, then all executions of the program will appear to be sequentially consistent. This greatly simplifies reasoning about the program, but leaves open the question of how to verify that all SC executions are race-free. In this paper, we show that with a few simple modifications, model checking can be an effective tool for verifying race-freedom. We explore this technique on a suite of C programs parallelized with OpenMP.

Submitted 20 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.07546 [pdf, other]

Understanding Automatic Differentiation Pitfalls

Authors: Jan Hückelheim, Harshitha Menon, William Moses, Bruce Christianson, Paul Hovland, Laurent Hascoët

Abstract: Automatic differentiation, also known as backpropagation, AD, autodiff, or algorithmic differentiation, is a popular technique for computing derivatives of computer programs accurately and efficiently. Sometimes, however, the derivatives computed by AD could be interpreted as incorrect. These pitfalls occur systematically across tools and approaches. In this paper we broadly categorize problematic usages of AD and illustrate each category with examples such as chaos, time-averaged oscillations, discretizations, fixed-point loops, lookup tables, and linear solvers. We also review debugging techniques and their effectiveness in these situations. With this article we hope to help readers avoid unexpected behavior, detect problems more easily when they occur, and have more realistic expectations from AD tools.

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2210.08378 [pdf, other]

Memory-Efficient Differentiable Programming for Quantum Optimal Control of Discrete Lattices

Authors: Xian Wang, Paul Kairys, Sri Hari Krishna Narayanan, Jan Hückelheim, Paul Hovland

Abstract: Quantum optimal control problems are typically solved by gradient-based algorithms such as GRAPE, which suffer from exponential growth in storage with increasing number of qubits and linear growth in memory requirements with increasing number of time steps. Employing QOC for discrete lattices reveals that these memory requirements are a barrier for simulating large models or long time spans. We employ a nonstandard differentiable programming approach that significantly reduces the memory requirements at the cost of a reasonable amount of recomputation. The approach exploits invertibility properties of the unitary matrices to reverse the computation during back-propagation. We utilize QOC software written in the differentiable programming framework JAX that implements this approach, and demonstrate its effectiveness for lattice gauge theory.

Submitted 15 October, 2022; originally announced October 2022.

Comments: 6 pages, 6 figures, The International Workshop on Quantum Computing Software

MSC Class: 68; 81 ACM Class: I.6; J.2; G.4

arXiv:2203.12717 [pdf, other]

Reducing Memory Requirements of Quantum Optimal Control

Authors: Sri Hari Krishna Narayanan, Thomas Propson, Marcelo Bongarti, Jan Hueckelheim, Paul Hovland

Abstract: Quantum optimal control problems are typically solved by gradient-based algorithms such as GRAPE, which suffer from exponential growth in storage with increasing number of qubits and linear growth in memory requirements with increasing number of time steps. These memory requirements are a barrier for simulating large models or long time spans. We have created a nonstandard automatic differentiation technique that can compute gradients needed by GRAPE by exploiting the fact that the inverse of a unitary matrix is its conjugate transpose. Our approach significantly reduces the memory requirements for GRAPE, at the cost of a reasonable amount of recomputation. We present benchmark results based on an implementation in JAX.

Submitted 23 March, 2022; originally announced March 2022.

Comments: 14 pages, 6 figures, 4 listings, 1 table, accepted for publication in the proceedings of the International Conference on Computational Science (ICCS) 2022

Report number: ANL/MCS-P9566-0222
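
The property that the abstract above relies on, namely that the inverse of a unitary matrix is its conjugate transpose, can be illustrated with a small sketch. It is not the authors' JAX implementation. It assumes a single qubit evolved by hypothetical 2x2 rotation unitaries and only shows the state reconstruction step: intermediate states are recovered in reverse order from the final state instead of being stored during the forward sweep.

```python
import math


def unitary(theta):
    """2x2 unitary exp(-i * theta * X) for a single qubit (toy propagator)."""
    c, s = complex(math.cos(theta)), complex(math.sin(theta))
    return [[c, -1j * s], [-1j * s, c]]


def dagger(u):
    """Conjugate transpose, which is the inverse when u is unitary."""
    return [[u[j][i].conjugate() for j in range(2)] for i in range(2)]


def apply(u, psi):
    """Matrix-vector product for a 2x2 matrix and a length-2 state."""
    return [u[0][0] * psi[0] + u[0][1] * psi[1],
            u[1][0] * psi[0] + u[1][1] * psi[1]]


def evolve(psi, thetas):
    """Forward sweep: keep only the current state, discard all earlier ones."""
    for theta in thetas:
        psi = apply(unitary(theta), psi)
    return psi


def reverse(psi, thetas):
    """Reverse sweep: undo each step with the conjugate transpose."""
    for theta in reversed(thetas):
        psi = apply(dagger(unitary(theta)), psi)
    return psi


# Assumed control angles for illustration only.
thetas = [0.1 * (k + 1) for k in range(50)]
psi_final = evolve([1 + 0j, 0j], thetas)
psi_recovered = reverse(psi_final, thetas)  # agrees with the initial state up to rounding
```

In a reverse-mode gradient computation, each recovered state would be consumed by the adjoint of the corresponding time step, so memory use no longer grows with the number of time steps; the price is one extra matrix-vector product per step.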

arXiv:2202.08387 [pdf, other]

TROPHY: Trust Region Optimization Using a Precision Hierarchy

Authors: Richard J Clancy, Matt Menickelly, Jan Hückelheim, Paul Hovland, Prani Nalluri, Rebecca G**i

Abstract: We present an algorithm to perform trust-region-based optimization for nonlinear unconstrained problems. The method selectively uses function and gradient evaluations at different floating-point precisions to reduce the overall energy consumption, storage, and communication costs; these capabilities are increasingly important in the era of exascale computing. In particular, we are motivated by a desire to improve computational efficiency for massive climate models. We employ our method on two examples: the CUTEst test set and a large-scale data assimilation problem to recover wind fields from radar returns. Although this paper is primarily a proof of concept, we show that if implemented on appropriate hardware, the use of mixed-precision can significantly reduce the computational load compared with fixed-precision solvers.

Submitted 16 February, 2022; originally announced February 2022.

Comments: 14 pages, 2 figures, 2 tables

MSC Class: 90-08 ACM Class: G.1.6

arXiv:2111.01861 [pdf, other]

Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

Authors: Jan Hückelheim, Laurent Hascoët

Abstract: This paper presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost to compute gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. Performance of the generated derivative programs in forward and reverse mode is better than sequential, although our reverse mode often scales worse than the input programs.

Submitted 2 November, 2021; originally announced November 2021.

Comments: To appear in ACM TOMS

arXiv:2009.12623 [pdf, other]

Lossy Checkpoint Compression in Full Waveform Inversion: a case study with ZFPv0.5.5 and the Overthrust Model

Authors: Navjot Kukreja, Jan Hueckelheim, Mathias Louboutin, John Washbourne, Paul H. J. Kelly, Gerard J. Gorman

Abstract: This paper proposes a new method that combines check-pointing methods with error-controlled lossy compression for large-scale high-performance Full-Waveform Inversion (FWI), an inverse problem commonly used in geophysical exploration. This combination can significantly reduce data movement, allowing a reduction in run time as well as peak memory. In the Exascale computing era, frequent data transfer (e.g., memory bandwidth, PCIe bandwidth for GPUs, or network) is the performance bottleneck rather than the peak FLOPS of the processing unit. Like many other adjoint-based optimization problems, FWI is costly in terms of the number of floating-point operations, large memory footprint during backpropagation, and data transfer overheads. Past work for adjoint methods has developed checkpointing methods that reduce the peak memory requirements during backpropagation at the cost of additional floating-point computations. Combining this traditional checkpointing with error-controlled lossy compression, we explore the three-way tradeoff between memory, precision, and time to solution. We investigate how approximation errors introduced by lossy compression of the forward solution impact the objective function gradient and final inverted solution. Empirical results from these numerical experiments indicate that high lossy-compression rates (compression factors ranging up to 100) have a relatively minor impact on convergence rates and the quality of the final solution.

Submitted 15 September, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

arXiv:1907.02818 [pdf, other]

doi 10.1145/3337821.3337906

Automatic Differentiation for Adjoint Stencil Loops

Authors: Jan Hückelheim, Navjot Kukreja, Sri Hari Krishna Narayanan, Fabio Luporini, Gerard Gorman, Paul Hovland

Abstract: Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable. In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.

Submitted 5 July, 2019; originally announced July 2019.

Comments: ICPP 2019

arXiv:1903.03051 [pdf, other]

Training on the Edge: The why and the how

Authors: Navjot Kukreja, Alena Shilova, Olivier Beaumont, Jan Huckelheim, Nicola Ferrier, Paul Hovland, Gerard Gorman

Abstract: Edge computing is the natural progression from Cloud computing, where, instead of collecting all data and processing it centrally, like in a cloud computing environment, we distribute the computing power and try to do as much processing as possible, close to the source of the data. There are various reasons this model is being adopted quickly, including privacy, and reduced power and bandwidth requirements on the Edge nodes. While it is common to see inference being done on Edge nodes today, it is much less common to do training on the Edge. The reasons for this range from computational limitations, to it not being advantageous in reducing communications between the Edge nodes. In this paper, we explore some scenarios where it is advantageous to do training on the Edge, as well as the use of checkpointing strategies to save memory.

Submitted 13 February, 2019; originally announced March 2019.

Comments: Submitted to PAISE 2019

arXiv:1810.05268 [pdf, other]

Combining Checkpointing and Data Compression to Accelerate Adjoint-Based Optimization Problems

Authors: Navjot Kukreja, Jan Hueckelheim, Mathias Louboutin, Fabio Luporini, Paul Hovland, Gerard Gorman

Abstract: Seismic inversion and imaging are adjoint-based optimization problems that process up to terabytes of data, regularly exceeding the memory capacity of available computers. Data compression is an effective strategy to reduce this memory requirement by a certain factor, particularly if some loss in accuracy is acceptable. A popular alternative is checkpointing, where data is stored at selected points in time, and values at other times are recomputed as needed from the last stored state. This allows arbitrarily large adjoint computations with limited memory, at the cost of additional recomputations. In this paper, we combine compression and checkpointing for the first time to compute a realistic seismic inversion. The combination of checkpointing and compression allows larger adjoint computations compared to using only compression, and reduces the recomputation overhead significantly compared to using only checkpointing.

Submitted 20 September, 2021; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: Accepted in European Conference on Parallel Processing (EuroPar) 2019. Part of the Lecture Notes in Computer Science book series (LNCS, volume 11725)
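
The checkpointing idea described in the abstract above (store the state at selected times, recompute the rest from the last stored state) can be sketched as follows. This is a minimal illustration with uniformly spaced checkpoints. It is not the Revolve schedule, it does not include compression, and it is not code from the paper. The scalar time step `step` and its adjoint `step_adjoint` are hypothetical stand-ins for a real solver.

```python
import math


def step(x):
    """One forward time step of a hypothetical scalar model."""
    return math.sin(x) + 0.5 * x


def step_adjoint(x, xbar):
    """Adjoint of `step`: scale the incoming adjoint by d(step)/dx at x."""
    return (math.cos(x) + 0.5) * xbar


def gradient_store_all(x0, n):
    """Reference adjoint sweep that keeps all n forward states in memory."""
    states = [x0]
    for _ in range(n):
        states.append(step(states[-1]))
    xbar = 1.0
    for k in reversed(range(n)):
        xbar = step_adjoint(states[k], xbar)
    return xbar


def gradient_checkpointed(x0, n, stride):
    """Adjoint sweep that stores only every `stride`-th forward state."""
    checkpoints = {}
    x = x0
    for k in range(n):
        if k % stride == 0:
            checkpoints[k] = x
        x = step(x)
    xbar = 1.0
    # Walk the segments backward, recomputing each one from its checkpoint.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + stride, n)
        segment = [checkpoints[start]]
        for _ in range(end - start - 1):
            segment.append(step(segment[-1]))
        for x_k in reversed(segment):
            xbar = step_adjoint(x_k, xbar)
    return xbar
```

With `n` steps and a given `stride`, this variant holds roughly `n / stride` checkpoints plus one segment of `stride` states at a time, and repeats most forward steps once during the reverse sweep. Optimal schedules such as Revolve place the checkpoints non-uniformly to reduce recomputation for a fixed memory budget.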

arXiv:1807.03032 [pdf, other]

Architecture and performance of Devito, a system for automated stencil computation

Authors: Fabio Luporini, Michael Lange, Mathias Louboutin, Navjot Kukreja, Jan Hückelheim, Charles Yount, Philipp Witte, Paul H. J. Kelly, Felix J. Herrmann, Gerard J. Gorman

Abstract: Stencil computations are a key part of many high-performance computing applications, such as image processing, convolutional neural networks, and finite-difference solvers for partial differential equations. Devito is a framework capable of generating highly-optimized code given symbolic equations expressed in Python, specialized in, but not limited to, affine (stencil) codes. The lowering process, from mathematical equations down to C++ code, is performed by the Devito compiler through a series of intermediate representations. Several performance optimizations are introduced, including advanced common sub-expressions elimination, tiling and parallelization. Some of these are obtained through well-established stencil optimizers, integrated in the back-end of the Devito compiler. The architecture of the Devito compiler, as well as the performance optimizations that are applied when generating code, are presented. The effectiveness of such performance optimizations is demonstrated using operators drawn from seismic imaging applications.

Submitted 7 February, 2020; v1 submitted 9 July, 2018; originally announced July 2018.

Comments: Submitted to ACM Transactions on Mathematical Software

MSC Class: 65N06; 68N20

arXiv:1806.01117 [pdf, other]

Backpropagation for long sequences: beyond memory constraints with constant overheads

Authors: Navjot Kukreja, Jan Hückelheim, Gerard J. Gorman

Abstract: Naive backpropagation through time has a memory footprint that grows linearly in the sequence length, due to the need to store each state of the forward propagation. This is a problem for large networks. Strategies have been developed to trade memory for added computations, which results in a sublinear growth of memory footprint or computation overhead. In this work, we present a library that uses asynchronous storing and prefetching to move data to and from slow and cheap storage. The library only stores and prefetches states as frequently as possible without delaying the computation, and uses the optimal Revolve backpropagation strategy for the computations in between. The memory footprint of the backpropagation can thus be reduced to any size (e.g. to fit into DRAM), while the computational overhead is constant in the sequence length, and only depends on the ratio between compute and transfer times on a given hardware. We show in experiments that by exploiting asynchronous data transfer, our strategy is always at least as fast, and usually faster than the previously studied "optimal" strategies.

Submitted 22 May, 2018; originally announced June 2018.

arXiv:1802.02474 [pdf, other]

High-level python abstractions for optimal checkpointing in inversion problems

Authors: Navjot Kukreja, Jan Hückelheim, Michael Lange, Mathias Louboutin, Andrea Walther, Simon W. Funke, Gerard Gorman

Abstract: Inversion and PDE-constrained optimization problems often rely on solving the adjoint problem to calculate the gradient of the objective function. This requires storing large amounts of intermediate data, setting a limit to the largest problem that might be solved with a given amount of memory available. Checkpointing is an approach that can reduce the amount of memory required by redoing parts of the computation instead of storing intermediate results. The Revolve checkpointing algorithm offers an optimal schedule that trades computational cost for smaller memory footprints. Integrating Revolve into a modern python HPC code and combining it with code generation is not straightforward. We present an API that makes checkpointing accessible from a DSL-based code generation environment along with some initial performance figures with a focus on seismic applications.

Submitted 12 January, 2018; originally announced February 2018.

arXiv:1707.03776 [pdf, other]

Optimised finite difference computation from symbolic equations

Authors: Michael Lange, Navjot Kukreja, Fabio Luporini, Mathias Louboutin, Charles Yount, Jan Hückelheim, Gerard J. Gorman

Abstract: Domain-specific high-productivity environments are playing an increasingly important role in scientific computing due to the levels of abstraction and automation they provide. In this paper we introduce Devito, an open-source domain-specific framework for solving partial differential equations from symbolic problem definitions by the finite difference method. We highlight the generation and automated execution of highly optimized stencil code from only a few lines of high-level symbolic Python for a set of scientific equations, before exploring the use of Devito operators in seismic inversion problems.

Submitted 12 July, 2017; originally announced July 2017.

Comments: Accepted for publication in Proceedings of the 16th Python in Science Conference (SciPy 2017)

Showing 1–21 of 21 results for author: Hueckelheim, J