
The Foil: Capture-Avoiding Substitution With No Sharp Edges

Authors: Dougal Maclaurin, Alexey Radul, Adam Paszke

Abstract: Correctly manipulating program terms in a compiler is surprisingly difficult because of the need to avoid name capture. The rapier from "Secrets of the Glasgow Haskell Compiler inliner" is a cutting-edge technique for fast, stateless capture-avoiding substitution for expressions represented with explicit names. It is, however, a sharp tool: its invariants are tricky and need to be maintained throughout the whole compiler that uses it. We describe the foil, an elaboration of the rapier that uses Haskell's type system to enforce the rapier's invariants statically, preventing a class of hard-to-find bugs, but without adding any run-time overheads.

Submitted 10 October, 2022; originally announced October 2022.

Comments: Presented at IFL 2022
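The rapier's core move can be sketched in plain Python (not the paper's Haskell foil): carry an in-scope set alongside the substitution, and rename a binder only when its name is already in scope. The term constructors `Var`, `Lam`, `App` and the helpers `fresh` and `subst` are hypothetical names for illustration. Note that the invariant on `scope` lives only in a docstring here; that unchecked invariant is exactly what the foil moves into the types.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    binder: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def fresh(base: str, scope: FrozenSet[str]) -> str:
    """Return a variant of `base` that is not in `scope`."""
    i = 1
    while f"{base}{i}" in scope:
        i += 1
    return f"{base}{i}"

def subst(scope: FrozenSet[str], env: Dict[str, Term], t: Term) -> Term:
    """Apply the substitution `env` to `t` without capturing names.

    Invariant (checked only by convention in this sketch): `scope` contains
    every name free in env's range and every free name of `t` not in env.
    """
    if isinstance(t, Var):
        return env.get(t.name, t)
    if isinstance(t, App):
        return App(subst(scope, env, t.fun), subst(scope, env, t.arg))
    x = t.binder
    if x in scope:
        # The binder could capture something: rename it to a fresh name.
        x2 = fresh(x, scope)
        env2 = {**env, x: Var(x2)}
    else:
        # No clash possible: keep the name, but the binder shadows any outer x.
        x2 = x
        env2 = {k: v for k, v in env.items() if k != x}
    return Lam(x2, subst(scope | {x2}, env2, t.body))

# Substituting y := x under a binder named x must not capture x.
example = subst(frozenset({"x"}), {"y": Var("x")}, Lam("x", App(Var("y"), Var("x"))))
```

Here `example` is `Lam("x1", App(Var("x"), Var("x1")))`: the binder is renamed because `x` was in scope. Forgetting to add a name to `scope` would silently produce a capturing result, which is the class of bug the abstract refers to.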

arXiv:2204.10923 [pdf, other]

You Only Linearize Once: Tangents Transpose to Gradients

Authors: Alexey Radul, Adam Paszke, Roy Frostig, Matthew Johnson, Dougal Maclaurin

Abstract: Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part. To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing, which emerges naturally from a free choice in unzipping `let` expressions. As a corollary, checkpointing techniques are applicable to general-purpose partial evaluation, not just AD. We hope that our formalization will lead to a deeper understanding of automatic differentiation and that it will simplify implementations, by separating the concerns of differentiation proper from the concerns of gaining efficiency (namely, separating the derivative computation from the act of running it backward).

Submitted 6 December, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

arXiv:2105.09469 [pdf, other]

Decomposing reverse-mode automatic differentiation

Authors: Roy Frostig, Matthew J. Johnson, Dougal Maclaurin, Adam Paszke, Alexey Radul

Abstract: We decompose reverse-mode automatic differentiation into (forward-mode) linearization followed by transposition. Doing so isolates the essential difference between forward- and reverse-mode AD, and simplifies their joint implementation. In particular, once forward-mode AD rules are defined for every primitive operation in a source language, only linear primitives require an additional transposition rule in order to arrive at a complete reverse-mode AD implementation. This is how reverse-mode AD is written in JAX and Dex.

Submitted 19 May, 2021; originally announced May 2021.

Comments: Presented at the LAFI 2021 workshop at POPL, 17 January 2021
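A toy version of the decomposition, in plain Python rather than JAX or Dex: running a function once on primal inputs records its tangent part as a straight-line linear program, and reverse mode is obtained by evaluating that program transposed. The classes `LinearProgram` and `Lin`, and the two-primitive linear language (`scale`, `add`), are assumptions of this sketch. Only those two linear primitives need transpose rules; the nonlinear `sin` and `*` use forward-mode rules only.

```python
import math
from typing import List, Tuple

class LinearProgram:
    """Straight-line linear program over numbered tangent variables.

    Linear primitives (an assumption of this sketch):
      ("scale", out, c, inp)  means  out = c * inp
      ("add",   out, a, b)    means  out = a + b
    """

    def __init__(self) -> None:
        self.n_vars = 0
        self.instrs: List[Tuple] = []

    def new_var(self) -> int:
        self.n_vars += 1
        return self.n_vars - 1

    def scale(self, c: float, inp: int) -> int:
        out = self.new_var()
        self.instrs.append(("scale", out, c, inp))
        return out

    def add(self, a: int, b: int) -> int:
        out = self.new_var()
        self.instrs.append(("add", out, a, b))
        return out

    def run(self, in_vars, in_tangents, out_var) -> float:
        """Forward evaluation: a Jacobian-vector product."""
        vals = [0.0] * self.n_vars
        for v, t in zip(in_vars, in_tangents):
            vals[v] = t
        for op, out, p, q in self.instrs:
            if op == "scale":          # p is the constant c, q the input
                vals[out] = p * vals[q]
            else:                      # p and q are the two summands
                vals[out] = vals[p] + vals[q]
        return vals[out_var]

    def run_transposed(self, in_vars, out_var, ct: float) -> List[float]:
        """Transposed evaluation: a vector-Jacobian product (reverse mode).
        Walk the instructions backwards, applying each primitive's transpose."""
        cts = [0.0] * self.n_vars
        cts[out_var] = ct
        for op, out, p, q in reversed(self.instrs):
            if op == "scale":          # transpose of (c *) is (c *)
                cts[q] += p * cts[out]
            else:                      # transpose of add is fan-out
                cts[p] += cts[out]
                cts[q] += cts[out]
        return [cts[v] for v in in_vars]

class Lin:
    """A primal value paired with the tangent variable that linearizes it."""

    def __init__(self, prog: LinearProgram, primal: float, tvar: int) -> None:
        self.prog, self.primal, self.tvar = prog, primal, tvar

    def __add__(self, other: "Lin") -> "Lin":
        return Lin(self.prog, self.primal + other.primal,
                   self.prog.add(self.tvar, other.tvar))

    def __mul__(self, other: "Lin") -> "Lin":
        p = self.prog
        # d(xy) = y dx + x dy: nonlinear op, linear tangent rule
        t = p.add(p.scale(other.primal, self.tvar), p.scale(self.primal, other.tvar))
        return Lin(p, self.primal * other.primal, t)

def sin(x: Lin) -> Lin:
    return Lin(x.prog, math.sin(x.primal), x.prog.scale(math.cos(x.primal), x.tvar))

def linearize(f, *xs: float):
    """Run f once on primal inputs, recording only its linear (tangent) part."""
    prog = LinearProgram()
    ins = [Lin(prog, x, prog.new_var()) for x in xs]
    out = f(*ins)
    return out.primal, prog, [i.tvar for i in ins], out.tvar

def f(x, y):
    return sin(x) * y + x

primal, prog, in_vars, out_var = linearize(f, 1.0, 2.0)
dfdx = prog.run(in_vars, [1.0, 0.0], out_var)        # forward mode: one column
grad = prog.run_transposed(in_vars, out_var, 1.0)    # reverse mode: whole row
```

Both `dfdx` and `grad[0]` equal 2 cos(1) + 1, and `grad[1]` equals sin(1). The same recorded program serves both modes, which mirrors the abstract's point that reverse mode needs only one extra rule per linear primitive.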

arXiv:2104.05372 [pdf, other]

Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming

Authors: Adam Paszke, Daniel Johnson, David Duvenaud, Dimitrios Vytiniotis, Alexey Radul, Matthew Johnson, Jonathan Ragan-Kelley, Dougal Maclaurin

Abstract: We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized functions on typed index sets, allowing abstract function manipulations, such as currying, to work on arrays. In contrast to composing primitive bulk-array operations, we argue for an explicit nested indexing style that mirrors application of functions to arguments. We also introduce a fine-grained typed effects system which affords concise and automatically-parallelized in-place updates. Specifically, an associative accumulation effect allows reverse-mode automatic differentiation of in-place updates in a way that preserves parallelism. Empirically, we benchmark against the Futhark array programming language, and demonstrate that aggressive inlining and type-driven compilation allows array programs to be written in an expressive, "pointful" style with little performance penalty.

Submitted 12 April, 2021; originally announced April 2021.

Comments: 31 pages with appendix, 11 figures. A conference submission is still under review
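The idea of arrays as eagerly memoized functions on index sets can be approximated in Python, purely as an illustration: `Table` and `for_` are hypothetical names, index sets here are untyped Python iterables, and nothing is parallelized. Matrix multiplication and transposition are written pointfully, by explicit nested indexing, and a table of tables can be "uncurried" into a table indexed by pairs.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List

class Table:
    """An array viewed as an eagerly memoized function on a finite index set."""

    def __init__(self, index_set: Iterable[Hashable], fn: Callable[[Any], Any]) -> None:
        self.index_set: List[Hashable] = list(index_set)
        # Eager memoization: every entry is computed once, up front.
        self._values: Dict[Hashable, Any] = {i: fn(i) for i in self.index_set}

    def __getitem__(self, i: Hashable) -> Any:
        return self._values[i]

def for_(index_set, body) -> Table:
    """Build a table pointfully, loosely like Dex's `for i. body`."""
    return Table(index_set, body)

def matmul(a: Table, b: Table) -> Table:
    """(a @ b)[i][j] = sum_k a[i][k] * b[k][j], written with explicit indices."""
    rows, inner = a.index_set, b.index_set
    cols = b[inner[0]].index_set  # assumes a non-empty, rectangular table
    return for_(rows, lambda i: for_(cols, lambda j: sum(a[i][k] * b[k][j] for k in inner)))

def transpose(m: Table) -> Table:
    rows = m.index_set
    cols = m[rows[0]].index_set
    return for_(cols, lambda j: for_(rows, lambda i: m[i][j]))

def uncurry(m: Table) -> Table:
    """Table of tables -> table indexed by pairs, as with curried functions."""
    pairs = [(i, j) for i in m.index_set for j in m[i].index_set]
    return for_(pairs, lambda ij: m[ij[0]][ij[1]])

a = for_(range(2), lambda i: for_(range(3), lambda k: i + k))
b = for_(range(3), lambda k: for_(range(2), lambda j: k * j))
c = matmul(a, b)
```

Indexing `c[i][j]` reads exactly like applying a curried function to two arguments, which is the correspondence the abstract builds on.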

arXiv:2010.09647 [pdf, other]

The Base Measure Problem and its Solution

Authors: Alexey Radul, Boris Alexeev

Abstract: Probabilistic programming systems generally compute with probability density functions, leaving the base measure of each such function implicit. This mostly works, but creates problems when densities with respect to different base measures are accidentally combined or compared. Mistakes also happen when computing volume corrections for continuous changes of variables, which in general depend on the support measure. We motivate and clarify the problem in the context of a composable library of probability distributions and bijective transformations. We solve the problem by standardizing on Hausdorff measure as a base, and deriving formulas for comparing and combining mixed-dimension densities, as well as updating densities with respect to Hausdorff measure under diffeomorphic transformations. We also propose a software architecture that implements these formulas efficiently in the common case. We hope that by adopting our solution, probabilistic programming systems can become more robust and general, and make a broader class of models accessible to practitioners.

Submitted 10 December, 2020; v1 submitted 6 October, 2020; originally announced October 2020.
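Both failure modes can be illustrated with a small Python sketch that makes the base measure explicit. The `Density` record, the dominance rule in `more_likely`, and the spike-and-slab numbers are assumptions chosen for illustration, not the paper's architecture. The change of variables is the standard one-dimensional Jacobian correction.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Density:
    """Log-density tagged with the dimension of the Hausdorff measure it is
    taken with respect to: 0 for counting measure (atoms), 1 for length, etc."""
    dim: int
    log_value: float

def more_likely(a: Density, b: Density) -> bool:
    """Compare two points' densities, possibly w.r.t. different base measures.

    Assumption of this sketch: a density w.r.t. a lower-dimensional measure
    always wins, since an atom carries positive mass while a single point of a
    continuous component carries none.
    """
    if a.dim != b.dim:
        return a.dim < b.dim
    return a.log_value > b.log_value

def std_normal_logpdf(x: float) -> float:
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def exp_pushforward_logpdf(log_px, y: float) -> float:
    """Log-density of Y = exp(X) w.r.t. 1-D Lebesgue measure, for y > 0.
    Volume correction: p_Y(y) = p_X(log y) * |d(log y)/dy| = p_X(log y) / y."""
    return log_px(math.log(y)) - math.log(y)

# Spike-and-slab mixture: 0.05 * (point mass at 0) + 0.95 * N(0, 1).
at_spike = Density(0, math.log(0.05))                          # w.r.t. counting measure
in_slab = Density(1, math.log(0.95) + std_normal_logpdf(0.1))  # w.r.t. Lebesgue measure
```

Comparing the raw numbers alone would rank the slab point above the spike (about 0.38 versus 0.05), which is the accidental cross-measure comparison the abstract warns about; tracking the dimension reverses that verdict.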

arXiv:2001.05035 [pdf, ps, other]

FunMC: A functional API for building Markov Chains

Authors: Pavel Sountsov, Alexey Radul, Srinivas Vasudevan

Abstract: Constant-memory algorithms, also loosely called Markov chains, power the vast majority of probabilistic inference and machine learning applications today. A lot of progress has been made in constructing user-friendly APIs around these algorithms. Such APIs, however, rarely make it easy to research new algorithms of this type. In this work we present FunMC, a minimal Python library for doing methodological research into algorithms based on Markov chains. FunMC is not targeted toward data scientists or others who wish to use MCMC or optimization as a black box, but rather towards researchers implementing new Markovian algorithms from scratch.

Submitted 26 May, 2021; v1 submitted 14 January, 2020; originally announced January 2020.

Comments: Updated source code to reflect API; updated link to point to new location
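The functional style of building Markov chains can be sketched as a pure transition function `(state, seed) -> (new_state, extra)` plus a generic driver that iterates it. This is plain Python, not FunMC's actual API; `RWMState`, `rwm_step`, and `trace` are hypothetical names, and random-walk Metropolis stands in for whatever algorithm a researcher is developing.

```python
import math
import random
from typing import Callable, List, NamedTuple, Tuple

class RWMState(NamedTuple):
    x: float
    log_prob: float

def rwm_step(state: RWMState, seed: int,
             target_log_prob: Callable[[float], float],
             step_size: float) -> Tuple[RWMState, bool]:
    """One random-walk Metropolis transition as a pure function of
    (state, seed): no hidden global RNG or mutable sampler object."""
    rng = random.Random(seed)
    proposal = state.x + step_size * rng.gauss(0.0, 1.0)
    proposal_lp = target_log_prob(proposal)
    log_ratio = proposal_lp - state.log_prob
    accepted = log_ratio >= 0 or rng.random() < math.exp(log_ratio)
    new_state = RWMState(proposal, proposal_lp) if accepted else state
    return new_state, accepted   # (new state, "extra" diagnostics)

def trace(state: RWMState, transition, num_steps: int, seed: int):
    """Iterate a transition operator, collecting per-step values."""
    xs: List[float] = []
    accepts: List[bool] = []
    for i in range(num_steps):
        state, accepted = transition(state, seed * 1_000_003 + i)
        xs.append(state.x)
        accepts.append(accepted)
    return state, xs, accepts

target = lambda x: -0.5 * x * x            # unnormalized standard normal
init = RWMState(0.0, target(0.0))
step = lambda s, k: rwm_step(s, k, target, step_size=1.0)
final, xs, accepts = trace(init, step, num_steps=1000, seed=0)
```

Because every transition is a pure function of its inputs, a run is reproducible from its seed, and a new algorithm only needs to supply a new step function with the same shape.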

arXiv:1910.11141 [pdf, other]

Automatically Batching Control-Intensive Programs for Modern Accelerators

Authors: Alexey Radul, Brian Patton, Dougal Maclaurin, Matthew D. Hoffman, Rif A. Saurous

Abstract: We present a general approach to batching arbitrary computations for accelerators such as GPUs. We show orders-of-magnitude speedups using our method on the No U-Turn Sampler (NUTS), a workhorse algorithm in Bayesian statistics. The central challenge of batching NUTS and other Markov chain Monte Carlo algorithms is data-dependent control flow and recursion. We overcome this by mechanically transforming a single-example implementation into a form that explicitly tracks the current program point for each batch member, and only steps forward those in the same place. We present two different batching algorithms: a simpler, previously published one that inherits recursion from the host Python, and a more complex, novel one that implements recursion directly and can batch across it. We implement these batching methods as a general program transformation on Python source. Both the batching system and the NUTS implementation presented here are available as part of the popular TensorFlow Probability software package.

Submitted 12 March, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: 10 pages; Machine Learning and Systems 2020
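The program-counter idea can be sketched on a small loop with data-dependent trip counts. In this hypothetical example, a Collatz step counter is split by hand into basic blocks; each batch member has its own program counter, and each iteration runs one block for just the members currently at it. Python lists stand in for accelerator vectors (the inner loops would be masked vector operations), the "lowest live block first" scheduling rule is an assumption, and recursion is not handled.

```python
from typing import List

# Single-example program, split into basic blocks (hypothetical example):
#   block 0: if n == 1 goto 2 else goto 1
#   block 1: n = n // 2 if n is even else 3n + 1; steps += 1; goto 0
#   block 2: halt
HALT = 2

def run_batched(ns: List[int]) -> List[int]:
    """Count Collatz steps for a batch of positive ints, one pc per member."""
    n = list(ns)
    steps = [0] * len(ns)
    pc = [0] * len(ns)
    while any(p != HALT for p in pc):
        # Scheduling heuristic (an assumption): run the lowest-numbered live block.
        block = min(p for p in pc if p != HALT)
        active = [b for b, p in enumerate(pc) if p == block]  # the "mask"
        if block == 0:
            for b in active:
                pc[b] = HALT if n[b] == 1 else 1
        elif block == 1:
            for b in active:
                n[b] = n[b] // 2 if n[b] % 2 == 0 else 3 * n[b] + 1
                steps[b] += 1
                pc[b] = 0
    return steps

def collatz_steps(n: int) -> int:
    """Reference single-example implementation."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

batch = [1, 6, 7, 27]
results = run_batched(batch)
```

Members that finish early simply sit at the halt block while the others keep stepping, so one batched run reproduces the per-example results despite the divergent control flow.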

arXiv:1811.02091 [pdf, other]

Simple, Distributed, and Accelerated Probabilistic Programming

Authors: Dustin Tran, Matthew Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous

Abstract: We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction, the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and 37x over PyMC3.

Submitted 28 November, 2018; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: Appears in Neural Information Processing Systems, 2018. Code available at http://bit.ly/2JpFipt

arXiv:1704.04977 [pdf, other]

Probabilistic programs for inferring the goals of autonomous agents

Authors: Marco F. Cusumano-Towner, Alexey Radul, David Wingate, Vikash K. Mansinghka

Abstract: Intelligent systems sometimes need to infer the probable goals of people, cars, and robots, based on partial observations of their motion. This paper introduces a class of probabilistic programs for formulating and solving these problems. The formulation uses randomized path planning algorithms as the basis for probabilistic models of the process by which autonomous agents plan to achieve their goals. Because these path planning algorithms do not have tractable likelihood functions, new inference algorithms are needed. This paper proposes two Monte Carlo techniques for these "likelihood-free" models, one of which can use likelihood estimates from neural networks to accelerate inference. The paper demonstrates efficacy on three simple examples, each using under 50 lines of probabilistic code.

Submitted 18 April, 2017; v1 submitted 17 April, 2017; originally announced April 2017.

arXiv:1611.07051 [pdf, other]

Time Series Structure Discovery via Probabilistic Program Synthesis

Authors: Ulrich Schaechtle, Feras Saad, Alexey Radul, Vikash Mansinghka

Abstract: There is a widespread need for techniques that can discover structure from time series data. Recently introduced techniques such as Automatic Bayesian Covariance Discovery (ABCD) provide a way to find structure within a single time series by searching through a space of covariance kernels that is generated using a simple grammar. While ABCD can identify a broad class of temporal patterns, it is difficult to extend and can be brittle in practice. This paper shows how to extend ABCD by formulating it in terms of probabilistic program synthesis. The key technical ideas are to (i) represent models using abstract syntax trees for a domain-specific probabilistic language, and (ii) represent the time series model prior, likelihood, and search strategy using probabilistic programs in a sufficiently expressive language. The final probabilistic program is written in under 70 lines of probabilistic code in Venture. The paper demonstrates an application to time series clustering that involves a non-parametric extension to ABCD, experiments for interpolation and extrapolation on real-world econometric data, and improvements in accuracy over both non-parametric and standard regression baselines.

Submitted 22 May, 2017; v1 submitted 21 November, 2016; originally announced November 2016.

Comments: The first two authors contributed equally to this work
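Covariance kernels as abstract syntax trees, together with a tiny grammar over them, might look like the following Python sketch. It is not Venture code: the base kernels, their default parameters, and the one-step expansion rule `K -> K + B | K * B` are illustrative assumptions. In the paper, the prior over such trees and the search strategy are themselves probabilistic programs; only the AST and its evaluation are shown here.

```python
import math
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class SE:        # squared exponential: smooth local variation
    lengthscale: float = 1.0

@dataclass(frozen=True)
class PER:       # periodic structure
    period: float = 1.0
    lengthscale: float = 1.0

@dataclass(frozen=True)
class LIN:       # linear trend
    offset: float = 0.0

@dataclass(frozen=True)
class Sum:
    left: "Kernel"
    right: "Kernel"

@dataclass(frozen=True)
class Prod:
    left: "Kernel"
    right: "Kernel"

Kernel = Union[SE, PER, LIN, Sum, Prod]

def k(kern: Kernel, x: float, y: float) -> float:
    """Evaluate a kernel AST at a pair of inputs."""
    if isinstance(kern, SE):
        return math.exp(-((x - y) ** 2) / (2 * kern.lengthscale ** 2))
    if isinstance(kern, PER):
        s = math.sin(math.pi * abs(x - y) / kern.period)
        return math.exp(-2 * s * s / kern.lengthscale ** 2)
    if isinstance(kern, LIN):
        return (x - kern.offset) * (y - kern.offset)
    if isinstance(kern, Sum):
        return k(kern.left, x, y) + k(kern.right, x, y)
    return k(kern.left, x, y) * k(kern.right, x, y)

BASE: List[Kernel] = [SE(), PER(), LIN()]

def expand(kern: Kernel) -> List[Kernel]:
    """One step of a simple grammar: K -> K + B | K * B for base kernels B."""
    return [op(kern, b) for b in BASE for op in (Sum, Prod)]

# For example, a linear trend plus yearly seasonality on monthly data.
trend_plus_season = Sum(LIN(), PER(period=12.0))
```

Each structure produced by `expand` is an ordinary value that can be evaluated, scored, and expanded again, which is what makes searching over it expressible as a program.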

arXiv:1610.00831 [pdf, ps, other]

Notes on Pure Dataflow Matrix Machines: Programming with Self-referential Matrix Transformations

Authors: Michael Bukatin, Steve Matthews, Andrey Radul

Abstract: Dataflow matrix machines are self-referential generalized recurrent neural nets. The self-referential mechanism is provided via a stream of matrices defining the connectivity and weights of the network in question. A natural question is: what should play the role of untyped lambda-calculus for this programming architecture? The proposed answer is a discipline of programming with only one kind of streams, namely the streams of appropriately shaped matrices. This yields Pure Dataflow Matrix Machines which are networks of transformers of streams of matrices capable of defining a pure dataflow matrix machine.

Submitted 2 November, 2018; v1 submitted 3 October, 2016; originally announced October 2016.

Comments: 7 pages (v3 - update page 7)

arXiv:1606.09470 [pdf, ps, other]

Programming Patterns in Dataflow Matrix Machines and Generalized Recurrent Neural Nets

Authors: Michael Bukatin, Steve Matthews, Andrey Radul

Abstract: Dataflow matrix machines arise naturally in the context of synchronous dataflow programming with linear streams. They can be viewed as a rather powerful generalization of recurrent neural networks. Similarly to recurrent neural networks, large classes of dataflow matrix machines are described by matrices of numbers, and therefore dataflow matrix machines can be synthesized by computing their matrices. At the same time, the evidence is fairly strong that dataflow matrix machines have sufficient expressive power to be a convenient general-purpose programming platform. Because of the network nature of this platform, programming patterns often correspond to patterns of connectivity in the generalized recurrent neural networks understood as programs. This paper explores a variety of such programming patterns.

Submitted 3 August, 2018; v1 submitted 30 June, 2016; originally announced June 2016.

Comments: 13 pages (v2 - update references)

arXiv:1605.05296 [pdf, ps, other]

Dataflow matrix machines as programmable, dynamically expandable, self-referential generalized recurrent neural networks

Authors: Michael Bukatin, Steve Matthews, Andrey Radul

Abstract: Dataflow matrix machines are a powerful generalization of recurrent neural networks. They work with multiple types of linear streams and multiple types of neurons, including higher-order neurons which dynamically update the matrix describing weights and topology of the network in question while the network is running. It seems that the power of dataflow matrix machines is sufficient for them to be a convenient general purpose programming platform. This paper explores a number of useful programming idioms and constructions arising in this context.

Submitted 20 June, 2018; v1 submitted 17 May, 2016; originally announced May 2016.

Comments: 9 pages (v2 - update references)

arXiv:1603.09002 [pdf, ps, other]

Dataflow Matrix Machines as a Generalization of Recurrent Neural Networks

Authors: Michael Bukatin, Steve Matthews, Andrey Radul

Abstract: Dataflow matrix machines are a powerful generalization of recurrent neural networks. They work with multiple types of arbitrary linear streams and multiple types of powerful neurons, and allow higher-order constructions to be incorporated. We expect them to be useful in machine learning and probabilistic programming, and in the synthesis of dynamic systems and of deterministic and probabilistic programs.

Submitted 28 May, 2018; v1 submitted 29 March, 2016; originally announced March 2016.

Comments: 4 pages position paper (v2 - update references)

arXiv:1512.05665 [pdf, other]

Probabilistic Programming with Gaussian Process Memoization

Authors: Ulrich Schaechtle, Ben Zinberg, Alexey Radul, Kostas Stathis, Vikash K. Mansinghka

Abstract: Gaussian Processes (GPs) are widely used tools in statistics, machine learning, robotics, computer vision, and scientific computation. However, despite their popularity, they can be difficult to apply; all but the simplest classification or regression applications require specification and inference over complex covariance functions that do not admit simple analytical posteriors. This paper shows how to embed Gaussian processes in any higher-order probabilistic programming language, using an idiom based on memoization, and demonstrates its utility by implementing and extending classic and state-of-the-art GP applications. The interface to Gaussian processes, called gpmem, takes an arbitrary real-valued computational process as input and returns a statistical emulator that automatically improves as the original process is invoked and its input-output behavior is recorded. The flexibility of gpmem is illustrated via three applications: (i) robust GP regression with hierarchical hyper-parameter learning, (ii) discovering symbolic expressions from time-series data by fully Bayesian structure learning over kernels generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian optimization with automatic inference and action selection. All applications share a single 50-line Python library and require fewer than 20 lines of probabilistic code each.

Submitted 5 January, 2016; v1 submitted 17 December, 2015; originally announced December 2015.

Comments: 36 pages, 9 figures
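The memoization idiom behind gpmem can be sketched in plain Python: wrapping a function yields a probe that calls the function and records each input-output pair, plus an emulator that returns the GP posterior mean given everything recorded so far. The names `gpmem`, `probe`, and `emulator` here are hypothetical, and the squared-exponential kernel has fixed hyperparameters, so unlike the paper there is no hyperparameter inference or probabilistic language involved.

```python
import math
from typing import Callable, List

def se_kernel(x: float, y: float, lengthscale: float = 1.0) -> float:
    return math.exp(-((x - y) ** 2) / (2 * lengthscale ** 2))

def solve_spd(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b for symmetric positive-definite a via Cholesky."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    z = [0.0] * n                      # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    x = [0.0] * n                      # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def gpmem(f: Callable[[float], float], noise: float = 1e-6):
    """Return (probe, emulator). Each probe call evaluates f and records the
    pair; the emulator gives the GP posterior mean given all records so far."""
    xs: List[float] = []
    ys: List[float] = []

    def probe(x: float) -> float:
        y = f(x)
        xs.append(x)
        ys.append(y)
        return y

    def emulator(x: float) -> float:
        if not xs:
            return 0.0                 # zero prior mean before any data
        gram = [[se_kernel(p, q) + (noise if i == j else 0.0)
                 for j, q in enumerate(xs)] for i, p in enumerate(xs)]
        alpha = solve_spd(gram, ys)
        return sum(w * se_kernel(x, xi) for w, xi in zip(alpha, xs))

    return probe, emulator

probe, emulator = gpmem(math.sin)
```

Calling `probe` at a few points, for example 0, 1, 2 and 3, makes `emulator` track `math.sin` at and near those points, which is the "emulator improves as the process is invoked" behavior described in the abstract.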

arXiv:1502.05767 [pdf, ps, other]

Automatic differentiation in machine learning: a survey

Authors: Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind

Abstract: Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.

Submitted 5 February, 2018; v1 submitted 19 February, 2015; originally announced February 2015.

Comments: 43 pages, 5 figures

MSC Class: 68W30; 65D25; 68T05 ACM Class: G.1.4; I.2.6

Journal ref: Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. The Journal of Machine Learning Research, 18(153):1--43, 2018

arXiv:1211.4892 [pdf, ps, other]

doi 10.1017/S095679681900008X

Confusion of Tagged Perturbations in Forward Automatic Differentiation of Higher-Order Functions

Authors: Oleksandr Manzyuk, Barak A. Pearlmutter, Alexey Andreyevich Radul, David R. Rush, Jeffrey Mark Siskind

Abstract: Forward Automatic Differentiation (AD) is a technique for augmenting programs to compute derivatives. The essence of Forward AD is to attach perturbations to each number, and propagate these through the computation. When derivatives are nested, the distinct derivative calculations, and their associated perturbations, must be distinguished. This is typically accomplished by creating a unique tag for each derivative calculation, tagging the perturbations, and overloading the arithmetic operators. We exhibit a subtle bug, present in fielded implementations, in which perturbations are confused despite the tagging machinery. The essence of the bug is this: each invocation of a derivative creates a unique tag, but a unique tag is needed for each derivative calculation. When taking derivatives of higher-order functions, these need not correspond! The derivative of a higher-order function $f$ that returns a function $g$ will be a function $f'$ that returns a function $\bar{g}$ that performs a derivative calculation. A single invocation of $f'$ will create a single fresh tag but that same tag will be used for each derivative calculation resulting from an invocation of $\bar{g}$. This situation arises when taking derivatives of curried functions. Two potential solutions are presented, and their serious deficiencies discussed. One requires eta expansion to delay the creation of fresh tags from the invocation of $f'$ to the invocation of $\bar{g}$, which can be difficult or even impossible in some circumstances. The other requires $f'$ to wrap $\bar{g}$ with tag renaming, which is difficult to implement without violating the desirable complexity properties of forward AD.

Submitted 29 June, 2019; v1 submitted 20 November, 2012; originally announced November 2012.
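The bug can be reproduced with a small tagged forward-mode implementation in Python. Everything here (`Dual`, `add`, `mul`, `exp`, `tangent`, `derivative`, and the curried shift function `s`) is our own illustrative code, not taken from any fielded system. Tags come from a global counter, one per invocation of a derivative; when the result is a function, `tangent` wraps it, so every later call of that function reuses the same tag. Since `s(u)(f)(x) = f(x + u)`, the value `derivative(s)(0)` should act as the derivative operator, and applying it twice to `exp` at 1 should give e.

```python
import itertools
import math

_fresh_tag = itertools.count(1)

class Dual:
    """primal + tangent * eps_tag, with eps_tag ** 2 == 0.
    Invariant: nested components carry strictly smaller tags."""
    def __init__(self, tag, primal, tangent):
        self.tag, self.primal, self.tangent = tag, primal, tangent

def _tag(x):
    return x.tag if isinstance(x, Dual) else 0

def _split(x, tag):
    """View x as (primal, tangent) with respect to eps_tag."""
    if isinstance(x, Dual) and x.tag == tag:
        return x.primal, x.tangent
    return x, 0.0

def add(a, b):
    t = max(_tag(a), _tag(b))
    if t == 0:
        return a + b
    (ap, at), (bp, bt) = _split(a, t), _split(b, t)
    return Dual(t, add(ap, bp), add(at, bt))

def mul(a, b):
    t = max(_tag(a), _tag(b))
    if t == 0:
        return a * b
    (ap, at), (bp, bt) = _split(a, t), _split(b, t)
    return Dual(t, mul(ap, bp), add(mul(ap, bt), mul(at, bp)))

def exp(a):
    if not isinstance(a, Dual):
        return math.exp(a)
    e = exp(a.primal)
    return Dual(a.tag, e, mul(e, a.tangent))

def tangent(y, tag):
    """Extract the eps_tag coefficient; a function result gets wrapped."""
    if callable(y):
        return lambda *args: tangent(y(*args), tag)
    if not isinstance(y, Dual) or y.tag < tag:
        return 0.0
    if y.tag == tag:
        return y.tangent
    return Dual(y.tag, tangent(y.primal, tag), tangent(y.tangent, tag))

def derivative(f):
    """Tagged forward-mode derivative: one fresh tag per invocation of f'."""
    def f_prime(x):
        tag = next(_fresh_tag)
        return tangent(f(Dual(tag, x, 1.0)), tag)
    return f_prime

s = lambda u: lambda f: lambda x: f(add(x, u))   # curried shift

d_buggy = derivative(s)(0.0)          # one tag, shared by every later call
buggy = d_buggy(d_buggy(exp))(1.0)

# Eta expansion: a fresh tag is created at each application instead.
d_fixed = lambda f: lambda x: derivative(lambda u: s(u)(f)(x))(0.0)
fixed = d_fixed(d_fixed(exp))(1.0)
```

In this sketch `buggy` evaluates to 0.0: the inner and outer derivative calculations share one tag, so the outer extraction finds no perturbation left. The eta-expanded `fixed` gives e, at the cost of restructuring `s`, which is the first of the two remedies the abstract discusses.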

arXiv:1203.1450 [pdf, ps, other]

doi 10.1007/978-3-642-30023-3_25

AD in Fortran, Part 2: Implementation via Prepreprocessor

Authors: Alexey Radul, Barak A. Pearlmutter, Jeffrey Mark Siskind

Abstract: We describe an implementation of the Farfel Fortran AD extensions. These extensions integrate forward and reverse AD directly into the programming model, with attendant benefits to flexibility, modularity, and ease of use. The implementation we describe is a "prepreprocessor" that generates input to existing Fortran-based AD tools. In essence, blocks of code which are targeted for AD by Farfel constructs are put into subprograms which capture their lexical variable context, and these are closure-converted into top-level subprograms and specialized to eliminate EXTERNAL arguments, rendering them amenable to existing AD preprocessors, which are then invoked, possibly repeatedly if the AD is nested.

Submitted 8 March, 2012; v1 submitted 7 March, 2012; originally announced March 2012.

Journal ref: Recent Advances in Algorithmic Differentiation, Springer Lecture Notes in Computational Science and Engineering volume 87, 2012, ISBN 978-3-642-30022-6, pages 273-284

arXiv:1203.1448 [pdf, ps, other]

AD in Fortran, Part 1: Design

Authors: Alexey Radul, Barak A. Pearlmutter, Jeffrey Mark Siskind

Abstract: We propose extensions to Fortran which integrate forward and reverse Automatic Differentiation (AD) directly into the programming model. Irrespective of implementation technology, embedding AD constructs directly into the language extends the reach and convenience of AD while allowing abstraction of concepts of interest to scientific-computing practice, such as root finding, optimization, and finding equilibria of continuous games. Multiple different subprograms for these tasks can share common interfaces, regardless of whether and how they use AD internally. A programmer can maximize a function F by calling a library maximizer, XSTAR=ARGMAX(F,X0), which internally constructs derivatives of F by AD, without having to learn how to use any particular AD tool. We illustrate the utility of these extensions by example: programs become much more concise and closer to traditional mathematical notation. A companion paper describes how these extensions can be implemented by a program that generates input to existing Fortran-based AD tools.

Submitted 8 March, 2012; v1 submitted 7 March, 2012; originally announced March 2012.
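The ARGMAX interface described in the abstract above can be illustrated with a minimal sketch in Python rather than Fortran; it is not the paper's proposed syntax or implementation. It assumes a one-dimensional function and uses hand-rolled forward-mode AD (dual numbers) plus fixed-step gradient ascent. The names `Dual`, `derivative`, and `argmax`, the step size, and the iteration count are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode AD value: a primal value paired with its tangent."""
    val: float
    tan: float = 0.0

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (zero tangent).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.tan + o.tan)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        # Product rule for the tangent.
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.tan + self.tan * o.val)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.tan)


def derivative(f, x):
    """Seed the input tangent with 1.0; the output tangent is f'(x)."""
    return f(Dual(float(x), 1.0)).tan


def argmax(f, x0, step=0.1, iters=200):
    """Hypothetical 1-D maximizer using fixed-step gradient ascent.

    The caller passes only f and x0; derivatives are built internally by AD.
    """
    x = float(x0)
    for _ in range(iters):
        x += step * derivative(f, x)
    return x


# A concave test function whose maximum is at x = 3.
xstar = argmax(lambda x: -(x - 3.0) * (x - 3.0) + 5.0, 0.0)
print(xstar)  # approximately 3.0
```

Because the caller supplies only `f` and `x0`, a different maximizer (for example one that uses no derivatives at all) could be substituted behind the same interface, which is the modularity point the abstract makes.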

arXiv:1110.1556

Jewish Problems

Authors: Tanya Khovanova, Alexey Radul

Abstract: This is a special collection of problems that were given to select applicants during oral entrance exams to the math department of Moscow State University. These problems were designed to prevent Jews and other undesirables from getting a passing grade. Among problems that were used by the department to blackball unwanted candidate students, these problems are distinguished by having a simple solution that is difficult to find. Using problems with a simple solution protected the administration from extra complaints and appeals. This collection therefore has mathematical as well as historical value.

Submitted 15 October, 2011; v1 submitted 7 October, 2011; originally announced October 2011.

Comments: 21 pages, 14 figures

Journal ref: published as "Killer Problems" in The American Mathematical Monthly Vol. 119, No. 10 (2012), pp. 815-82

arXiv:1003.3406

Baron Munchhausen's Sequence

Authors: Tanya Khovanova, Konstantin Knop, Alexey Radul

Abstract: We investigate a coin-weighing puzzle that appeared in the All-Russian math Olympiad in 2000. We liked the puzzle because the methods of analysis differ from those used for classical coin-weighing puzzles. We generalize the puzzle by varying the number of participating coins and deduce a complete solution: perhaps surprisingly, the objective can be achieved in no more than two weighings regardless of the number of coins involved.

Submitted 17 March, 2010; originally announced March 2010.

Comments: 26 pages

MSC Class: 11B99; 00A08

Journal ref: Journal of Integer Sequences, v.13 (2010), Article 10.8.7

arXiv:hep-th/9512150

Representation theory of the vertex algebra $W_{1 + \infty}$

Authors: Victor Kac, Andrey Radul

Abstract: In our paper [KR] we began a systematic study of representations of the universal central extension $\widehat{\mathcal{D}}$ of the Lie algebra of differential operators on the circle. This study was continued in the paper [FKRW] in the framework of vertex algebra theory. It was shown that the simple vertex algebra $W_{1+\infty,N}$ associated to $\widehat{\mathcal{D}}$ with positive integral central charge $N$ is isomorphic to the classical vertex algebra $W(gl_N)$, which led to a classification of modules over $W_{1+\infty,N}$. In the present paper we study the remaining non-trivial case, that of a negative central charge $-N$. The basic tool is the decomposition of $N$ pairs of free charged bosons with respect to $gl_N$ and to the Lie algebra $\widehat{gl}$ of infinite matrices, which commutes with $gl_N$.

Submitted 18 December, 1995; originally announced December 1995.

Comments: 26 pages, AMS-TeX, all macros included

arXiv:hep-th/9405121

doi 10.1007/BF02108332

$W_{1+\infty}$ and $W(gl_N)$ with central charge $N$

Authors: E. Frenkel, V. Kac, A. Radul, W. Wang

Abstract: We study representations of the central extension of the Lie algebra of differential operators on the circle, the W-infinity algebra. We obtain complete and specialized character formulas for a large class of representations, which we call primitive; these include all quasi-finite irreducible unitary representations. We show that any primitive representation with central charge N has a canonical structure of an irreducible representation of the W-algebra W(gl_N) with the same central charge and that all irreducible representations of W(gl_N) with central charge N arise in this way. We also establish a duality between "integral" modules of W(gl_N) and finite-dimensional irreducible modules of gl_N, and conjecture their fusion rules.

Submitted 3 October, 1994; v1 submitted 18 May, 1994; originally announced May 1994.

Comments: 29 pages, Latex, uses file amssym.def (a few remarks added, typos corrected)

Journal ref: Commun.Math.Phys. 170 (1995) 337-358

arXiv:hep-th/9308153

doi 10.1007/BF02096878

Quasifinite highest weight modules over the Lie algebra of differential operators on the circle

Authors: Victor G. Kac, A. Radul

Abstract: We classify positive energy representations with finite degeneracies of the Lie algebra $W_{1+\infty}$ and construct them in terms of representation theory of the Lie algebra $\widehat{gl}(\infty, R_m)$ of infinite matrices with a finite number of non-zero diagonals over the algebra $R_m = \mathbb{C}[t]/(t^{m+1})$. The unitary ones are classified as well. Similar results are obtained for the sin-algebras.

Submitted 31 August, 1993; originally announced August 1993.

Journal ref: Commun.Math.Phys. 157 (1993) 429-457

Showing 1–24 of 24 results for author: Radul, A