-
Deep Learning with Parametric Lenses
Authors:
Geoffrey S. H. Cruttwell,
Bruno Gavranovic,
Neil Ghani,
Paul Wilson,
Fabio Zanasi
Abstract:
We propose a categorical semantics for machine learning algorithms in terms of lenses, parametric maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, and different architectures, shedding new light on their similarities and differences. Furthermore, our approach to learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realised in the discrete setting of Boolean and polynomial circuits. We demonstrate the practical significance of our framework with an implementation in Python.
Submitted 30 March, 2024;
originally announced April 2024.
-
On a fibrational construction for optics, lenses, and Dialectica categories
Authors:
Matteo Capucci,
Bruno Gavranović,
Abdullah Malik,
Francisco Rios,
Jonathan Weinberger
Abstract:
Categories of lenses/optics and Dialectica categories are both comprised of bidirectional morphisms of basically the same form. In this work we show how they can be considered a special case of an overarching fibrational construction, generalizing Hofstra's construction of Dialectica fibrations and Spivak's construction of generalized lenses. This construction turns a tower of Grothendieck fibrations into another tower of fibrations by iteratively twisting each of the components, using the opposite fibration construction.
Submitted 12 June, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
Fundamental Components of Deep Learning: A category-theoretic approach
Authors:
Bruno Gavranović
Abstract:
Deep learning, despite its remarkable achievements, is still a young field. Like the early stages of many scientific disciplines, it is marked by the discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform and compositional mathematical foundation. From the intricacies of the implementation of backpropagation, through a growing zoo of neural network architectures, to the new and poorly understood phenomena such as double descent, scaling laws or in-context learning, there are few unifying principles in deep learning. This thesis develops a novel mathematical foundation for deep learning based on the language of category theory. We develop a new framework that is a) end-to-end, b) uniform, and c) not merely descriptive, but prescriptive, meaning it is amenable to direct implementation in programming languages with sufficient features. We also systematise many existing approaches, placing many existing constructions and concepts from the literature under the same umbrella. In Part I we identify and model two main properties of deep learning systems, parametricity and bidirectionality. We expand on the previously defined construction of actegories and Para to study the former, and define weighted optics to study the latter. Combining them yields parametric weighted optics, a categorical model of artificial neural networks, and more. Part II justifies the abstractions from Part I, applying them to model backpropagation, architectures, and supervised learning. We provide a lens-theoretic axiomatisation of differentiation, covering not just smooth spaces, but discrete settings of boolean circuits as well. We survey existing, and develop new, categorical models of neural network architectures. We formalise the notion of optimisers and lastly, combine all the existing concepts together, providing a uniform and compositional framework for supervised learning.
Submitted 12 March, 2024;
originally announced March 2024.
-
Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
Authors:
Bruno Gavranović,
Paul Lessard,
Andrew Dudzik,
Tamara von Glehn,
João G. M. Araújo,
Petar Veličković
Abstract:
We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building such a bridge, we propose to apply category theory -- precisely, the universal algebra of monads valued in a 2-category of parametric maps -- as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.
Submitted 5 June, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
Graph Convolutional Neural Networks as Parametric CoKleisli morphisms
Authors:
Bruno Gavranović,
Mattia Villani
Abstract:
We define the bicategory of Graph Convolutional Neural Networks $\mathbf{GCNN}_n$ for an arbitrary graph with $n$ nodes. We show it can be factored through the already existing categorical constructions for deep learning called $\mathbf{Para}$ and $\mathbf{Lens}$ with the base category set to the CoKleisli category of the product comonad. We prove that there exists an injective-on-objects, faithful 2-functor $\mathbf{GCNN}_n \to \mathbf{Para}(\mathsf{CoKl}(\mathbb{R}^{n \times n} \times -))$. We show that this construction allows us to treat the adjacency matrix of a GCNN as a global parameter instead of a local, layer-wise one. This gives us a high-level categorical characterisation of a particular kind of inductive bias GCNNs possess. Lastly, we hypothesize about possible generalisations of GCNNs to general message-passing graph neural networks, connections to equivariant learning, and the (lack of) functoriality of activation functions.
Submitted 1 December, 2022;
originally announced December 2022.
-
Space-time tradeoffs of lenses and optics via higher category theory
Authors:
Bruno Gavranović
Abstract:
Optics and lenses are abstract categorical gadgets that model systems with bidirectional data flow. In this paper we observe that the denotational definition of optics - identifying two optics as equivalent by observing their behaviour from the outside - is not suitable for operational, software oriented approaches where optics are not merely observed, but built with their internal setups in mind. We identify operational differences between denotationally isomorphic categories of cartesian optics and lenses: their different composition rule and corresponding space-time tradeoffs, positioning them at two opposite ends of a spectrum. With these motivations we lift the existing categorical constructions and their relationships to the 2-categorical level, showing that the relevant operational concerns become visible. We define the 2-category $\textbf{2-Optic}(\mathcal{C})$ whose 2-cells explicitly track optics' internal configuration. We show that the 1-category $\textbf{Optic}(\mathcal{C})$ arises by locally quotienting out the connected components of this 2-category. We show that the embedding of lenses into cartesian optics gets weakened from a functor to an oplax functor whose oplaxator now detects the different composition rule. We determine the difficulties in showing this functor forms a part of an adjunction in any of the standard 2-categories. We establish a conjecture that the well-known isomorphism between cartesian lenses and optics arises out of the lax 2-adjunction between their double-categorical counterparts. In addition to presenting new research, this paper is also meant to be an accessible introduction to the topic.
Submitted 19 September, 2022;
originally announced September 2022.
-
Actegories for the Working Amthematician
Authors:
Matteo Capucci,
Bruno Gavranović
Abstract:
Actions of monoidal categories on categories, also known as actegories, have been familiar to category theorists for a long time, and yet a comprehensive overview of this topic seems to be missing from the literature. Recently, actegories have been increasingly employed in applied category theory, thereby encouraging an effort to fill this gap according to the new needs of these applications. This work started as an investigation of the notion of monoidal actegory, a compatible pair of monoidal and actegorical structures, and ended up including a sizable reference on the elementary theory of actegories. We cover basic definitions and results on actegories and biactegories, spelling out explicitly many folkloric definitions, including their tensor product and their hom-tensor adjunction. We give new definitions of actegories with monoidal, braided monoidal and symmetric monoidal structure. In the last section, we provide three Cayley-like classification results for these structures.
Submitted 11 December, 2023; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Fibre optics
Authors:
Dylan Braithwaite,
Matteo Capucci,
Bruno Gavranović,
Jules Hedges,
Eigil Fjeldgren Rischel
Abstract:
Lenses, optics and dependent lenses (or equivalently morphisms of containers, or equivalently natural transformations of polynomial functors) are all widely used in applied category theory as models of bidirectional processes. From the definition of lenses over a finite product category, optics weaken the required structure to actions of monoidal categories, and dependent lenses make use of the additional property of finite completeness (or, in case of polynomials, even local cartesian closure). This has caused a split in the applied category theory literature between those using optics and those using dependent lenses. The goal of this paper is to unify optics with dependent lenses, by finding a definition of fibre optics admitting both as special cases.
Submitted 21 December, 2021;
originally announced December 2021.
-
Category Theory in Machine Learning
Authors:
Dan Shiebler,
Bruno Gavranović,
Paul Wilson
Abstract:
Over the past two decades machine learning has permeated almost every realm of technology. At the same time, many researchers have begun using category theory as a unifying language, facilitating communication between different scientific disciplines. It is therefore unsurprising that there is a burgeoning interest in applying category theory to machine learning. We aim to document the motivations, goals and common themes across these applications. We touch on gradient-based learning, probability, and equivariant learning.
Submitted 13 June, 2021;
originally announced June 2021.
-
Towards Foundations of Categorical Cybernetics
Authors:
Matteo Capucci,
Bruno Gavranović,
Jules Hedges,
Eigil Fjeldgren Rischel
Abstract:
We propose a categorical framework for processes which interact bidirectionally with both an environment and a 'controller'. Examples include open learners, in which the controller is an optimiser such as gradient descent, and an approach to compositional game theory closely related to open games, in which the controller is a composite of game-theoretic agents. We believe that 'cybernetic' is an appropriate name for the processes that can be described in this framework.
Submitted 3 November, 2022; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Categorical Foundations of Gradient-Based Learning
Authors:
G. S. H. Cruttwell,
Bruno Gavranović,
Neil Ghani,
Paul Wilson,
Fabio Zanasi
Abstract:
We propose a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametrised maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, shedding new light on their similarities and differences. Our approach to gradient-based learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realized in the discrete setting of boolean circuits. Finally, we demonstrate the practical significance of our framework with an implementation in Python.
Submitted 13 July, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
-
Compositional Game Theory, Compositionally
Authors:
Robert Atkey,
Bruno Gavranović,
Neil Ghani,
Clemens Kupke,
Jérémy Ledent,
Fredrik Nordvall Forsberg
Abstract:
We present a new compositional approach to compositional game theory (CGT) based upon Arrows, a concept originally from functional programming, closely related to Tambara modules, and operators to build new Arrows from old. We model equilibria as a bimodule over an Arrow and define an operator to build a new Arrow from such a bimodule over an existing Arrow. We also model strategies as graded Arrows and define an operator which builds a new Arrow by taking the colimit of a graded Arrow. A final operator builds a graded Arrow from a graded bimodule. We use this compositional approach to CGT to show how known and previously unknown variants of open games can be proven to form symmetric monoidal categories.
Submitted 25 January, 2021;
originally announced January 2021.
-
Learning Functors using Gradient Descent
Authors:
Bruno Gavranović
Abstract:
Neural networks are a general framework for differentiable optimization which includes many other machine learning approaches as special cases. In this paper we build a category-theoretic formalism around a neural network system called CycleGAN. CycleGAN is a general approach to unpaired image-to-image translation that has been getting attention in recent years. Inspired by categorical database systems, we show that CycleGAN is a "schema", i.e. a specific category presented by generators and relations, whose specific parameter instantiations are just set-valued functors on this schema. We show that enforcing cycle-consistencies amounts to enforcing composition invariants in this category. We generalize the learning procedure to arbitrary such categories and show a special class of functors, rather than functions, can be learned using gradient descent. Using this framework we design a novel neural network system capable of learning to insert and delete objects from images without paired data. We qualitatively evaluate the system on the CelebA dataset and obtain promising results.
Submitted 14 September, 2020;
originally announced September 2020.
-
Compositional Deep Learning
Authors:
Bruno Gavranović
Abstract:
Neural networks have become an increasingly popular tool for solving many real-world problems. They are a general framework for differentiable optimization which includes many other machine learning approaches as special cases. In this thesis we build a category-theoretic formalism around a class of neural networks exemplified by CycleGAN. CycleGAN is a collection of neural networks, closed under composition, whose inductive bias is increased by enforcing composition invariants, i.e. cycle-consistencies. Inspired by Functorial Data Migration, we specify the interconnection of these networks using a categorical schema, and network instances as set-valued functors on this schema. We also frame neural network architectures, datasets, models, and a number of other concepts in a categorical setting and thus show a special class of functors, rather than functions, can be learned using gradient descent. We use the category-theoretic framework to conceive a novel neural network architecture whose goal is to learn the task of object insertion and object deletion in images with unpaired data. We test the architecture on three different datasets and obtain promising results.
Submitted 16 July, 2019;
originally announced July 2019.