-
Graph Expansions of Deep Neural Networks and their Universal Scaling Limits
Authors:
Nicola Muca Cirone,
Jad Hamdan,
Cristopher Salvi
Abstract:
We present a unified approach to obtain scaling limits of neural networks using the genus expansion technique from random matrix theory. This approach begins with a novel expansion of neural networks which is reminiscent of Butcher series for ODEs, and is obtained through a generalisation of Faà di Bruno's formula to an arbitrary number of compositions. In this expansion, the role of monomials is played by random multilinear maps indexed by directed graphs whose edges correspond to random matrices, which we call operator graphs. This expansion linearises the effect of the activation functions, allowing for the direct application of Wick's principle to compute the expectation of each of its terms. We then determine the leading contribution to each term by embedding the corresponding graphs onto surfaces, and computing their Euler characteristic. Furthermore, by developing a correspondence between analytic and graphical operations, we obtain similar graph expansions for the neural tangent kernel as well as the input-output Jacobian of the original neural network, and derive their infinite-width limits with relative ease. Notably, we find explicit formulae for the moments of the limiting singular value distribution of the Jacobian. We then show that all of these results hold for networks with more general weights, such as general matrices with i.i.d. entries satisfying moment assumptions, complex matrices and sparse matrices.
Submitted 11 July, 2024;
originally announced July 2024.
-
SigDiffusions: Score-Based Diffusion Models for Long Time Series via Log-Signature Embeddings
Authors:
Barbora Barancikova,
Zhuoyue Huang,
Cristopher Salvi
Abstract:
Score-based diffusion models have recently emerged as state-of-the-art generative models for a variety of data modalities. Nonetheless, it remains unclear how to adapt these models to generate long multivariate time series. Viewing a time series as the discretization of an underlying continuous process, we introduce SigDiffusion, a novel diffusion model operating on log-signature embeddings of the data. The forward and backward processes gradually perturb and denoise log-signatures preserving their algebraic structure. To recover a signal from its log-signature, we provide new closed-form inversion formulae expressing the coefficients obtained by expanding the signal in a given basis (e.g. Fourier or orthogonal polynomials) as explicit polynomial functions of the log-signature. Finally, we show that combining SigDiffusion with these inversion formulae results in highly realistic time series generation, competitive with the current state-of-the-art on various datasets of synthetic and real-world examples.
Submitted 14 June, 2024;
originally announced June 2024.
-
Exact Gradients for Stochastic Spiking Neural Networks Driven by Rough Signals
Authors:
Christian Holberg,
Cristopher Salvi
Abstract:
We introduce a mathematically rigorous framework based on rough path theory to model stochastic spiking neural networks (SSNNs) as stochastic differential equations with event discontinuities (Event SDEs) and driven by càdlàg rough paths. Our formalism is general enough to allow for potential jumps to be present both in the solution trajectories as well as in the driving noise. We then identify a set of sufficient conditions ensuring the existence of pathwise gradients of solution trajectories and event times with respect to the network's parameters and show how these gradients satisfy a recursive relation. Furthermore, we introduce a general-purpose loss function defined by means of a new class of signature kernels indexed on càdlàg rough paths and use it to train SSNNs as generative models. We provide an end-to-end autodifferentiable solver for Event SDEs and make its implementation available as part of the $\texttt{diffrax}$ library. Our framework is, to our knowledge, the first enabling gradient-based training of SSNNs with noise affecting both the spike timing and the network's dynamics.
Submitted 22 May, 2024;
originally announced May 2024.
-
Lecture notes on rough paths and applications to machine learning
Authors:
Thomas Cass,
Cristopher Salvi
Abstract:
These notes expound the recent use of the signature transform and rough path theory in data science and machine learning. We develop the core theory of the signature from first principles and then survey some recent popular applications of this approach, including signature-based kernel methods and neural rough differential equations. The notes are based on a course given by the two authors at Imperial College London.
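As a small illustration of the signature transform mentioned in this abstract, the following sketch computes the first two signature levels of a piecewise-linear path by applying Chen's identity one segment at a time. It is not taken from the notes themselves: the function name `signature_depth2` and the truncation at depth 2 are assumptions made for this example.

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    `path` has shape (n_points, d). Hypothetical helper for illustration only.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)        # total increment of the path so far
    level2 = np.zeros((d, d))   # iterated integrals of order two so far
    for inc in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment `inc`:
        # a linear segment has signature (inc, inc (x) inc / 2) at levels 1 and 2.
        level2 = level2 + np.outer(level1, inc) + np.outer(inc, inc) / 2.0
        level1 = level1 + inc
    return level1, level2

# An L-shaped path in the plane: right one unit, then up one unit.
s1, s2 = signature_depth2([[0, 0], [1, 0], [1, 1]])
levy_area = 0.5 * (s2[0, 1] - s2[1, 0])
```

The symmetric part of level 2 is determined by level 1 through the shuffle identity `s2 + s2.T == np.outer(s1, s1)`, so the genuinely new information at depth 2 is the antisymmetric part (the Lévy area).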
Submitted 9 April, 2024;
originally announced April 2024.
-
A path-dependent PDE solver based on signature kernels
Authors:
Alexandre Pannier,
Cristopher Salvi
Abstract:
We develop a provably convergent kernel-based solver for path-dependent PDEs (PPDEs). Our numerical scheme leverages signature kernels, a recently introduced class of kernels on path-space. Specifically, we solve an optimal recovery problem by approximating the solution of a PPDE with an element of minimal norm in the signature reproducing kernel Hilbert space (RKHS) constrained to satisfy the PPDE at a finite collection of collocation paths. In the linear case, we show that the optimisation has a unique closed-form solution expressed in terms of signature kernel evaluations at the collocation paths. We prove consistency of the proposed scheme, guaranteeing convergence to the PPDE solution as the number of collocation points increases. Finally, several numerical examples are presented, in particular in the context of option pricing under rough volatility. Our numerical scheme constitutes a valid alternative to the ubiquitous Monte Carlo methods.
Submitted 18 March, 2024;
originally announced March 2024.
-
Theoretical Foundations of Deep Selective State-Space Models
Authors:
Nicola Muca Cirone,
Antonio Orvieto,
Benjamin Walker,
Cristopher Salvi,
Terry Lyons
Abstract:
Structured state-space models (SSMs) such as S4, stemming from the seminal work of Gu et al., are gaining popularity as effective approaches for modeling sequential data. Deep SSMs demonstrate outstanding performance across a diverse set of domains, at a reduced training and inference cost compared to attention-based transformers. Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states (e.g. GateLoop, Mamba, GLA), then the resulting architecture can surpass attention-powered foundation models trained on text in both accuracy and efficiency, at the scale of billions of parameters. In this paper, we give theoretical grounding to this recent finding using tools from Rough Path Theory: we show that when random linear recurrences are equipped with simple input-controlled transitions (selectivity mechanism), then the hidden state is provably a low-dimensional projection of a powerful mathematical object called the signature of the input -- capturing non-linear interactions between tokens at distinct timescales. Our theory not only motivates the success of modern selective state-space models such as Mamba but also provides a solid framework to understand the expressive power of future SSM variants.
Submitted 4 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes
Authors:
Georg Manten,
Cecilia Casolo,
Emilio Ferrucci,
Søren Wengel Mogensen,
Cristopher Salvi,
Niki Kilbertus
Abstract:
Inferring the causal structure underlying stochastic dynamical systems from observational data holds great promise in domains ranging from science and health to finance. Such processes can often be accurately modeled via stochastic differential equations (SDEs), which naturally imply causal relationships via "which variables enter the differential of which other variables". In this paper, we develop a kernel-based test of conditional independence (CI) on "path-space" -- e.g., solutions to SDEs, but applicable beyond that -- by leveraging recent advances in signature kernels. We demonstrate strictly superior performance of our proposed CI test compared to existing approaches on path-space and provide theoretical consistency results. Then, we develop constraint-based causal discovery algorithms for acyclic stochastic dynamical systems (allowing for self-loops) that leverage temporal information to recover the entire directed acyclic graph. Assuming faithfulness and a CI oracle, we show that our algorithms are sound and complete. We empirically verify that our developed CI test in conjunction with the causal discovery algorithms outperforms baselines across a range of settings.
Submitted 11 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Estimating the construct validity of Principal Components Analysis
Authors:
Thomas M. H. Hope,
Cathy J. Price,
Ajay Halai,
Carola Salvi,
Jenny Crinion,
Merel Keijsers,
Christoph Sperber,
Howard Bowman
Abstract:
In many scientific disciplines, the features of interest cannot be observed directly, so must instead be inferred from observed behaviour. Latent variable analyses are increasingly employed to systematise these inferences, and Principal Components Analysis (PCA) is perhaps the simplest and most popular of these methods. Here, we examine how the assumptions that we are prepared to entertain, about the latent variable system, mediate the likelihood that PCA-derived components will capture the true sources of variance underlying data. As expected, we find that this likelihood is excellent in the best case, and robust to empirically reasonable levels of measurement noise, but best-case performance is also: (a) not robust to violations of the method's more prominent assumptions, of linearity and orthogonality; and also (b) requires that other subtler assumptions be made, such as that the latent variables should have varying importance, and that weights relating latent variables to observed data have zero mean. Neither variance explained, nor replication in independent samples, could reliably predict which (if any) PCA-derived components will capture true sources of variance in data. We conclude by describing a procedure to fit these inferences more directly to empirical data, and use it to find that components derived via PCA from two different empirical neuropsychological datasets, are less likely to have meaningful referents in the brain than we hoped.
Submitted 23 January, 2024;
originally announced January 2024.
-
A Neural RDE approach for continuous-time non-Markovian stochastic control problems
Authors:
Melker Hoglund,
Emilio Ferrucci,
Camilo Hernandez,
Aitor Muguruza Gonzalez,
Cristopher Salvi,
Leandro Sanchez-Betancourt,
Yufei Zhang
Abstract:
We propose a novel framework for solving continuous-time non-Markovian stochastic control problems by means of neural rough differential equations (Neural RDEs) introduced in Morrill et al. (2021). Non-Markovianity naturally arises in control problems due to the time delay effects in the system coefficients or the driving noises, which leads to optimal control strategies depending explicitly on the historical trajectories of the system state. By modelling the control process as the solution of a Neural RDE driven by the state process, we show that the control-state joint dynamics are governed by an uncontrolled, augmented Neural RDE, allowing for fast Monte-Carlo estimation of the value function via trajectories simulation and memory-efficient backpropagation. We provide theoretical underpinnings for the proposed algorithmic framework by demonstrating that Neural RDEs serve as universal approximators for functions of random rough paths. Exhaustive numerical experiments on non-Markovian stochastic control problems are presented, which reveal that the proposed framework is time-resolution-invariant and achieves higher accuracy and better stability in irregular sampling compared to existing RNN-based approaches.
Submitted 25 June, 2023;
originally announced June 2023.
-
Non-adversarial training of Neural SDEs with signature kernel scores
Authors:
Zacharia Issa,
Blanka Horvath,
Maud Lemercier,
Cristopher Salvi
Abstract:
Neural SDEs are continuous-time generative models for sequential data. State-of-the-art performance for irregular time series generation has been previously obtained by training these models adversarially as GANs. However, as is typical for GAN architectures, training is notoriously unstable, often suffers from mode collapse, and requires specialised techniques such as weight clipping and gradient penalty to mitigate these issues. In this paper, we introduce a novel class of scoring rules on pathspace based on signature kernels and use them as objectives for training Neural SDEs non-adversarially. By showing strict properness of such kernel scores and consistency of the corresponding estimators, we provide existence and uniqueness guarantees for the minimiser. With this formulation, evaluating the generator-discriminator pair amounts to solving a system of linear path-dependent PDEs which allows for memory-efficient adjoint-based backpropagation. Moreover, because the proposed kernel scores are well-defined for paths with values in infinite dimensional spaces of functions, our framework can be easily extended to generate spatiotemporal data. Our procedure permits conditioning on a rich variety of market conditions and significantly outperforms alternative ways of training Neural SDEs on a variety of tasks including the simulation of rough volatility models, the conditional probabilistic forecasts of real-world forex pairs where the conditioning variable is an observed past trajectory, and the mesh-free generation of limit order book dynamics.
Submitted 25 May, 2023;
originally announced May 2023.
-
Optimal Stopping via Distribution Regression: a Higher Rank Signature Approach
Authors:
Blanka Horvath,
Maud Lemercier,
Chong Liu,
Terry Lyons,
Cristopher Salvi
Abstract:
Distribution Regression on path-space refers to the task of learning functions mapping the law of a stochastic process to a scalar target. The learning procedure based on the notion of path-signature, i.e. a classical transform from rough path theory, was widely used to approximate weakly continuous functionals, such as the pricing functionals of path-dependent options' payoffs. However, this approach fails for Optimal Stopping Problems arising from mathematical finance, such as the pricing of American options, because the corresponding value functions are in general discontinuous with respect to the weak topology. In this paper we develop a rigorous mathematical framework to resolve this issue by recasting an Optimal Stopping Problem as a higher order kernel mean embedding regression based on the notions of higher rank signatures of measure-valued paths and adapted topologies. The core computational component of our algorithm consists in solving a family of two-dimensional hyperbolic PDEs.
Submitted 3 April, 2023;
originally announced April 2023.
-
Neural signature kernels as infinite-width-depth-limits of controlled ResNets
Authors:
Nicola Muca Cirone,
Maud Lemercier,
Cristopher Salvi
Abstract:
Motivated by the paradigm of reservoir computing, we consider randomly initialized controlled ResNets defined as Euler-discretizations of neural controlled differential equations (Neural CDEs), a unified architecture which encompasses both RNNs and ResNets. We show that in the infinite-width-depth limit and under proper scaling, these architectures converge weakly to Gaussian processes indexed on some spaces of continuous paths and with kernels satisfying certain partial differential equations (PDEs) varying according to the choice of activation function, extending the results of Hayou (2022); Hayou & Yang (2023) to the controlled and homogeneous case. In the special, homogeneous, case where the activation is the identity, we show that the equation reduces to a linear PDE and the limiting kernel agrees with the signature kernel of Salvi et al. (2021a). We name this new family of limiting kernels neural signature kernels. Finally, we show that in the infinite-depth regime, finite-width controlled ResNets converge in distribution to Neural CDEs with random vector fields which, depending on whether the weights are shared across layers, are either time-independent and Gaussian or behave like a matrix-valued Brownian motion.
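For readers unfamiliar with the linear PDE behind the signature kernel, here is a minimal sketch. The abstract does not state the equation; the sketch assumes the Goursat form commonly used in the signature kernel literature, namely that k(s, t) satisfies ∂²k/∂s∂t = ⟨x'(s), y'(t)⟩ k with k(0, ·) = k(·, 0) = 1, and discretises it with a simple explicit finite-difference scheme. The names `signature_kernel` and `linear_path_reference` are hypothetical, and the scheme is an illustrative choice rather than the authors' implementation.

```python
import math
import numpy as np

def signature_kernel(x, y):
    """Approximate signature kernel of two piecewise-linear paths.

    x has shape (m + 1, d), y has shape (n + 1, d). Explicit finite-difference
    sketch of the assumed Goursat PDE, using one grid cell per segment pair.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.diff(x, axis=0)
    dy = np.diff(y, axis=0)
    inc = dx @ dy.T                    # inner products <dx_i, dy_j>, shape (m, n)
    m, n = inc.shape
    k = np.ones((m + 1, n + 1))        # boundary conditions k(0, .) = k(., 0) = 1
    for i in range(m):
        for j in range(n):
            k[i + 1, j + 1] = (
                k[i + 1, j] + k[i, j + 1] - k[i, j]
                + 0.5 * inc[i, j] * (k[i + 1, j] + k[i, j + 1])
            )
    return k[-1, -1]

def linear_path_reference(a, b, terms=30):
    """Series value for one-dimensional linear paths with total increments a and b."""
    return sum((a * b) ** p / math.factorial(p) ** 2 for p in range(terms))

# Compare the scheme on a finely sampled straight line against the series.
x = np.linspace(0.0, 1.0, 65)[:, None]
approx = signature_kernel(x, x)
reference = linear_path_reference(1.0, 1.0)
```

For a one-dimensional linear path the level-p signature term is a^p / p!, so the kernel has the closed-form series used in `linear_path_reference`; refining the sampling grid brings the finite-difference value closer to it.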
Submitted 4 June, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
-
New directions in the applications of rough path theory
Authors:
Adeline Fermanian,
Terry Lyons,
James Morrill,
Cristopher Salvi
Abstract:
This article provides a concise overview of some of the recent advances in the application of rough path theory to machine learning. Controlled differential equations (CDEs) are discussed as the key mathematical model to describe the interaction of a stream with a physical control system. A collection of iterated integrals known as the signature naturally arises in the description of the response produced by such interactions. The signature comes equipped with a variety of powerful properties rendering it an ideal feature map for streamed data. We summarise recent advances in the symbiosis between deep learning and CDEs, studying the link with RNNs and culminating with the Neural CDE model. We conclude with a discussion on signature kernel methods.
Submitted 9 February, 2023;
originally announced February 2023.
-
A structure theorem for streamed information
Authors:
Cristopher Salvi,
Joscha Diehl,
Terry Lyons,
Rosa Preiss,
Jeremy Reizenstein
Abstract:
We identify the free half shuffle algebra of Schützenberger (1958) with an algebra of real-valued functionals on paths, where the half shuffle emulates integration of a functional against another. We then provide two, to our knowledge, new identities in arity 3 involving its commutator (area), and show that these are sufficient to recover the Zinbiel and Tortkara identities of Dzhumadil'daev (2007). We use these identities to prove that any element of the free half shuffle algebra can be expressed as a polynomial over iterated areas. Moreover, we consider minimal sets of iterated integrals defined through the recursive application of the half shuffle on Hall trees. Leveraging the duality between this set of Hall integrals and classical Hall bases of the free Lie algebra, we prove using combinatorial arguments that any element of the free half shuffle algebra can be written uniquely as a polynomial over Hall integrals. We interpret this result as a structure theorem for streamed information, loosely analogous to the unique prime factorisation of integers, allowing to split any real valued function on streamed data into two parts: a first that extracts and packages the streamed information into recursively defined atomic objects (Hall integrals), and a second that evaluates a polynomial function in these objects without further reference to the original stream. The question of whether a similar result holds if Hall integrals are replaced by Hall areas is left as an open conjecture. Finally, we construct a canonical, but to our knowledge, new decomposition of the free half shuffle algebra as shuffle power series in the greatest letter of the original alphabet with coefficients in a sub-algebra freely generated by a new alphabet with an infinite number of letters. We use this construction to provide a second proof of our structure theorem.
Submitted 30 July, 2023; v1 submitted 30 November, 2022;
originally announced December 2022.
-
Neural Stochastic PDEs: Resolution-Invariant Learning of Continuous Spatiotemporal Dynamics
Authors:
Cristopher Salvi,
Maud Lemercier,
Andris Gerasimovics
Abstract:
Stochastic partial differential equations (SPDEs) are the mathematical tool of choice for modelling spatiotemporal PDE-dynamics under the influence of randomness. Based on the notion of mild solution of an SPDE, we introduce a novel neural architecture to learn solution operators of PDEs with (possibly stochastic) forcing from partially observed data. The proposed Neural SPDE model provides an extension to two popular classes of physics-inspired architectures. On the one hand, it extends Neural CDEs and variants -- continuous-time analogues of RNNs -- in that it is capable of processing incoming sequential information arriving at arbitrary spatial resolutions. On the other hand, it extends Neural Operators -- generalizations of neural networks to model mappings between spaces of functions -- in that it can parameterize solution operators of SPDEs depending simultaneously on the initial condition and a realization of the driving noise. By performing operations in the spectral domain, we show how a Neural SPDE can be evaluated in two ways, either by calling an ODE solver (emulating a spectral Galerkin scheme), or by solving a fixed point problem. Experiments on various semilinear SPDEs, including the stochastic Navier-Stokes equations, demonstrate how the Neural SPDE model is capable of learning complex spatiotemporal dynamics in a resolution-invariant way, with better accuracy and lighter training data requirements compared to alternative models, and up to 3 orders of magnitude faster than traditional solvers.
Submitted 24 September, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes
Authors:
Cristopher Salvi,
Maud Lemercier,
Chong Liu,
Blanka Hovarth,
Theodoros Damoulas,
Terry Lyons
Abstract:
Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captures additional information related to the filtration. We derive empirical estimators for the associated higher order maximum mean discrepancies (MMDs) and prove consistency. We then construct a filtration-sensitive kernel two-sample test able to pick up information that gets missed by the standard MMD test. In addition, leveraging our higher order MMDs we construct a family of universal kernels on stochastic processes that allows one to solve real-world calibration and optimal stopping problems in quantitative finance (such as the pricing of American options) via classical kernel-based regression methods. Finally, adapting existing tests for conditional independence to the case of stochastic processes, we design a causal-discovery algorithm to recover the causal graph of structural dependencies among interacting bodies solely from observations of their multidimensional trajectories.
Submitted 3 November, 2021; v1 submitted 8 September, 2021;
originally announced September 2021.
-
SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data
Authors:
Maud Lemercier,
Cristopher Salvi,
Thomas Cass,
Edwin V. Bonilla,
Theodoros Damoulas,
Terry Lyons
Abstract:
Making predictions and quantifying their uncertainty when the input data is sequential is a fundamental learning challenge, recently attracting increasing attention. We develop SigGPDE, a new scalable sparse variational inference framework for Gaussian Processes (GPs) on sequential data. Our contribution is twofold. First, we construct inducing variables underpinning the sparse approximation so that the resulting evidence lower bound (ELBO) does not require any matrix inversion. Second, we show that the gradients of the GP signature kernel are solutions of a hyperbolic partial differential equation (PDE). This theoretical insight allows us to build an efficient back-propagation algorithm to optimize the ELBO. We showcase the significant computational gains of SigGPDE compared to existing methods, while achieving state-of-the-art performance for classification tasks on large datasets of up to 1 million multivariate time series.
Submitted 12 October, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
SK-Tree: a systematic malware detection algorithm on streaming trees via the signature kernel
Authors:
Thomas Cochrane,
Peter Foster,
Varun Chhabra,
Maud Lemercier,
Cristopher Salvi,
Terry Lyons
Abstract:
The development of machine learning algorithms in the cyber security domain has been impeded by the complex, hierarchical, sequential and multimodal nature of the data involved. In this paper we introduce the notion of a streaming tree as a generic data structure encompassing a large portion of real-world cyber security data. Starting from host-based event logs we represent computer processes as streaming trees that evolve in continuous time. Leveraging the properties of the signature kernel, a machine learning tool that recently emerged as a leading technology for learning with complex sequences of data, we develop the SK-Tree algorithm. SK-Tree is a supervised learning method for systematic malware detection on streaming trees that is robust to irregular sampling and high dimensionality of the underlying streams. We demonstrate the effectiveness of SK-Tree to detect malicious events on a portion of the publicly available DARPA OpTC dataset, achieving an AUROC score of 98%.
Submitted 29 September, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Neural Rough Differential Equations for Long Time Series
Authors:
James Morrill,
Cristopher Salvi,
Patrick Kidger,
James Foster,
Terry Lyons
Abstract:
Neural controlled differential equations (CDEs) are the continuous-time analogue of recurrent neural networks, as Neural ODEs are to residual networks, and offer a memory-efficient continuous-time way to model functions of potentially irregular time series. Existing methods for computing the forward pass of a Neural CDE involve embedding the incoming time series into path space, often via interpolation, and using evaluations of this path to drive the hidden state. Here, we use rough path theory to extend this formulation. Rather than directly embedding into path space, we represent the input signal over small time intervals through its \textit{log-signature}, a collection of statistics describing how the signal drives a CDE. This is the approach for solving \textit{rough differential equations} (RDEs), and correspondingly we describe our main contribution as the introduction of Neural RDEs. This extension has a purpose: by generalising the Neural CDE approach to a broader class of driving signals, we demonstrate particular advantages for tackling long time series. In this regime, we demonstrate efficacy on problems of length up to 17k observations and observe significant training speed-ups, improvements in model performance, and reduced memory requirements compared to existing approaches.
Submitted 21 June, 2021; v1 submitted 17 September, 2020;
originally announced September 2020.
-
The Signature Kernel is the solution of a Goursat PDE
Authors:
Cristopher Salvi,
Thomas Cass,
James Foster,
Terry Lyons,
Weixin Yang
Abstract:
Recently, there has been an increased interest in the development of kernel methods for learning with sequential data. The signature kernel is a learning tool with potential to handle irregularly sampled, multivariate time series. In "Kernels for sequentially ordered data" the authors introduced a kernel trick for the truncated version of this kernel avoiding the exponential complexity that would have been involved in a direct computation. Here we show that for continuously differentiable paths, the signature kernel solves a hyperbolic PDE and recognize the connection with a well-known class of differential equations known in the literature as Goursat problems. This Goursat PDE only depends on the increments of the input sequences, does not require the explicit computation of signatures and can be solved efficiently using state-of-the-art hyperbolic PDE numerical solvers. This gives a kernel trick for the untruncated signature kernel with the same raw complexity as the method from "Kernels for sequentially ordered data", but with the advantage that the PDE numerical scheme is well suited for GPU parallelization, which effectively reduces the complexity by a full order of magnitude in the length of the input sequences. In addition, we extend the previous analysis to the space of geometric rough paths and establish, using classical results from rough path theory, that the rough version of the signature kernel solves a rough integral equation analogous to the aforementioned Goursat PDE. Finally, we empirically demonstrate the effectiveness of our PDE kernel as a machine learning tool in various machine learning applications dealing with sequential data. The library sigkernel is publicly available at https://github.com/crispitagorico/sigkernel.
Submitted 20 March, 2021; v1 submitted 26 June, 2020;
originally announced June 2020.
-
Distribution Regression for Sequential Data
Authors:
Maud Lemercier,
Cristopher Salvi,
Theodoros Damoulas,
Edwin V. Bonilla,
Terry Lyons
Abstract:
Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.
Submitted 29 September, 2021; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Sig-SDEs model for quantitative finance
Authors:
Imanol Perez Arribas,
Cristopher Salvi,
Lukasz Szpruch
Abstract:
Mathematical models, calibrated to data, have become ubiquitous in key decision-making processes in modern quantitative finance. In this work, we propose a novel framework for data-driven model selection by integrating a classical quantitative setup with a generative modelling approach. Leveraging the properties of the signature, a well-known path-transform from stochastic analysis that recently emerged as a leading machine learning technology for learning time-series data, we develop the Sig-SDE model. Sig-SDE provides a new perspective on neural SDEs and can be calibrated to exotic financial products that depend, in a non-linear way, on the whole trajectory of asset prices. Furthermore, our approach enables consistent calibration under the pricing measure $\mathbb Q$ and the real-world measure $\mathbb P$. Finally, we demonstrate the ability of Sig-SDE to simulate possible future market scenarios needed for computing risk profiles or hedging strategies. Importantly, this new model is underpinned by rigorous mathematical analysis that, under appropriate conditions, provides theoretical guarantees for convergence of the presented algorithms.
Submitted 3 June, 2020; v1 submitted 30 May, 2020;
originally announced June 2020.
-
Deep Signature Transforms
Authors:
Patric Bonnier,
Patrick Kidger,
Imanol Perez Arribas,
Cristopher Salvi,
Terry Lyons
Abstract:
The signature is an infinite graded sequence of statistics known to characterise a stream of data up to a negligible equivalence class. It is a transform which has previously been treated as a fixed feature transformation, on top of which a model may be built. We propose a novel approach which combines the advantages of the signature transform with modern deep learning frameworks. By learning an augmentation of the stream prior to the signature transform, the terms of the signature may be selected in a data-dependent way. More generally, we describe how the signature transform may be used as a layer anywhere within a neural network. In this context it may be interpreted as a pooling operation. We present the results of empirical experiments to back up the theoretical justification. Code available at https://github.com/patrick-kidger/Deep-Signature-Transforms.
Submitted 26 October, 2019; v1 submitted 21 May, 2019;
originally announced May 2019.
-
Energy Conversion Using New Thermoelectric Generator
Authors:
Guillaume Savelli,
Marc Plissonnier,
Jacqueline Bablet,
C. Salvi,
J. M. Fournier
Abstract:
In recent years, microelectronics has helped to develop complex and varied technologies. Many of these technologies can be applied successfully to realize Seebeck micro generators: photolithography and deposition methods allow thin thermoelectric structures to be fabricated at the micro-scale level. Our goal is to scavenge energy by developing a miniature power source for operating electronic components. The first Bi and Sb micro-devices on a silicon glass substrate have been manufactured with an area of 1 cm² including more than one hundred junctions. Each step of the fabrication process has been optimized: photolithography, deposition process, annealing conditions and metallic connections. Different device structures have been realized with different micro-line dimensions. The performance of each device is reviewed and discussed as a function of its design structure.
Submitted 21 November, 2007;
originally announced November 2007.