Search | arXiv e-print repository

The insertion method to invert the signature of a path

Authors: Adeline Fermanian, Jiawei Chang, Terry Lyons, Gérard Biau

Abstract: The signature is a representation of a path as an infinite sequence of its iterated integrals. Under certain assumptions, the signature characterizes the path, up to translation and reparameterization. Therefore, a crucial question of interest is the development of efficient algorithms to invert the signature, i.e., to reconstruct the path from the information of its (truncated) signature. In this… ▽ More The signature is a representation of a path as an infinite sequence of its iterated integrals. Under certain assumptions, the signature characterizes the path, up to translation and reparameterization. Therefore, a crucial question of interest is the development of efficient algorithms to invert the signature, i.e., to reconstruct the path from the information of its (truncated) signature. In this article, we study the insertion procedure, originally introduced by Chang and Lyons (2019), from both a theoretical and a practical point of view. After describing our version of the method, we give its rate of convergence for piecewise linear paths, accompanied by an implementation in Pytorch. The algorithm is parallelized, meaning that it is very efficient at inverting a batch of signatures simultaneously. Its performance is illustrated with both real-world and simulated examples. △ Less

Submitted 19 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2301.13112 [pdf, other]

Benchmarking optimality of time series classification methods in distinguishing diffusions

Authors: Zehong Zhang, Fei Lu, Esther Xu Fei, Terry Lyons, Yannis Kevrekidis, Tom Woolf

Abstract: Statistical optimality benchmarking is crucial for analyzing and designing time series classification (TSC) algorithms. This study proposes to benchmark the optimality of TSC algorithms in distinguishing diffusion processes by the likelihood ratio test (LRT). The LRT is an optimal classifier by the Neyman-Pearson lemma. The LRT benchmarks are computationally efficient because the LRT does not need… ▽ More Statistical optimality benchmarking is crucial for analyzing and designing time series classification (TSC) algorithms. This study proposes to benchmark the optimality of TSC algorithms in distinguishing diffusion processes by the likelihood ratio test (LRT). The LRT is an optimal classifier by the Neyman-Pearson lemma. The LRT benchmarks are computationally efficient because the LRT does not need training, and the diffusion processes can be efficiently simulated and are flexible to reflect the specific features of real-world applications. We demonstrate the benchmarking with three widely-used TSC algorithms: random forest, ResNet, and ROCKET. These algorithms can achieve the LRT optimality for univariate time series and multivariate Gaussian processes. However, these model-agnostic algorithms are suboptimal in classifying high-dimensional nonlinear multivariate time series. Additionally, the LRT benchmark provides tools to analyze the dependence of classification accuracy on the time length, dimension, temporal sampling frequency, and randomness of the time series. △ Less

Submitted 11 April, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: 23 pages, 8 figures

MSC Class: 62M02; 62M10; 62M20

arXiv:2301.09517 [pdf, other]

Sampling-based Nyström Approximation and Kernel Quadrature

Authors: Satoshi Hayakawa, Harald Oberhauser, Terry Lyons

Abstract: We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. We first prove an improved error bound for the conventional Nyström approximation with i.i.d. sampling and singular-value decomposition in the continuous regime; the proof techniques are borrowed from statistical learning theory. We further introduce a refined selection of subspaces in Nyström… ▽ More We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. We first prove an improved error bound for the conventional Nyström approximation with i.i.d. sampling and singular-value decomposition in the continuous regime; the proof techniques are borrowed from statistical learning theory. We further introduce a refined selection of subspaces in Nyström approximation with theoretical guarantees that is applicable to non-i.i.d. landmark points. Finally, we discuss their application to convex kernel quadrature and give novel theoretical guarantees as well as numerical observations. △ Less

Submitted 22 May, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: 22 pages, ICML 2023 camera-ready version. Typos fixed

arXiv:2206.14674 [pdf, other]

Signature Methods in Machine Learning

Authors: Terry Lyons, Andrew D. McLeod

Abstract: Signature-based techniques give mathematical insight into the interactions between complex streams of evolving data. These insights can be quite naturally translated into numerical approaches to understanding streamed data, and perhaps because of their mathematical precision, have proved useful in analysing streamed data in situations where the data is irregular, and not stationary, and the dimens… ▽ More Signature-based techniques give mathematical insight into the interactions between complex streams of evolving data. These insights can be quite naturally translated into numerical approaches to understanding streamed data, and perhaps because of their mathematical precision, have proved useful in analysing streamed data in situations where the data is irregular, and not stationary, and the dimension of the data and the sample sizes are both moderate. Understanding streamed multi-modal data is exponential: a word in $n$ letters from an alphabet of size $d$ can be any one of $d^n$ messages. Signatures remove the exponential amount of noise that arises from sampling irregularity, but an exponential amount of information still remain. This survey aims to stay in the domain where that exponential scaling can be managed directly. Scalability issues are an important challenge in many problems but would require another survey article and further ideas. This survey describes a range of contexts where the data sets are small enough to remove the possibility of massive machine learning, and the existence of small sets of context free and principled features can be used effectively. The mathematical nature of the tools can make their use intimidating to non-mathematicians. The examples presented in this article are intended to bridge this communication gap and provide tractable working examples drawn from the machine learning context. Notebooks are available online for several of these examples. This survey builds on the earlier paper of Ilya Chevryev and Andrey Kormilitzin which had broadly similar aims at an earlier point in the development of this machinery. This article illustrates how the theoretical insights offered by signatures are simply realised in the analysis of application data in a way that is largely agnostic to the data type. △ Less

Submitted 26 January, 2024; v1 submitted 29 June, 2022; originally announced June 2022.

MSC Class: 60L10; 93C15; 68Q32; 34F05

arXiv:2109.03582 [pdf, other]

Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes

Authors: Cristopher Salvi, Maud Lemercier, Chong Liu, Blanka Hovarth, Theodoros Damoulas, Terry Lyons

Abstract: Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captur… ▽ More Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captures additional information related to the filtration. We derive empirical estimators for the associated higher order maximum mean discrepancies (MMDs) and prove consistency. We then construct a filtration-sensitive kernel two-sample test able to pick up information that gets missed by the standard MMD test. In addition, leveraging our higher order MMDs we construct a family of universal kernels on stochastic processes that allows to solve real-world calibration and optimal stop** problems in quantitative finance (such as the pricing of American options) via classical kernel-based regression methods. Finally, adapting existing tests for conditional independence to the case of stochastic processes, we design a causal-discovery algorithm to recover the causal graph of structural dependencies among interacting bodies solely from observations of their multidimensional trajectories. △ Less

Submitted 3 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: Published at NeurIPS 2021

MSC Class: 60L10; 60L20

arXiv:2107.09597 [pdf, other]

Positively Weighted Kernel Quadrature via Subsampling

Authors: Satoshi Hayakawa, Harald Oberhauser, Terry Lyons

Abstract: We study kernel quadrature rules with convex weights. Our approach combines the spectral properties of the kernel with recombination results about point measures. This results in effective algorithms that construct convex quadrature rules using only access to i.i.d. samples from the underlying measure and evaluation of the kernel and that result in a small worst-case error. In addition to our theo… ▽ More We study kernel quadrature rules with convex weights. Our approach combines the spectral properties of the kernel with recombination results about point measures. This results in effective algorithms that construct convex quadrature rules using only access to i.i.d. samples from the underlying measure and evaluation of the kernel and that result in a small worst-case error. In addition to our theoretical results and the benefits resulting from convex weights, our experiments indicate that this construction can compete with the optimal bounds in well-known examples. △ Less

Submitted 11 October, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: 29 pages, NeurIPS 2022 camera-ready version

arXiv:2105.13493 [pdf, other]

Efficient and Accurate Gradients for Neural SDEs

Authors: Patrick Kidger, James Foster, Xuechen Li, Terry Lyons

Abstract: Neural SDEs combine many of the best qualities of both RNNs and SDEs: memory efficient training, high-capacity function approximation, and strong priors on model space. This makes them a natural choice for modelling many types of temporal dynamics. Training a Neural SDE (either as a VAE or as a GAN) requires backpropagating through an SDE solve. This may be done by solving a backwards-in-time SDE… ▽ More Neural SDEs combine many of the best qualities of both RNNs and SDEs: memory efficient training, high-capacity function approximation, and strong priors on model space. This makes them a natural choice for modelling many types of temporal dynamics. Training a Neural SDE (either as a VAE or as a GAN) requires backpropagating through an SDE solve. This may be done by solving a backwards-in-time SDE whose solution is the desired parameter gradients. However, this has previously suffered from severe speed and accuracy issues, due to high computational cost and numerical truncation errors. Here, we overcome these issues through several technical innovations. First, we introduce the \textit{reversible Heun method}. This is a new SDE solver that is \textit{algebraically reversible}: eliminating numerical gradient errors, and the first such solver of which we are aware. Moreover it requires half as many function evaluations as comparable solvers, giving up to a $1.98\times$ speedup. Second, we introduce the \textit{Brownian Interval}: a new, fast, memory efficient, and exact way of sampling \textit{and reconstructing} Brownian motion. With this we obtain up to a $10.6\times$ speed improvement over previous techniques, which in contrast are both approximate and relatively slow. Third, when specifically training Neural SDEs as GANs (Kidger et al. 2021), we demonstrate how SDE-GANs may be trained through careful weight clip** and choice of activation function. This reduces computational cost (giving up to a $1.87\times$ speedup) and removes the numerical truncation errors associated with gradient penalty. Altogether, we outperform the state-of-the-art by substantial margins, with respect to training speed, and with respect to classification, prediction, and MMD test metrics. We have contributed implementations of all of our techniques to the torchsde library to help facilitate their adoption. △ Less

Submitted 19 October, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2105.04211 [pdf, other]

SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data

Authors: Maud Lemercier, Cristopher Salvi, Thomas Cass, Edwin V. Bonilla, Theodoros Damoulas, Terry Lyons

Abstract: Making predictions and quantifying their uncertainty when the input data is sequential is a fundamental learning challenge, recently attracting increasing attention. We develop SigGPDE, a new scalable sparse variational inference framework for Gaussian Processes (GPs) on sequential data. Our contribution is twofold. First, we construct inducing variables underpinning the sparse approximation so th… ▽ More Making predictions and quantifying their uncertainty when the input data is sequential is a fundamental learning challenge, recently attracting increasing attention. We develop SigGPDE, a new scalable sparse variational inference framework for Gaussian Processes (GPs) on sequential data. Our contribution is twofold. First, we construct inducing variables underpinning the sparse approximation so that the resulting evidence lower bound (ELBO) does not require any matrix inversion. Second, we show that the gradients of the GP signature kernel are solutions of a hyperbolic partial differential equation (PDE). This theoretical insight allows us to build an efficient back-propagation algorithm to optimize the ELBO. We showcase the significant computational gains of SigGPDE compared to existing methods, while achieving state-of-the-art performance for classification tasks on large datasets of up to 1 million multivariate time series. △ Less

Submitted 12 October, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: Published at ICML 2021

MSC Class: 60L10; 60L20

arXiv:2009.08295 [pdf, other]

Neural Rough Differential Equations for Long Time Series

Authors: James Morrill, Cristopher Salvi, Patrick Kidger, James Foster, Terry Lyons

Abstract: Neural controlled differential equations (CDEs) are the continuous-time analogue of recurrent neural networks, as Neural ODEs are to residual networks, and offer a memory-efficient continuous-time way to model functions of potentially irregular time series. Existing methods for computing the forward pass of a Neural CDE involve embedding the incoming time series into path space, often via interpol… ▽ More Neural controlled differential equations (CDEs) are the continuous-time analogue of recurrent neural networks, as Neural ODEs are to residual networks, and offer a memory-efficient continuous-time way to model functions of potentially irregular time series. Existing methods for computing the forward pass of a Neural CDE involve embedding the incoming time series into path space, often via interpolation, and using evaluations of this path to drive the hidden state. Here, we use rough path theory to extend this formulation. Instead of directly embedding into path space, we instead represent the input signal over small time intervals through its \textit{log-signature}, which are statistics describing how the signal drives a CDE. This is the approach for solving \textit{rough differential equations} (RDEs), and correspondingly we describe our main contribution as the introduction of Neural RDEs. This extension has a purpose: by generalising the Neural CDE approach to a broader class of driving signals, we demonstrate particular advantages for tackling long time series. In this regime, we demonstrate efficacy on problems of length up to 17k observations and observe significant training speed-ups, improvements in model performance, and reduced memory requirements compared to existing approaches. △ Less

Submitted 21 June, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: Published at ICML 2021

arXiv:2008.03408 [pdf, other]

Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews

Authors: Bo Wang, Yue Wu, Niall Taylor, Terry Lyons, Maria Liakata, Alejo J Nevado-Holgado, Kate E A Saunders

Abstract: Bipolar disorder (BD) and borderline personality disorder (BPD) are both chronic psychiatric disorders. However, their overlap** symptoms and common comorbidity make it challenging for the clinicians to distinguish the two conditions on the basis of a clinical interview. In this work, we first present a new multi-modal dataset containing interviews involving individuals with BD or BPD being inte… ▽ More Bipolar disorder (BD) and borderline personality disorder (BPD) are both chronic psychiatric disorders. However, their overlap** symptoms and common comorbidity make it challenging for the clinicians to distinguish the two conditions on the basis of a clinical interview. In this work, we first present a new multi-modal dataset containing interviews involving individuals with BD or BPD being interviewed about a non-clinical topic . We investigate the automatic detection of the two conditions, and demonstrate a good linear classifier that can be learnt using a down-selected set of features from the different aspects of the interviews and a novel approach of summarising these features. Finally, we find that different sets of features characterise BD and BPD, thus providing insights into the difference between the automatic screening of the two conditions. △ Less

Submitted 31 May, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

MSC Class: 60L10

arXiv:2006.15030 [pdf, other]

Deriving information from missing data: implications for mood prediction

Authors: Yue Wu, Terry J. Lyons, Kate E. A. Saunders

Abstract: The availability of mobile technologies has enabled the efficient collection prospective longitudinal, ecologically valid self-reported mood data from psychiatric patients. These data streams have potential for improving the efficiency and accuracy of psychiatric diagnosis as well predicting future mood states enabling earlier intervention. However, missing responses are common in such datasets an… ▽ More The availability of mobile technologies has enabled the efficient collection prospective longitudinal, ecologically valid self-reported mood data from psychiatric patients. These data streams have potential for improving the efficiency and accuracy of psychiatric diagnosis as well predicting future mood states enabling earlier intervention. However, missing responses are common in such datasets and there is little consensus as to how this should be dealt with in practice. A signature-based method was used to capture different elements of self-reported mood alongside missing data to both classify diagnostic group and predict future mood in patients with bipolar disorder, borderline personality disorder and healthy controls. The missing-response-incorporated signature-based method achieves roughly 66\% correct diagnosis, with f1 scores for three different clinic groups 59\% (bipolar disorder), 75\% (healthy control) and 61\% (borderline personality disorder) respectively. This was significantly more efficient than the naive model which excluded missing data. Accuracies of predicting subsequent mood states and scores were also improved by inclusion of missing responses. The signature method provided an effective approach to the analysis of prospectively collected mood data where missing data was common and should be considered as an approach in other similar datasets. △ Less

Submitted 8 July, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

MSC Class: 60L10; 60L90; 62D10; 92-08

arXiv:2006.14498 [pdf, other]

A Data-driven Market Simulator for Small Data Environments

Authors: Hans Bühler, Blanka Horvath, Terry Lyons, Imanol Perez Arribas, Ben Wood

Abstract: Neural network based data-driven market simulation unveils a new and flexible way of modelling financial time series without imposing assumptions on the underlying stochastic dynamics. Though in this sense generative market simulation is model-free, the concrete modelling choices are nevertheless decisive for the features of the simulated paths. We give a brief overview of currently used generativ… ▽ More Neural network based data-driven market simulation unveils a new and flexible way of modelling financial time series without imposing assumptions on the underlying stochastic dynamics. Though in this sense generative market simulation is model-free, the concrete modelling choices are nevertheless decisive for the features of the simulated paths. We give a brief overview of currently used generative modelling approaches and performance evaluation metrics for financial time series, and address some of the challenges to achieve good results in the latter. We also contrast some classical approaches of market simulation with simulation based on generative modelling and highlight some advantages and pitfalls of the new approach. While most generative models tend to rely on large amounts of training data, we present here a generative model that works reliably in environments where the amount of available training data is notoriously small. Furthermore, we show how a rough paths perspective combined with a parsimonious Variational Autoencoder framework provides a powerful way for encoding and evaluating financial time series in such environments where available training data is scarce. Finally, we also propose a suitable performance evaluation metric for financial time series and discuss some connections of our Market Generator to deep hedging. △ Less

Submitted 21 June, 2020; originally announced June 2020.

Comments: 27 pages, 9 figures

arXiv:2006.05805 [pdf, other]

Distribution Regression for Sequential Data

Authors: Maud Lemercier, Cristopher Salvi, Theodoros Damoulas, Edwin V. Bonilla, Terry Lyons

Abstract: Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic anal… ▽ More Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science. △ Less

Submitted 29 September, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Published at AISTATS 2021

MSC Class: 60L10; 60L20

arXiv:2006.03487 [pdf, other]

Dimensionless Anomaly Detection on Multivariate Streams with Variance Norm and Path Signature

Authors: Zhen Shao, Ryan Sze-Yin Chan, Thomas Cochrane, Peter Foster, Terry Lyons

Abstract: In this paper, we propose a dimensionless anomaly detection method for multivariate streams. Our method is independent of the unit of measurement for the different stream channels, therefore dimensionless. We first propose the variance norm, a generalisation of Mahalanobis distance to handle infinite-dimensional feature space and singular empirical covariance matrix rigorously. We then combine the… ▽ More In this paper, we propose a dimensionless anomaly detection method for multivariate streams. Our method is independent of the unit of measurement for the different stream channels, therefore dimensionless. We first propose the variance norm, a generalisation of Mahalanobis distance to handle infinite-dimensional feature space and singular empirical covariance matrix rigorously. We then combine the variance norm with the path signature, an infinite collection of iterated integrals that provide global features of streams, to propose SigMahaKNN, a method for anomaly detection on (multivariate) streams. We show that SigMahaKNN is invariant to stream reparametrisation, stream concatenation and has a graded discrimination power depending on the truncation level of the path signature. We implement SigMahaKNN as an open-source software, and perform extensive numerical experiments, showing significantly improved anomaly detection on streams compared to isolation forest and local outlier factors in applications ranging from language analysis, hand-writing analysis, ship movement paths analysis and univariate time-series analysis. △ Less

Submitted 6 December, 2023; v1 submitted 5 June, 2020; originally announced June 2020.

arXiv:2006.00873 [pdf, other]

A Generalised Signature Method for Multivariate Time Series Feature Extraction

Authors: James Morrill, Adeline Fermanian, Patrick Kidger, Terry Lyons

Abstract: The 'signature method' refers to a collection of feature extraction techniques for multivariate time series, derived from the theory of controlled differential equations. There is a great deal of flexibility as to how this method can be applied. On the one hand, this flexibility allows the method to be tailored to specific problems, but on the other hand, can make precise application challenging.… ▽ More The 'signature method' refers to a collection of feature extraction techniques for multivariate time series, derived from the theory of controlled differential equations. There is a great deal of flexibility as to how this method can be applied. On the one hand, this flexibility allows the method to be tailored to specific problems, but on the other hand, can make precise application challenging. This paper makes two contributions. First, the variations on the signature method are unified into a general approach, the \emph{generalised signature method}, of which previous variations are special cases. A primary aim of this unifying framework is to make the signature method more accessible to any machine learning practitioner, whereas it is now mostly used by specialists. Second, and within this framework, we derive a canonical collection of choices that provide a domain-agnostic starting point. We derive these choices as a result of an extensive empirical study on 26 datasets and go on to show competitive performance against current benchmarks for multivariate time series classification. Finally, to ease practical application, we make our techniques available as part of the open-source [redacted] project. △ Less

Submitted 6 February, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 25 pages

arXiv:2005.13948 [pdf, other]

Generalised Interpretable Shapelets for Irregular Time Series

Authors: Patrick Kidger, James Morrill, Terry Lyons

Abstract: The shapelet transform is a form of feature extraction for time series, in which a time series is described by its similarity to each of a collection of `shapelets'. However it has previously suffered from a number of limitations, such as being limited to regularly-spaced fully-observed time series, and having to choose between efficient training and interpretability. Here, we extend the method to… ▽ More The shapelet transform is a form of feature extraction for time series, in which a time series is described by its similarity to each of a collection of `shapelets'. However it has previously suffered from a number of limitations, such as being limited to regularly-spaced fully-observed time series, and having to choose between efficient training and interpretability. Here, we extend the method to continuous time, and in doing so handle the general case of irregularly-sampled partially-observed multivariate time series. Furthermore, we show that a simple regularisation penalty may be used to train efficiently without sacrificing interpretability. The continuous-time formulation additionally allows for learning the length of each shapelet (previously a discrete object) in a differentiable manner. Finally, we demonstrate that the measure of similarity between time series may be generalised to a learnt pseudometric. We validate our method by demonstrating its performance and interpretability on several datasets; for example we discover (purely from data) that the digits 5 and 6 may be distinguished by the chirality of their bottom loop, and that a kind of spectral gap exists in spoken audio classification. △ Less

Submitted 29 May, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

arXiv:2005.08926 [pdf, ps, other]

Neural Controlled Differential Equations for Irregular Time Series

Authors: Patrick Kidger, James Morrill, James Foster, Terry Lyons

Abstract: Neural ordinary differential equations are an attractive option for modelling temporal dynamics. However, a fundamental issue is that the solution to an ordinary differential equation is determined by its initial condition, and there is no mechanism for adjusting the trajectory based on subsequent observations. Here, we demonstrate how this may be resolved through the well-understood mathematics o… ▽ More Neural ordinary differential equations are an attractive option for modelling temporal dynamics. However, a fundamental issue is that the solution to an ordinary differential equation is determined by its initial condition, and there is no mechanism for adjusting the trajectory based on subsequent observations. Here, we demonstrate how this may be resolved through the well-understood mathematics of \emph{controlled differential equations}. The resulting \emph{neural controlled differential equation} model is directly applicable to the general setting of partially-observed irregularly-sampled multivariate time series, and (unlike previous work on this problem) it may utilise memory-efficient adjoint-based backpropagation even across observations. We demonstrate that our model achieves state-of-the-art performance against similar (ODE or RNN based) models in empirical studies on a range of datasets. Finally we provide theoretical results demonstrating universal approximation, and that our model subsumes alternative ODE models. △ Less

Submitted 5 November, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: Accepted at NeurIPS 2020 (Spotlight)

arXiv:2004.04006 [pdf, other]

Signature features with the visibility transformation

Authors: Yue Wu, Hao Ni, Terence J. Lyons, Robin L. Hudson

Abstract: In this paper we put the visibility transformation on a clear theoretical footing and show that this transform is able to embed the effect of the absolute position of the data stream into signature features in a unified and efficient way. The generated feature set is particularly useful in pattern recognition tasks, for its simplifying role in allowing the signature feature set to accommodate nonl… ▽ More In this paper we put the visibility transformation on a clear theoretical footing and show that this transform is able to embed the effect of the absolute position of the data stream into signature features in a unified and efficient way. The generated feature set is particularly useful in pattern recognition tasks, for its simplifying role in allowing the signature feature set to accommodate nonlinear functions of absolute and relative values. △ Less

Submitted 8 October, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

MSC Class: 60L10

arXiv:2002.03419 [pdf, other]

The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up

Authors: Razvan V. Marinescu, Neil P. Oxtoby, Alexandra L. Young, Esther E. Bron, Arthur W. Toga, Michael W. Weiner, Frederik Barkhof, Nick C. Fox, Arman Eshaghi, Tina Toni, Marcin Salaterski, Veronika Lunina, Manon Ansart, Stanley Durrleman, Pascal Lu, Samuel Iddi, Dan Li, Wesley K. Thompson, Michael C. Donohue, Aviv Nahon, Yarden Levy, Dan Halbersberg, Mariya Cohen, Huiling Liao, Tengfei Li , et al. (71 additional authors not shown)

Abstract: We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcome… ▽ More We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcomes: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. The methods used by challenge participants included multivariate linear regression, machine learning methods such as support vector machines and deep neural networks, as well as disease progression models. No single submission was best at predicting all three outcomes. For clinical diagnosis and ventricle volume prediction, the best algorithms strongly outperform simple baselines in predictive ability. However, for ADAS-Cog13 no single submitted prediction method was significantly better than random guesswork. Two ensemble methods based on taking the mean and median over all predictions, obtained top scores on almost all tasks. Better than average performance at diagnosis prediction was generally associated with the additional inclusion of features from cerebrospinal fluid (CSF) samples and diffusion tensor imaging (DTI). On the other hand, better performance at ventricle volume prediction was associated with inclusion of summary statistics, such as the slope or maxima/minima of biomarkers. TADPOLE's unique results suggest that current prediction algorithms provide sufficient accuracy to exploit biomarkers related to clinical diagnosis and ventricle volume, for cohort refinement in clinical trials for Alzheimer's disease. However, results call into question the usage of cognitive test scores for patient selection and as a primary endpoint in clinical trials. △ Less

Submitted 27 December, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

Comments: Presents final results of the TADPOLE competition. 60 pages, 7 tables, 14 figures

Journal ref: Machine Learning for Biomedical Imaging (MELBA), Dec 2021

arXiv:2001.00706 [pdf, other]

Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU

Authors: Patrick Kidger, Terry Lyons

Abstract: Signatory is a library for calculating and performing functionality related to the signature and logsignature transforms. The focus is on machine learning, and as such includes features such as CPU parallelism, GPU support, and backpropagation. To our knowledge it is the first GPU-capable library for these operations. Signatory implements new features not available in previous libraries, such as e… ▽ More Signatory is a library for calculating and performing functionality related to the signature and logsignature transforms. The focus is on machine learning, and as such includes features such as CPU parallelism, GPU support, and backpropagation. To our knowledge it is the first GPU-capable library for these operations. Signatory implements new features not available in previous libraries, such as efficient precomputation strategies. Furthermore, several novel algorithmic improvements are introduced, producing substantial real-world speedups even on the CPU without parallelism. The library operates as a Python wrapper around C++, and is compatible with the PyTorch ecosystem. It may be installed directly via \texttt{pip}. Source code, documentation, examples, benchmarks and tests may be found at \texttt{\url{https://github.com/patrick-kidger/signatory}}. The license is Apache-2.0. △ Less

Submitted 5 February, 2021; v1 submitted 2 January, 2020; originally announced January 2020.

Comments: Published at ICLR 2021

MSC Class: 60H30; 68T99; 68N30

arXiv:1908.08286 [pdf, other]

Learning stochastic differential equations using RNN with log signature features

Authors: Shujian Liao, Terry Lyons, Weixin Yang, Hao Ni

Abstract: This paper contributes to the challenge of learning a function on streamed multimodal data through evaluation. The core of the result of our paper is the combination of two quite different approaches to this problem. One comes from the mathematically principled technology of signatures and log-signatures as representations for streamed data, while the other draws on the techniques of recurrent neu… ▽ More This paper contributes to the challenge of learning a function on streamed multimodal data through evaluation. The core of the result of our paper is the combination of two quite different approaches to this problem. One comes from the mathematically principled technology of signatures and log-signatures as representations for streamed data, while the other draws on the techniques of recurrent neural networks (RNN). The ability of the former to manage high sample rate streams and the latter to manage large scale nonlinear interactions allows hybrid algorithms that are easy to code, quicker to train, and of lower complexity for a given accuracy. We illustrate the approach by approximating the unknown functional as a controlled differential equation. Linear functionals on solutions of controlled differential equations are the natural universal class of functions on data streams. Following this approach, we propose a hybrid Logsig-RNN algorithm that learns functionals on streamed data. By testing on various datasets, i.e. synthetic data, NTU RGB+D 120 skeletal action data, and Chalearn2013 gesture data, our algorithm achieves the outstanding accuracy with superior efficiency and robustness. △ Less

Submitted 22 September, 2019; v1 submitted 22 August, 2019; originally announced August 2019.

arXiv:1905.08539 [pdf, ps, other]

Universal Approximation with Deep Narrow Networks

Authors: Patrick Kidger, Terry Lyons

Abstract: The classical Universal Approximation Theorem holds for neural networks of arbitrary width and bounded depth. Here we consider the natural `dual' scenario for networks of bounded width and arbitrary depth. Precisely, let $n$ be the number of inputs neurons, $m$ be the number of output neurons, and let $ρ$ be any nonaffine continuous function, with a continuous nonzero derivative at some point. The… ▽ More The classical Universal Approximation Theorem holds for neural networks of arbitrary width and bounded depth. Here we consider the natural `dual' scenario for networks of bounded width and arbitrary depth. Precisely, let $n$ be the number of inputs neurons, $m$ be the number of output neurons, and let $ρ$ be any nonaffine continuous function, with a continuous nonzero derivative at some point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and activation function $ρ$, is dense in $C(K; \mathbb{R}^m)$ for $K \subseteq \mathbb{R}^n$ with $K$ compact. This covers every activation function possible to use in practice, and also includes polynomial activation functions, which is unlike the classical version of the theorem, and provides a qualitative difference between deep narrow networks and shallow wide networks. We then consider several extensions of this result. In particular we consider nowhere differentiable activation functions, density in noncompact domains with respect to the $L^p$-norm, and how the width may be reduced to just $n + m + 1$ for `most' activation functions. △ Less

Submitted 8 June, 2020; v1 submitted 21 May, 2019; originally announced May 2019.

Comments: Accepted at COLT 2020

MSC Class: 41A46; 41A63; 68T07

arXiv:1905.08494 [pdf, other]

Deep Signature Transforms

Authors: Patric Bonnier, Patrick Kidger, Imanol Perez Arribas, Cristopher Salvi, Terry Lyons

Abstract: The signature is an infinite graded sequence of statistics known to characterise a stream of data up to a negligible equivalence class. It is a transform which has previously been treated as a fixed feature transformation, on top of which a model may be built. We propose a novel approach which combines the advantages of the signature transform with modern deep learning frameworks. By learning an a… ▽ More The signature is an infinite graded sequence of statistics known to characterise a stream of data up to a negligible equivalence class. It is a transform which has previously been treated as a fixed feature transformation, on top of which a model may be built. We propose a novel approach which combines the advantages of the signature transform with modern deep learning frameworks. By learning an augmentation of the stream prior to the signature transform, the terms of the signature may be selected in a data-dependent way. More generally, we describe how the signature transform may be used as a layer anywhere within a neural network. In this context it may be interpreted as a pooling operation. We present the results of empirical experiments to back up the theoretical justification. Code available at https://github.com/patrick-kidger/Deep-Signature-Transforms. △ Less

Submitted 26 October, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

Comments: Published at NeurIPS 2019

MSC Class: 68T01

arXiv:1808.05865 [pdf, ps, other]

doi 10.1371/journal.pone.0222212

Using path signatures to predict a diagnosis of Alzheimer's disease

Authors: P. J. Moore, J. Gallacher, T. J. Lyons

Abstract: The path signature is a means of feature generation that can encode nonlinear interactions in the data as well as the usual linear features. It can distinguish the ordering of time-sequenced changes: for example whether or not the hippocampus shrinks fast, then slowly or the converse. It provides interpretable features and its output is a fixed length vector irrespective of the number of input poi… ▽ More The path signature is a means of feature generation that can encode nonlinear interactions in the data as well as the usual linear features. It can distinguish the ordering of time-sequenced changes: for example whether or not the hippocampus shrinks fast, then slowly or the converse. It provides interpretable features and its output is a fixed length vector irrespective of the number of input points so it can encode longitudinal data of varying length and with missing data points. In this paper we demonstrate the path signature in providing features to distinguish a set of people with Alzheimer's disease from a matched set of healthy individuals. The data used are volume measurements of the whole brain, ventricles and hippocampus from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The path signature method is shown to be a useful tool for the processing of sequential data which is becoming increasingly available as monitoring technologies are applied. △ Less

Submitted 16 August, 2018; originally announced August 2018.

Comments: 5 pages, 3 figures. arXiv admin note: text overlap with arXiv:1808.03273

MSC Class: 62J12; 92D30

arXiv:1808.03273 [pdf, ps, other]

doi 10.1371/journal.pone.0211558

Random forest prediction of Alzheimer's disease using pairwise selection from time series data

Authors: Paul Moore, Terry Lyons, John Gallacher

Abstract: Time-dependent data collected in studies of Alzheimer's disease usually has missing and irregularly sampled data points. For this reason time series methods which assume regular sampling cannot be applied directly to the data without a pre-processing step. In this paper we use a machine learning method to learn the relationship between pairs of data points at different time separations. The input… ▽ More Time-dependent data collected in studies of Alzheimer's disease usually has missing and irregularly sampled data points. For this reason time series methods which assume regular sampling cannot be applied directly to the data without a pre-processing step. In this paper we use a machine learning method to learn the relationship between pairs of data points at different time separations. The input vector comprises a summary of the time series history and includes both demographic and non-time varying variables such as genetic data. The dataset used is from the 2017 TADPOLE grand challenge which aims to predict the onset of Alzheimer's disease using including demographic, physical and cognitive data. The challenge is a three-fold diagnosis classification into AD, MCI and control groups, the prediction of ADAS-13 score and the normalised ventricle volume. While the competition proceeds, forecasting methods may be compared using a leaderboard dataset selected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and with standard metrics for measuring accuracy. For diagnosis, we find an mAUC of 0.82, and a classification accuracy of 0.73. The results show that the method is effective and comparable with other methods. △ Less

Submitted 9 August, 2018; originally announced August 2018.

Comments: 6 pages, 1 figure, 6 tables

MSC Class: 62M10

arXiv:1805.03911 [pdf, other]

Labelling as an unsupervised learning problem

Authors: Terry Lyons, Imanol Perez Arribas

Abstract: Unravelling hidden patterns in datasets is a classical problem with many potential applications. In this paper, we present a challenge whose objective is to discover nonlinear relationships in noisy cloud of points. If a set of point satisfies a nonlinear relationship that is unlikely to be due to randomness, we will label the set with this relationship. Since points can satisfy one, many or no su… ▽ More Unravelling hidden patterns in datasets is a classical problem with many potential applications. In this paper, we present a challenge whose objective is to discover nonlinear relationships in noisy cloud of points. If a set of point satisfies a nonlinear relationship that is unlikely to be due to randomness, we will label the set with this relationship. Since points can satisfy one, many or no such nonlinear relationships, cloud of points will typically have one, multiple or no labels at all. This introduces the labelling problem that will be studied in this paper. The objective of this paper is to develop a framework for the labelling problem. We introduce a precise notion of a label, and we propose an algorithm to discover such labels in a given dataset, which is then tested in synthetic datasets. We also analyse, using tools from random matrix theory, the problem of discovering false labels in the dataset. △ Less

Submitted 30 May, 2018; v1 submitted 10 May, 2018; originally announced May 2018.

arXiv:1708.09708 [pdf, ps, other]

Sketching the order of events

Authors: Terry Lyons, Harald Oberhauser

Abstract: We introduce features for massive data streams. These stream features can be thought of as "ordered moments" and generalize stream sketches from "moments of order one" to "ordered moments of arbitrary order". In analogy to classic moments, they have theoretical guarantees such as universality that are important for learning algorithms. We introduce features for massive data streams. These stream features can be thought of as "ordered moments" and generalize stream sketches from "moments of order one" to "ordered moments of arbitrary order". In analogy to classic moments, they have theoretical guarantees such as universality that are important for learning algorithms. △ Less

Submitted 31 August, 2017; originally announced August 2017.

arXiv:1708.01206 [pdf, other]

Detecting early signs of depressive and manic episodes in patients with bipolar disorder using the signature-based model

Authors: Andrey Kormilitzin, Kate E. A. Saunders, Paul J. Harrison, John R. Geddes, Terry Lyons

Abstract: Recurrent major mood episodes and subsyndromal mood instability cause substantial disability in patients with bipolar disorder. Early identification of mood episodes enabling timely mood stabilisation is an important clinical goal. Recent technological advances allow the prospective reporting of mood in real time enabling more accurate, efficient data capture. The complex nature of these data stre… ▽ More Recurrent major mood episodes and subsyndromal mood instability cause substantial disability in patients with bipolar disorder. Early identification of mood episodes enabling timely mood stabilisation is an important clinical goal. Recent technological advances allow the prospective reporting of mood in real time enabling more accurate, efficient data capture. The complex nature of these data streams in combination with challenge of deriving meaning from missing data mean pose a significant analytic challenge. The signature method is derived from stochastic analysis and has the ability to capture important properties of complex ordered time series data. To explore whether the onset of episodes of mania and depression can be identified using self-reported mood data. △ Less

Submitted 3 August, 2017; originally announced August 2017.

Comments: 12 pages, 3 tables, 10 figures

arXiv:1707.07124 [pdf, other]

doi 10.1038/s41398-018-0334-0

A signature-based machine learning model for bipolar disorder and borderline personality disorder

Authors: Imanol Perez Arribas, Kate Saunders, Guy Goodwin, Terry Lyons

Abstract: Mobile technologies offer opportunities for higher resolution monitoring of health conditions. This opportunity seems of particular promise in psychiatry where diagnoses often rely on retrospective and subjective recall of mood states. However, getting actionable information from these rather complex time series is challenging, and at present the implications for clinical care are largely hypothet… ▽ More Mobile technologies offer opportunities for higher resolution monitoring of health conditions. This opportunity seems of particular promise in psychiatry where diagnoses often rely on retrospective and subjective recall of mood states. However, getting actionable information from these rather complex time series is challenging, and at present the implications for clinical care are largely hypothetical. This research demonstrates that, with well chosen cohorts (of bipolar disorder, borderline personality disorder, and control) and modern methods, it is possible to objectively learn to identify distinctive behaviour over short periods (20 reports) that effectively separate the cohorts. Participants with bipolar disorder or borderline personality disorder and healthy volunteers completed daily mood ratings using a bespoke smartphone app for up to a year. A signature-based machine learning model was used to classify participants on the basis of the interrelationship between the different mood items assessed and to predict subsequent mood. The signature methodology was significantly superior to earlier statistical approaches applied to this data in distinguishing the participant three groups, clearly placing 75% into their original groups on the basis of their reports. Subsequent mood ratings were correctly predicted with greater than 70% accuracy in all groups. Prediction of mood was most accurate in healthy volunteers (89-98%) compared to bipolar disorder (82-90%) and borderline personality disorder (70-78%). △ Less

Submitted 4 October, 2017; v1 submitted 22 July, 2017; originally announced July 2017.

arXiv:1606.02074 [pdf, ps, other]

Application of the Signature Method to Pattern Recognition in the CEQUEL Clinical Trial

Authors: A. B. Kormilitzin, K. E. A. Saunders, P. J. Harrison, J. R. Geddes, T. J. Lyons

Abstract: The classification procedure of streaming data usually requires various ad hoc methods or particular heuristic models. We explore a novel non-parametric and systematic approach to analysis of heterogeneous sequential data. We demonstrate an application of this method to classification of the delays in responding to the prompts, from subjects with bipolar disorder collected during a clinical trial,… ▽ More The classification procedure of streaming data usually requires various ad hoc methods or particular heuristic models. We explore a novel non-parametric and systematic approach to analysis of heterogeneous sequential data. We demonstrate an application of this method to classification of the delays in responding to the prompts, from subjects with bipolar disorder collected during a clinical trial, using both synthetic and real examples. We show how this method can provide a natural and systematic way to extract characteristic features from sequential data. △ Less

Submitted 7 June, 2016; originally announced June 2016.

Comments: 16 pages, 7 figures

Showing 1–30 of 30 results for author: Lyons, T