Search | arXiv e-print repository

Multivariate Online Linear Regression for Hierarchical Forecasting

Authors: Massil Hihat, Guillaume Garrigos, Adeline Fermanian, Simon Bussy

Abstract: In this paper, we consider a deterministic online linear regression model where we allow the responses to be multivariate. To address this problem, we introduce MultiVAW, a method that extends the well-known Vovk-Azoury-Warmuth algorithm to the multivariate setting, and show that it also enjoys logarithmic regret in time. We apply our results to the online hierarchical forecasting problem and recover an algorithm from this literature as a special case, allowing us to relax the hypotheses usually made for its analysis.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.02857

Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation

Authors: Sobihan Surendran, Antoine Godichon-Baggioni, Adeline Fermanian, Sylvain Le Corff

Abstract: Stochastic Gradient Descent (SGD) with adaptive steps is now widely used for training deep neural networks. Most theoretical results assume access to unbiased gradient estimators, which is not the case in several recent deep learning and reinforcement learning applications that use Monte Carlo methods. This paper provides a comprehensive non-asymptotic analysis of SGD with biased gradients and adaptive steps for convex and non-convex smooth functions. Our study incorporates time-dependent bias and emphasizes the importance of controlling the bias and Mean Squared Error (MSE) of the gradient estimator. In particular, we establish that Adagrad and RMSProp with biased gradients converge to critical points for smooth non-convex functions at a rate similar to existing results in the literature for the unbiased case. Finally, we provide experimental results using Variational Autoencoders (VAE) that illustrate our convergence results and show how the effect of bias can be reduced by appropriate hyperparameter tuning.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.17077

Dynamical Survival Analysis with Controlled Latent States

Authors: Linus Bleistein, Van-Tuan Nguyen, Adeline Fermanian, Agathe Guilloux

Abstract: We consider the task of learning individual-specific intensities of counting processes from a set of static variables and irregularly sampled time series. We introduce a novel modeling approach in which the intensity is the solution to a controlled differential equation. We first design a neural estimator by building on neural controlled differential equations. Second, we show that our model can be linearized in the signature space under sufficient regularity conditions, yielding a signature-based estimator which we call CoxSig. We provide theoretical learning guarantees for both estimators, before showcasing the performance of our models on a vast array of simulated and real-world datasets from finance, predictive maintenance and food supply chain management.

Submitted 4 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: ICML 2024

arXiv:2309.03714

FLASH: a Fast joint model for Longitudinal And Survival data in High dimension

Authors: Van Tuan Nguyen, Adeline Fermanian, Agathe Guilloux, Antoine Barbieri, Sarah Zohar, Anne-Sophie Jannot, Simon Bussy

Abstract: This paper introduces a prognostic method called FLASH that addresses the problem of joint modelling of longitudinal data and censored durations when a large number of both longitudinal and time-independent features are available. In the literature, standard joint models are either of the shared random effect or joint latent class type. Combining ideas from both worlds and using appropriate regularisation techniques, we define a new model with the ability to automatically identify significant prognostic longitudinal features in a high-dimensional context, which is of increasing importance in many areas such as personalised medicine or churn prediction. We develop an estimation methodology based on the EM algorithm and provide an efficient implementation. The statistical performance of the method is demonstrated both in extensive Monte Carlo simulation studies and on publicly available real-world datasets. Our method significantly outperforms the state-of-the-art joint models in predicting the latent class membership probability in terms of the C-index in a so-called "real-time" prediction setting, with a computational speed that is orders of magnitude faster than competing methods. In addition, our model automatically identifies significant features that are relevant from a practical perspective, making it interpretable.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 30 pages, 5 figures, 2 tables

arXiv:2304.01862

The insertion method to invert the signature of a path

Authors: Adeline Fermanian, Jiawei Chang, Terry Lyons, Gérard Biau

Abstract: The signature is a representation of a path as an infinite sequence of its iterated integrals. Under certain assumptions, the signature characterizes the path, up to translation and reparameterization. Therefore, a crucial question of interest is the development of efficient algorithms to invert the signature, i.e., to reconstruct the path from the information of its (truncated) signature. In this article, we study the insertion procedure, originally introduced by Chang and Lyons (2019), from both a theoretical and a practical point of view. After describing our version of the method, we give its rate of convergence for piecewise linear paths, accompanied by an implementation in PyTorch. The algorithm is parallelized, meaning that it is very efficient at inverting a batch of signatures simultaneously. Its performance is illustrated with both real-world and simulated examples.

Submitted 19 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2302.04586

New directions in the applications of rough path theory

Authors: Adeline Fermanian, Terry Lyons, James Morrill, Cristopher Salvi

Abstract: This article provides a concise overview of some of the recent advances in the application of rough path theory to machine learning. Controlled differential equations (CDEs) are discussed as the key mathematical model to describe the interaction of a stream with a physical control system. A collection of iterated integrals known as the signature naturally arises in the description of the response produced by such interactions. The signature comes equipped with a variety of powerful properties rendering it an ideal feature map for streamed data. We summarise recent advances in the symbiosis between deep learning and CDEs, studying the link with RNNs and culminating with the Neural CDE model. We conclude with a discussion on signature kernel methods.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2301.11647

Learning the Dynamics of Sparsely Observed Interacting Systems

Authors: Linus Bleistein, Adeline Fermanian, Anne-Sophie Jannot, Agathe Guilloux

Abstract: We address the problem of learning the dynamics of an unknown non-parametric system linking a target and a feature time series. The feature time series is measured on a sparse and irregular grid, while we have access to only a few points of the target time series. Once learned, we can use these dynamics to predict values of the target from the previous values of the feature time series. We frame this task as learning the solution map of a controlled differential equation (CDE). By leveraging the rich theory of signatures, we are able to cast this non-linear problem as a high-dimensional linear regression. We provide an oracle bound on the prediction error which exhibits explicit dependencies on the individual-specific sampling schemes. Our theoretical results are illustrated by simulations which show that our method outperforms existing algorithms for recovering the full time series while being computationally cheap. We conclude by demonstrating its potential on real-world epidemiological data.

Submitted 31 May, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: ICML 2023

arXiv:2206.06929

Scaling ResNets in the Large-depth Regime

Authors: Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert

Abstract: Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth $L$ increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists in scaling the output of each layer by a factor $\alpha_L$. We show in a probabilistic setting that with standard i.i.d. initializations, the only non-trivial dynamics is for $\alpha_L = \frac{1}{\sqrt{L}}$; other choices lead either to explosion or to identity mapping. This scaling factor corresponds in the continuous-time limit to a neural stochastic differential equation, contrary to a widespread interpretation that deep ResNets are discretizations of neural ordinary differential equations. By contrast, in the latter regime, stability is obtained with specific correlated initializations and $\alpha_L = \frac{1}{L}$. Our analysis suggests a strong interplay between scaling and regularity of the weights as a function of the layer index. Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training.

Submitted 10 June, 2024; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 44 pages, 9 figures. Updated with clarifications and additional references

arXiv:2106.01202

Framing RNN as a kernel method: A neural ODE approach

Authors: Adeline Fermanian, Pierre Marion, Jean-Philippe Vert, Gérard Biau

Abstract: Building on the interpretation of a recurrent neural network (RNN) as a continuous-time neural differential equation, we show, under appropriate conditions, that the solution of an RNN can be viewed as a linear function of a specific feature set of the input sequence, known as the signature. This connection allows us to frame an RNN as a kernel method in a suitable reproducing kernel Hilbert space. As a consequence, we obtain theoretical guarantees on generalization and stability for a large class of recurrent networks. Our results are illustrated on simulated datasets.

Submitted 29 October, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: 33 pages, 7 figures, accepted for an oral presentation at NeurIPS 2021

arXiv:2006.08442

Functional linear regression with truncated signatures

Authors: Adeline Fermanian

Abstract: We place ourselves in a functional regression setting and propose a novel methodology for regressing a real output on vector-valued functional covariates. This methodology is based on the notion of signature, which is a representation of a function as an infinite series of its iterated integrals. The signature depends crucially on a truncation parameter for which an estimator is provided, together with theoretical guarantees. An empirical study on both simulated and real-world datasets shows that the resulting methodology is competitive with traditional functional linear models, in particular when the functional covariates take their values in a high dimensional space.

Submitted 16 June, 2022; v1 submitted 15 June, 2020; originally announced June 2020.

MSC Class: 62R10 (Primary); 60L10 (Secondary)

arXiv:2006.00873

A Generalised Signature Method for Multivariate Time Series Feature Extraction

Authors: James Morrill, Adeline Fermanian, Patrick Kidger, Terry Lyons

Abstract: The 'signature method' refers to a collection of feature extraction techniques for multivariate time series, derived from the theory of controlled differential equations. There is a great deal of flexibility as to how this method can be applied. On the one hand, this flexibility allows the method to be tailored to specific problems, but on the other hand, can make precise application challenging. This paper makes two contributions. First, the variations on the signature method are unified into a general approach, the 'generalised signature method', of which previous variations are special cases. A primary aim of this unifying framework is to make the signature method more accessible to any machine learning practitioner, whereas it is now mostly used by specialists. Second, and within this framework, we derive a canonical collection of choices that provide a domain-agnostic starting point. We derive these choices as a result of an extensive empirical study on 26 datasets and go on to show competitive performance against current benchmarks for multivariate time series classification. Finally, to ease practical application, we make our techniques available as part of the open-source [redacted] project.

Submitted 6 February, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 25 pages

arXiv:1911.13211

Embedding and learning with signatures

Authors: Adeline Fermanian

Abstract: Sequential and temporal data arise in many fields of research, such as quantitative finance, medicine, or computer vision. A novel approach for sequential learning, called the signature method and rooted in rough path theory, is considered. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. This approach relies critically on an embedding principle, which consists in representing discretely sampled data as paths, i.e., functions from $[0,1]$ to $\mathbb{R}^d$. After a survey of machine learning methodologies for signatures, the influence of embeddings on prediction accuracy is investigated with an in-depth study of three recent and challenging datasets. It is shown that a specific embedding, called lead-lag, is systematically the strongest performer across all datasets and algorithms considered. Moreover, an empirical study reveals that computing signatures over the whole path domain does not lead to a loss of local information. It is concluded that, with a good embedding, combining signatures with other simple algorithms achieves results competitive with state-of-the-art, domain-specific approaches.

Submitted 9 December, 2020; v1 submitted 29 November, 2019; originally announced November 2019.

Showing 1–12 of 12 results for author: Fermanian, A