Showing 1–29 of 29 results for author: Ollivier, Y

Searching in archive cs.
  1. arXiv:2403.13097  [pdf, other]

    cs.LG cs.AI

    Simple Ingredients for Offline Reinforcement Learning

    Authors: Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

    Abstract: Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline…

    Submitted 19 March, 2024; originally announced March 2024.

  2. arXiv:2209.14935  [pdf, other]

    cs.LG

    Does Zero-Shot Reinforcement Learning Exist?

    Authors: Ahmed Touati, Jérémy Rapin, Yann Ollivier

    Abstract: A zero-shot RL agent is an agent that can solve any RL task in a given environment, instantly with no additional planning or learning, after an initial reward-free learning phase. This marks a shift from the reward-centric RL paradigm towards "controllable" agents that can follow arbitrary instructions in an environment. Current RL agents can solve families of related tasks at best, or require pla…

    Submitted 1 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Journal ref: International Conference on Learning Representations, 2023

  3. arXiv:2205.15021  [pdf, other]

    cs.LG

    Agnostic Physics-Driven Deep Learning

    Authors: Benjamin Scellier, Siddhartha Mishra, Yoshua Bengio, Yann Ollivier

    Abstract: This work establishes that a physical system can perform statistical learning without gradient computations, via an Agnostic Equilibrium Propagation (Aeqprop) procedure that combines energy minimization, homeostatic control, and nudging towards the correct response. In Aeqprop, the specifics of the system do not have to be known: the procedure is based only on external manipulations, and produces…

    Submitted 30 May, 2022; originally announced May 2022.

  4. arXiv:2106.08863  [pdf, other]

    cs.LG

    Unbiased Methods for Multi-Goal Reinforcement Learning

    Authors: Léonard Blier, Yann Ollivier

    Abstract: In multi-goal reinforcement learning (RL) settings, the reward for each goal is sparse, and located in a small neighborhood of the goal. In large dimension, the probability of reaching a reward vanishes and the agent receives little learning signal. Methods such as Hindsight Experience Replay (HER) tackle this issue by also learning from realized but unplanned-for goals. But HER is known to introd…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 9 pages

  5. arXiv:2103.07945  [pdf, other]

    cs.LG cs.AI math.OC

    Learning One Representation to Optimize All Rewards

    Authors: Ahmed Touati, Yann Ollivier

    Abstract: We introduce the forward-backward (FB) representation of the dynamics of a reward-free Markov decision process. It provides explicit near-optimal policies for any reward specified a posteriori. During an unsupervised phase, we use reward-free interactions with the environment to learn two representations via off-the-shelf deep learning methods and temporal difference (TD) learning. In the test pha…

    Submitted 11 October, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

  6. arXiv:2101.07123  [pdf, other]

    cs.LG

    Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint

    Authors: Léonard Blier, Corentin Tallec, Yann Ollivier

    Abstract: In reinforcement learning, temporal difference-based algorithms can be sample-inefficient: for instance, with sparse rewards, no learning occurs until a reward is observed. This can be remedied by learning richer objects, such as a model of the environment, or successor states. Successor states model the expected future state occupancy from any given state for a given policy and are related to goa…

    Submitted 18 January, 2021; originally announced January 2021.

  7. arXiv:2002.00178  [pdf, other]

    cs.LG math.ST stat.ML

    An Equivalence between Bayesian Priors and Penalties in Variational Inference

    Authors: Pierre Wolinski, Guillaume Charpiat, Yann Ollivier

    Abstract: In machine learning, it is common to optimize the parameters of a probabilistic model, modulated by an ad hoc regularization term that penalizes some values of the parameters. Regularization terms appear naturally in Variational Inference, a tractable way to approximate Bayesian posteriors: the loss to optimize contains a Kullback-Leibler divergence term between the approximate posterior and a Ba…

    Submitted 7 February, 2024; v1 submitted 1 February, 2020; originally announced February 2020.

  8. arXiv:1908.11229  [pdf, other]

    stat.ML cs.CR cs.LG

    White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

    Authors: Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, Hervé Jégou

    Abstract: Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set. In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. We show that optimal attacks only depend on the loss function, and thus black-box attacks are as good as white-box att…

    Submitted 29 August, 2019; originally announced August 2019.

  9. arXiv:1902.01883  [pdf, other]

    cs.LG cs.AI stat.ML

    Separating value functions across time-scales

    Authors: Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

    Abstract: In many finite horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return - in settings like Atari, for instance, the goal is to collect the most points while staying alive in the long run. Yet, it may be difficult (or even intractable) mathematically to learn with this target. As such, temporal discounting is often applied to optimize over a sho…

    Submitted 24 May, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

    Comments: Full version accepted to ICML 2019. Extended abstract also to be presented at RLDM 2019

  10. arXiv:1901.09732  [pdf, other]

    cs.LG stat.ML

    Making Deep Q-learning methods robust to time discretization

    Authors: Corentin Tallec, Léonard Blier, Yann Ollivier

    Abstract: Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018). Overcoming such sensitivity is key to making DRL applicable to real world problems. In this paper, we identify sensitivity to time discretization in near continuous-time environments as a critical fa…

    Submitted 29 January, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

  11. arXiv:1810.01322  [pdf, other]

    cs.LG cs.NE stat.ML

    Learning with Random Learning Rates

    Authors: Léonard Blier, Pierre Wolinski, Yann Ollivier

    Abstract: Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the 'All Learning Rates At Once' (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude.…

    Submitted 29 January, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

    Comments: 20 pages, 8 figures, code available on GitHub

  12. arXiv:1806.07185  [pdf, other]

    cs.LG cs.CV stat.ML

    Mixed batches and symmetric discriminators for GAN training

    Authors: Thomas Lucas, Corentin Tallec, Jakob Verbeek, Yann Ollivier

    Abstract: Generative adversarial networks (GANs) are powerful generative models based on providing feedback to a generative network via a discriminator network. However, the discriminator usually assesses individual samples. This prevents the discriminator from accessing global distributional statistics of generated samples, and often leads to mode dropping: the generator models only part of the tar…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: Accepted at ICML 2018 (long oral)

  13. arXiv:1805.00869  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Approximate Temporal Difference Learning is a Gradient Descent for Reversible Policies

    Authors: Yann Ollivier

    Abstract: In reinforcement learning, temporal difference (TD) is the most direct algorithm to learn the value function of a policy. For large or infinite state spaces, exact representations of the value function are usually not available, and it must be approximated by a function in some parametric family. However, with nonlinear parametric approximations (such as neural networks), TD is not guaran…

    Submitted 2 May, 2018; originally announced May 2018.

  14. arXiv:1804.11188  [pdf, other]

    cs.LG cs.NE stat.ML

    Can recurrent neural networks warp time?

    Authors: Corentin Tallec, Yann Ollivier

    Abstract: Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically these models have been found to improve the learning of medium to long term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformati…

    Submitted 23 March, 2018; originally announced April 2018.

  15. arXiv:1802.07044  [pdf, ps, other]

    cs.LG

    The Description Length of Deep Learning Models

    Authors: Léonard Blier, Yann Ollivier

    Abstract: Solomonoff's general theory of inference and the Minimum Description Length principle formalize Occam's razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentall…

    Submitted 1 November, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

    Comments: NIPS 2018

  16. arXiv:1802.01421  [pdf, other]

    stat.ML cs.CV cs.LG

    First-order Adversarial Vulnerability of Neural Networks and Input Dimension

    Authors: Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz

    Abstract: Over the past few years, neural networks were proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. Surprisingly, vulnerability does not depend on network topology: for many standard netwo…

    Submitted 16 June, 2019; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: Paper previously called: "Adversarial Vulnerability of Neural Networks Increases with Input Dimension". 9 pages main text and references, 11 pages appendix, 14 figures

    MSC Class: 68T45; ACM Class: I.2.6

    Journal ref: Proceedings of ICML 2019

  17. arXiv:1712.08449  [pdf, ps, other]

    stat.ML cs.LG math.OC

    True Asymptotic Natural Gradient Optimization

    Authors: Yann Ollivier

    Abstract: We introduce a simple algorithm, True Asymptotic Natural Gradient Optimization (TANGO), that converges to a true natural gradient descent in the limit of small learning rates, without explicit Fisher matrix estimation. For quadratic models the algorithm is also an instance of averaged stochastic gradient, where the parameter is a moving average of a "fast", constant-rate gradient descent. TANGO…

    Submitted 22 December, 2017; originally announced December 2017.

  18. arXiv:1712.01076  [pdf, ps, other]

    stat.ML cs.NE

    Natural Langevin Dynamics for Neural Networks

    Authors: Gaétan Marceau-Caron, Yann Ollivier

    Abstract: One way to avoid overfitting in machine learning is to use model parameters distributed according to a Bayesian posterior given the data, rather than the maximum likelihood estimator. Stochastic gradient Langevin dynamics (SGLD) is one algorithm to approximate such Bayesian posteriors for large models and datasets. SGLD is a standard stochastic gradient descent to which is added a controlled amoun…

    Submitted 4 December, 2017; originally announced December 2017.

  19. arXiv:1705.08209  [pdf, other]

    cs.NE cs.LG

    Unbiasing Truncated Backpropagation Through Time

    Authors: Corentin Tallec, Yann Ollivier

    Abstract: Truncated Backpropagation Through Time (truncated BPTT) is a widespread method for learning recurrent computational graphs. Truncated BPTT keeps the computational benefits of Backpropagation Through Time (BPTT) while relieving the need for a complete backtrack through the whole data sequence at every step. However, truncation favors short-term dependencies: the gradient estimate of truncated BPTT…

    Submitted 23 May, 2017; originally announced May 2017.

  20. arXiv:1702.05043  [pdf, other]

    cs.NE cs.LG

    Unbiased Online Recurrent Optimization

    Authors: Corentin Tallec, Yann Ollivier

    Abstract: The novel Unbiased Online Recurrent Optimization (UORO) algorithm allows for online learning of general recurrent computational graphs such as recurrent network models. It works in a streaming fashion and avoids backtracking through past activations and inputs. UORO is computationally as costly as Truncated Backpropagation Through Time (truncated BPTT), a widespread algorithm for online learning o…

    Submitted 23 May, 2017; v1 submitted 16 February, 2017; originally announced February 2017.

    Comments: 11 pages, 5 figures

  21. arXiv:1602.08007  [pdf, other]

    cs.NE cs.LG stat.ML

    Practical Riemannian Neural Networks

    Authors: Gaétan Marceau-Caron, Yann Ollivier

    Abstract: We provide the first experimental results on non-synthetic datasets for the quasi-diagonal Riemannian gradient descents for neural networks introduced in [Ollivier, 2015]. These include the MNIST, SVHN, and FACE datasets as well as a previously unpublished electroencephalogram dataset. The quasi-diagonal Riemannian algorithms consistently beat simple stochastic gradient descents by a vary…

    Submitted 25 February, 2016; originally announced February 2016.

  22. arXiv:1511.02540  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Speed learning on the fly

    Authors: Pierre-Yves Massé, Yann Ollivier

    Abstract: The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications. The same is true for more advanced variants of stochastic gradients, such as SAGA, SVRG, or AdaGrad. Here we propose to adapt the step size by performing a gradient descent on the step size itself, viewing the whole perform…

    Submitted 8 November, 2015; originally announced November 2015.

    Comments: preprint

  23. arXiv:1507.07680  [pdf, other]

    cs.NE cs.LG stat.ML

    Training recurrent networks online without backtracking

    Authors: Yann Ollivier, Corentin Tallec, Guillaume Charpiat

    Abstract: We introduce the "NoBackTrack" algorithm to train the parameters of dynamical systems such as recurrent neural networks. This algorithm works in an online, memoryless setting, thus requiring no backpropagation through time, and is scalable, avoiding the large computational and memory cost of maintaining the full gradient of the current state with respect to the parameters. The algorithm essentia…

    Submitted 20 November, 2015; v1 submitted 28 July, 2015; originally announced July 2015.

  24. arXiv:1503.04304  [pdf, ps, other]

    cs.IT math.ST

    Laplace's rule of succession in information geometry

    Authors: Yann Ollivier

    Abstract: Laplace's "add-one" rule of succession modifies the observed frequencies in a sequence of heads and tails by adding one to the observed counts. This improves prediction by avoiding zero probabilities and corresponds to a uniform Bayesian prior on the parameter. The canonical Jeffreys prior corresponds to the "add-one-half" rule. We prove that, for exponential families of distributions, such Bayesi…

    Submitted 14 March, 2015; originally announced March 2015.

    MSC Class: 62b10; 94a29; 62f15

  25. arXiv:1403.7752  [pdf, ps, other]

    cs.NE cs.IT cs.LG

    Auto-encoders: reconstruction versus compression

    Authors: Yann Ollivier

    Abstract: We discuss the similarities and differences between training an auto-encoder to minimize the reconstruction error, and training the same auto-encoder to compress the data via a generative model. Minimizing a codelength for the data using an auto-encoder is equivalent to minimizing the reconstruction error plus some correcting terms which have an interpretation as either a denoising or contractive…

    Submitted 23 January, 2015; v1 submitted 30 March, 2014; originally announced March 2014.

  26. arXiv:1306.0514  [pdf, ps, other]

    cs.NE cs.LG

    Riemannian metrics for neural networks II: recurrent networks and learning symbolic data sequences

    Authors: Yann Ollivier

    Abstract: Recurrent neural networks are powerful models for sequential data, able to represent complex dependencies in the sequence that simpler models such as hidden Markov models cannot handle. Yet they are notoriously hard to train. Here we introduce a training procedure using a gradient ascent in a Riemannian metric: this produces an algorithm independent from design choices such as the encoding of para…

    Submitted 3 February, 2015; v1 submitted 3 June, 2013; originally announced June 2013.

    Comments: 4th version: some changes in notation, more experiments

    MSC Class: 68T05; 68T10

  27. arXiv:1303.0818  [pdf, ps, other]

    cs.NE cs.IT cs.LG math.DG

    Riemannian metrics for neural networks I: feedforward networks

    Authors: Yann Ollivier

    Abstract: We describe four algorithms for neural network training, each adapted to different scalability constraints. These algorithms are mathematically principled and invariant under a number of transformations in data and network representation, from which performance is thus independent. These algorithms are obtained from the setting of differential geometry, and are based on either the natural gradient…

    Submitted 3 February, 2015; v1 submitted 4 March, 2013; originally announced March 2013.

    Comments: (5th version, minor changes)

    MSC Class: 68T05

  28. arXiv:1212.1524  [pdf, other]

    cs.NE cs.LG stat.ML

    Layer-wise learning of deep generative models

    Authors: Ludovic Arnold, Yann Ollivier

    Abstract: When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret auto-encoders in this setting as generative models, by showing that…

    Submitted 16 February, 2013; v1 submitted 6 December, 2012; originally announced December 2012.

  29. arXiv:1211.3831  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Objective Improvement in Information-Geometric Optimization

    Authors: Youhei Akimoto, Yann Ollivier

    Abstract: Information-Geometric Optimization (IGO) is a unified framework of stochastic algorithms for optimization problems. Given a family of probability distributions, IGO turns the original optimization problem into a new maximization problem on the parameter space of the probability distributions. IGO updates the parameter of the probability distribution along the natural gradient, taken with respect t…

    Submitted 7 March, 2013; v1 submitted 16 November, 2012; originally announced November 2012.

    Journal ref: Foundations of Genetic Algorithms XII (2013)