
Showing 1–15 of 15 results for author: Balles, L

  1. arXiv:2406.03216  [pdf, other]

    cs.LG cs.AI

    Choice of PEFT Technique in Continual Learning: Prompt Tuning is Not All You Need

    Authors: Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, Giovanni Zappella

    Abstract: Recent Continual Learning (CL) methods have combined pretrained Transformers with prompt tuning, a parameter-efficient fine-tuning (PEFT) technique. We argue that the choice of prompt tuning in prior works was an undefended and unablated decision, which has been uncritically adopted by subsequent research, but warrants further research to understand its implications. In this paper, we conduct this…

    Submitted 5 June, 2024; originally announced June 2024.

  2. arXiv:2312.05021  [pdf, other]

    cs.LG cs.AI math.OC

    A Negative Result on Gradient Matching for Selective Backprop

    Authors: Lukas Balles, Cedric Archambeau, Giovanni Zappella

    Abstract: With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden. One approach to speed up the training process is Selective Backprop. For this approach, we perform a forward pass to obtain a loss value for each data point in a minibatch. The backward pass is then restricted to a subset of that minibatch, prioritizing high-loss examples.…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Paper accepted at the ICBINB Workshop at NeurIPS 2023

  3. arXiv:2311.17601  [pdf, ps, other]

    cs.LG cs.AI

    Continual Learning with Low Rank Adaptation

    Authors: Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, Giovanni Zappella

    Abstract: Recent work using pretrained transformers has shown impressive performance when fine-tuned with data from the downstream problem of interest. However, they struggle to retain that performance when the data characteristics change. In this paper, we focus on continual learning, where a pre-trained transformer is updated to perform well on new data, while retaining its performance on data it was pre…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Accepted at Workshop on Distribution Shifts (DistShift), NeurIPS 2023

  4. arXiv:2304.12067  [pdf, other]

    cs.LG cs.AI cs.CV

    Renate: A Library for Real-World Continual Learning

    Authors: Martin Wistuba, Martin Ferianc, Lukas Balles, Cedric Archambeau, Giovanni Zappella

    Abstract: Continual learning enables the incremental training of machine learning models on non-stationary data streams. While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practical machine learning deployment. This paper presents Renate, a continual learning library designed to build real-world updating pipelines for PyTor…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: Paper accepted at the CLVision workshop at CVPR 2023

  5. arXiv:2207.06940  [pdf, other]

    cs.LG stat.ML

    PASHA: Efficient HPO and NAS with Progressive Resource Allocation

    Authors: Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cédric Archambeau, Giovanni Zappella

    Abstract: Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach t…

    Submitted 8 March, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted at ICLR 2023

  6. arXiv:2203.14544  [pdf, other]

    cs.LG

    Gradient-Matching Coresets for Rehearsal-Based Continual Learning

    Authors: Lukas Balles, Giovanni Zappella, Cédric Archambeau

    Abstract: The goal of continual learning (CL) is to efficiently update a machine learning model with new data without forgetting previously-learned knowledge. Most widely-used CL methods rely on a rehearsal memory of data points to be reused while training on new data. Curating such a rehearsal memory to maintain a small, informative subset of all the data seen so far is crucial to the success of these meth…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: A short version of this paper has been presented at the NeurIPS '21 Workshop on Distribution Shifts

  7. arXiv:2112.05025  [pdf, other]

    cs.LG

    Gradient-matching coresets for continual learning

    Authors: Lukas Balles, Giovanni Zappella, Cédric Archambeau

    Abstract: We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of continual learning, where it can be used to curate a rehearsal memory. Our method outperforms strong competitors such as reservoir sampling across a range of memo…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: Accepted at the NeurIPS '21 Workshop on Distribution Shifts

  8. arXiv:2011.04803  [pdf, other]

    cs.LG

    Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering

    Authors: Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig

    Abstract: Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyper…

    Submitted 9 November, 2020; originally announced November 2020.

  9. arXiv:2002.08056  [pdf, other]

    cs.LG stat.ML

    The Geometry of Sign Gradient Descent

    Authors: Lukas Balles, Fabian Pedregosa, Nicolas Le Roux

    Abstract: Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training. Furthermore, they are closely connected to so-called adaptive gradient methods like Adam. Recent works on signSGD have used a non-standard "separable smoothness" assumption, whereas some old…

    Submitted 19 February, 2020; originally announced February 2020.

  10. arXiv:1905.12558  [pdf, other]

    cs.LG stat.ML

    Limitations of the Empirical Fisher Approximation for Natural Gradient Descent

    Authors: Frederik Kunstner, Lukas Balles, Philipp Hennig

    Abstract: Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argumen…

    Submitted 8 June, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: V3: Minor corrections (typographic errors)

  11. arXiv:1903.05499  [pdf, other]

    cs.LG stat.ML

    DeepOBS: A Deep Learning Optimizer Benchmark Suite

    Authors: Frank Schneider, Lukas Balles, Philipp Hennig

    Abstract: Because the choice and tuning of the optimizer affects the speed, and ultimately the performance of deep learning, there is significant past and recent research in this area. Yet, perhaps surprisingly, there is no generally agreed-upon protocol for the quantitative and reproducible evaluation of optimization strategies for deep learning. We suggest routines and benchmarks for stochastic optimizati…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: Accepted at ICLR 2019. 9 pages, 3 figures, 2 tables

  12. arXiv:1805.09806  [pdf, other]

    cs.CV

    Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation

    Authors: Anurag Ranjan, Varun Jampani, Lukas Balles, Kihwan Kim, Deqing Sun, Jonas Wulff, Michael J. Black

    Abstract: We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the…

    Submitted 11 March, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: CVPR 2019

  13. arXiv:1705.07774  [pdf, other]

    cs.LG stat.ML

    Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients

    Authors: Lukas Balles, Philipp Hennig

    Abstract: The ADAM optimizer is exceedingly popular in the deep learning community. Often it works very well, sometimes it doesn't. Why? We interpret ADAM as a combination of two aspects: for each weight, the update direction is determined by the sign of stochastic gradients, whereas the update magnitude is determined by an estimate of their relative variance. We disentangle these two aspects and analyze th…

    Submitted 13 December, 2020; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: Presented at the 35th International Conference on Machine Learning (ICML), 2018

  14. arXiv:1703.09580  [pdf, other]

    cs.LG stat.ML

    Early Stopping without a Validation Set

    Authors: Maren Mahsereci, Lukas Balles, Christoph Lassner, Philipp Hennig

    Abstract: Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. We propose a novel early stopping crite…

    Submitted 6 June, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: 16 pages, 10 figures

  15. arXiv:1612.05086  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Coupling Adaptive Batch Sizes with Learning Rates

    Authors: Lukas Balles, Javier Romero, Philipp Hennig

    Abstract: Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of t…

    Submitted 28 June, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

    Comments: Thirty-Third Conference on Uncertainty in Artificial Intelligence (UAI), 2017, (accepted)