
Showing 1–33 of 33 results for author: Oyallon, E

Searching in archive cs.
  1. arXiv:2406.02613

    cs.LG cs.AI

    ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training

    Authors: Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, Edouard Oyallon

    Abstract: Training Large Language Models (LLMs) relies heavily on distributed implementations, employing multiple GPUs to compute stochastic gradients on model replicas in parallel. However, synchronizing gradients in data parallel settings induces a communication overhead increasing with the number of distributed workers, which can impede the efficiency gains of parallelization. To address this challenge,…

    Submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2406.02052

    cs.LG stat.ML

    PETRA: Parallel End-to-end Training with Reversible Architectures

    Authors: Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

    Abstract: Reversible architectures have been shown to be capable of performing on par with their non-reversible counterparts, being applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2406.00153

    cs.LG

    $μ$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

    Authors: Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky

    Abstract: Learned optimizers (LOs) can significantly reduce the wall-clock training time of neural networks, substantially reducing training costs. However, they often suffer from poor meta-generalization, especially when training networks larger than those seen during meta-training. To address this, we use the recently proposed Maximal Update Parametrization ($μ$P), which allows zero-shot generalization of…

    Submitted 31 May, 2024; originally announced June 2024.

  4. arXiv:2405.17517

    cs.LG cs.CV cs.NE stat.ML

    WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

    Authors: Louis Fournier, Adel Nabli, Masih Aminbeidokhti, Marco Pedersoli, Eugene Belilovsky, Edouard Oyallon

    Abstract: The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as model…

    Submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2403.08837

    cs.LG cs.AI cs.DC cs.NE stat.ML

    Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks

    Authors: Louis Fournier, Edouard Oyallon

    Abstract: Training large deep learning models requires parallelization techniques to scale. In existing methods such as Data Parallelism or ZeRO-DP, micro-batches of data are processed in parallel, which creates two drawbacks: the total memory required to store the model's activations peaks at the end of the forward pass, and gradients must be simultaneously averaged at the end of the backpropagation step.…

    Submitted 13 March, 2024; originally announced March 2024.

  6. arXiv:2312.09634

    stat.ML cs.LG

    Vectorizing string entries for data processing on tables: when are larger language models better?

    Authors: Léo Grinsztajn, Edouard Oyallon, Myung Jun Kim, Gaël Varoquaux

    Abstract: There are increasingly efficient data processing pipelines that work on vectors of numbers, for instance most machine learning models, or vector databases for fast similarity search. These require converting the data to numbers. While this conversion is easy for simple numerical and categorical entries, databases are rife with text entries, such as names or descriptions. In the age of large lang…

    Submitted 15 December, 2023; originally announced December 2023.

  7. arXiv:2306.08289

    cs.LG cs.AI cs.DC

    $\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning

    Authors: Adel Nabli, Eugene Belilovsky, Edouard Oyallon

    Abstract: Distributed training of Deep Learning models has been critical to many recent successes in the field. Current standard methods primarily rely on synchronous centralized algorithms which induce major communication bottlenecks and synchronization locks at scale. Decentralized asynchronous algorithms are emerging as a potential alternative but their practical applicability still lags. In order to mit…

    Submitted 6 December, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems, Dec 2023, New Orleans, United States

  8. arXiv:2306.06968

    cs.LG cs.CV cs.NE stat.ML

    Can Forward Gradient Match Backpropagation?

    Authors: Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

    Abstract: Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While c…

    Submitted 12 June, 2023; originally announced June 2023.

    Journal ref: Fortieth International Conference on Machine Learning, Jul 2023, Honolulu (Hawaii), USA, United States

  9. arXiv:2306.03937

    cs.LG cs.AI

    Guiding The Last Layer in Federated Learning with Pre-Trained Models

    Authors: Gwen Legate, Nicolas Bernier, Lucas Caccia, Edouard Oyallon, Eugene Belilovsky

    Abstract: Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data. Recent works have begun to consider the effects of using pre-trained models as an initialization point for existing FL algorithms; however, these approaches ignore the vast body of efficient transfer learning literature from the centralized learning setting. Here…

    Submitted 6 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  10. arXiv:2208.00779

    math.OC cs.AI cs.DC

    DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization

    Authors: Adel Nabli, Edouard Oyallon

    Abstract: This work introduces DADAO: the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $μ$-strongly convex functions distributed over a given network of size $n$. Our key insight is based on modeling the local gradient updates and gossip communication procedures with separate independent Poisson Point Processes. This allows us to decoupl…

    Submitted 6 December, 2023; v1 submitted 26 July, 2022; originally announced August 2022.

    Comments: International Conference on Machine Learning, Jul 2023, Honolulu, United States

  11. arXiv:2207.08815

    cs.LG cs.AI stat.ME stat.ML

    Why do tree-based models still outperform deep learning on tabular data?

    Authors: Léo Grinsztajn, Edouard Oyallon, Gaël Varoquaux

    Abstract: While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such as XGBoost and Random Forests, across a large number of datasets and hyperparameter combinations. We define a standard set of 45 datasets from varied domains wit…

    Submitted 18 July, 2022; originally announced July 2022.

  12. arXiv:2207.03485

    cs.LG cs.AI cs.NE

    On Non-Linear operators for Geometric Deep Learning

    Authors: Grégoire Sergeant-Perthuis, Jakob Maier, Joan Bruna, Edouard Oyallon

    Abstract: This work studies operators mapping vector and scalar fields defined over a manifold $\mathcal{M}$, and which commute with its group of diffeomorphisms $\text{Diff}(\mathcal{M})$. We prove that in the case of scalar fields $L^p_ω(\mathcal{M},\mathbb{R})$, those operators correspond to point-wise non-linearities, recovering and extending known results on $\mathbb{R}^d$. In the context of Neural Net…

    Submitted 9 February, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

  13. arXiv:2201.11986

    cs.LG cs.AI

    Gradient Masked Averaging for Federated Learning

    Authors: Irene Tenison, Sai Aravind Sreeramadas, Vaikkunth Mugunthan, Edouard Oyallon, Irina Rish, Eugene Belilovsky

    Abstract: Federated learning (FL) is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a unified global model without the need to share data amongst each other. A major challenge in federated learning is the heterogeneity of data across clients, which can degrade the performance of standard FL algorithms. Standard FL algorithms involve averaging of…

    Submitted 14 November, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

  14. arXiv:2107.12800

    cs.LG cs.CV

    Deep Reinforcement Learning for L3 Slice Localization in Sarcopenia Assessment

    Authors: Othmane Laousy, Guillaume Chassagnon, Edouard Oyallon, Nikos Paragios, Marie-Pierre Revel, Maria Vakalopoulou

    Abstract: Sarcopenia is a medical condition characterized by a reduction in muscle mass and function. A quantitative diagnosis technique consists of localizing the CT slice passing through the middle of the third lumbar area (L3) and segmenting muscles at this level. In this paper, we propose a deep reinforcement learning method for accurate localization of the L3 CT slice. Our method trains a reinforcement…

    Submitted 13 August, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

  15. arXiv:2106.07360

    cs.SI cs.LG

    Low-Rank Projections of GCNs Laplacian

    Authors: Nathan Grinsztajn, Philippe Preux, Edouard Oyallon

    Abstract: In this work, we study the behavior of standard models for community detection under spectral manipulations. Through various ablation experiments, we evaluate the impact of bandpass filtering on the performance of a GCN: we empirically show that most of the necessary and used information for node classification is contained in the low-frequency domain, and thus contrary to images, high frequencie…

    Submitted 4 June, 2021; originally announced June 2021.

    Journal ref: ICLR 2021 Workshop GTRL, 2021, Online, France

  16. arXiv:2106.06401

    cs.LG cs.DC

    Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

    Authors: Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon

    Abstract: A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simple alternative based on minimal feedback, which we call Decoupled Greedy Learning (DGL). It…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:1901.08164

  17. arXiv:2106.05875

    cs.LG

    Interferometric Graph Transform for Community Labeling

    Authors: Nathan Grinsztajn, Louis Leconte, Philippe Preux, Edouard Oyallon

    Abstract: We present a new approach for learning unsupervised node representations in community graphs. We significantly extend the Interferometric Graph Transform (IGT) to community labeling: this non-linear operator iteratively extracts features that take advantage of the graph topology through demodulation operations. An unsupervised feature extraction step cascades modulus non-linearity with linear oper…

    Submitted 4 June, 2021; originally announced June 2021.

  18. arXiv:2101.07528

    cs.CV cs.LG

    The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods

    Authors: Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

    Abstract: A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis. In this work, we highlight the importance of a data-dependent feature extraction step that is key to obtaining good performan…

    Submitted 19 January, 2021; originally announced January 2021.

    Journal ref: International Conference on Learning Representation (ICLR 2021), 2021, Vienna (online), Austria

  19. arXiv:2006.05722

    cs.LG stat.ML

    Interferometric Graph Transform: a Deep Unsupervised Graph Representation

    Authors: Edouard Oyallon

    Abstract: We propose the Interferometric Graph Transform (IGT), which is a new class of deep unsupervised graph convolutional neural network for building graph representations. Our first contribution is to propose a generic, complex-valued spectral graph architecture obtained from a generalization of the Euclidean Fourier transform. We show that our learned representation consists of both discriminative and…

    Submitted 10 June, 2020; originally announced June 2020.

    Journal ref: International Conference on Machine Learning (ICML), 2020, Online, Austria

  20. arXiv:1901.08164

    cs.LG stat.ML

    Decoupled Greedy Learning of CNNs

    Authors: Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

    Abstract: A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simpler, but more effective, substitute that uses minimal feedback, which we call Decoupled Greedy…

    Submitted 19 June, 2020; v1 submitted 23 January, 2019; originally announced January 2019.

  21. arXiv:1812.11446

    cs.LG stat.ML

    Greedy Layerwise Learning Can Scale to ImageNet

    Authors: Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

    Abstract: Shallow supervised 1-hidden layer neural networks have a number of favorable properties that make them easier to interpret, analyze, and optimize than their deep counterparts, but lack their representational power. Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks. Contrary to previous approaches usin…

    Submitted 23 April, 2019; v1 submitted 29 December, 2018; originally announced December 2018.

  22. arXiv:1812.11214

    cs.LG cs.CV cs.SD eess.AS stat.ML

    Kymatio: Scattering Transforms in Python

    Authors: Mathieu Andreux, Tomás Angles, Georgios Exarchakis, Roberto Leonarduzzi, Gaspar Rochette, Louis Thiry, John Zarka, Stéphane Mallat, Joakim Andén, Eugene Belilovsky, Joan Bruna, Vincent Lostanlen, Muawiz Chaudhary, Matthew J. Hirn, Edouard Oyallon, Sixin Zhang, Carmine Cella, Michael Eickenberg

    Abstract: The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks. All transforms may be executed on a GPU (in addition to CPU…

    Submitted 31 May, 2022; v1 submitted 28 December, 2018; originally announced December 2018.

  23. arXiv:1812.07956

    math.OC cs.LG

    On Lazy Training in Differentiable Programming

    Authors: Lenaic Chizat, Edouard Oyallon, Francis Bach

    Abstract: In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying. In this work, we show that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling, often implicit, t…

    Submitted 7 January, 2020; v1 submitted 19 December, 2018; originally announced December 2018.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS), Dec 2019, Vancouver, Canada

  24. Compressing the Input for CNNs with the First-Order Scattering Transform

    Authors: Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko

    Abstract: We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN). We show theoretical and empirical evidence that in the case of natural images and sufficiently small translation invariance, this transform preserves most of the signal information needed for classification while substantially reducing the spatial resolution and t…

    Submitted 27 September, 2018; originally announced September 2018.

    Journal ref: ECCV 2018

  25. arXiv:1809.06367

    cs.LG cs.CV stat.ML

    Scattering Networks for Hybrid Representation Learning

    Authors: Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky

    Abstract: Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we…

    Submitted 17 September, 2018; originally announced September 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1703.08961

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2018, pp.11

  26. arXiv:1806.00370

    math.OC cs.LG stat.ML

    Nonlinear Acceleration of CNNs

    Authors: Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

    Abstract: The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG. Until now, its analysis has been limited to convex problems, but empirical observations show that RNA may be extended to wider settings. In this paper, we investigate further the benefits of RNA when applied to…

    Submitted 1 June, 2018; originally announced June 2018.

  27. arXiv:1805.09639

    math.OC cs.LG stat.ML

    Online Regularized Nonlinear Acceleration

    Authors: Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

    Abstract: Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method. It can be seen as a regularized version of Anderson acceleration, a classical acceleration scheme from numerical analysis. The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is com…

    Submitted 21 June, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

  28. arXiv:1802.07088

    cs.LG cs.CV stat.ML

    i-RevNet: Deep Invertible Networks

    Authors: Jörn-Henrik Jacobsen, Arnold Smeulders, Edouard Oyallon

    Abstract: It is widely believed that the success of deep convolutional networks is based on progressively discarding uninformative variability about the input with respect to the problem at hand. This is supported empirically by the difficulty of recovering images from their hidden representations, in most commonly used network architectures. In this paper we show via a one-to-one mapping that this loss of…

    Submitted 20 February, 2018; originally announced February 2018.

    Journal ref: ICLR 2018 - International Conference on Learning Representations, Apr 2018, Vancouver, Canada. 2018, https://iclr.cc/

  29. arXiv:1703.08961

    cs.CV cs.LG

    Scaling the Scattering Transform: Deep Hybrid Networks

    Authors: Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko

    Abstract: We use the scattering network as a generic and fixed initialization of the first layers of a supervised hybrid deep network. We show that early layers do not necessarily need to be learned, providing the best results to-date with pre-defined representations while being competitive with Deep CNNs. Using a shallow cascade of 1 x 1 convolutions, which encodes scattering coefficients that correspond…

    Submitted 4 April, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

  30. arXiv:1703.04140

    cs.LG stat.ML

    Multiscale Hierarchical Convolutional Networks

    Authors: Jörn-Henrik Jacobsen, Edouard Oyallon, Stéphane Mallat, Arnold W. M. Smeulders

    Abstract: Deep neural network algorithms are difficult to analyze because they lack structure allowing to understand the properties of underlying transforms and invariants. Multiscale hierarchical convolutional networks are structured deep convolutional networks where layers are indexed by progressively higher dimensional attributes, which are learned from training data. Each new layer is computed with mult…

    Submitted 12 March, 2017; originally announced March 2017.

  31. arXiv:1703.01775

    cs.CV cs.LG

    Building a Regular Decision Boundary with Deep Networks

    Authors: Edouard Oyallon

    Abstract: In this work, we build a generic architecture of Convolutional Neural Networks to discover empirical properties of neural networks. Our first contribution is to introduce a state-of-the-art framework that depends upon few hyperparameters and to study the network when we vary them. It has no max pooling, no biases, only 13 layers, is purely convolutional and yields up to 95.4% and 79.6% accuracy r…

    Submitted 6 March, 2017; originally announced March 2017.

    Comments: CVPR 2017, 8 pages

  32. arXiv:1412.8659

    cs.CV

    Deep Roto-Translation Scattering for Object Classification

    Authors: Edouard Oyallon, Stéphane Mallat

    Abstract: Dictionary learning algorithms or supervised deep convolution networks have considerably improved the efficiency of predefined feature representations such as SIFT. We introduce a deep scattering convolution network, with predefined wavelet filters over spatial and angular variables. This representation brings an important improvement to results previously obtained with predefined features over ob…

    Submitted 30 May, 2015; v1 submitted 30 December, 2014; originally announced December 2014.

    Comments: 9 pages, 3 figures, CVPR 2015 paper

  33. arXiv:1312.5940

    cs.CV

    Generic Deep Networks with Wavelet Scattering

    Authors: Edouard Oyallon, Stéphane Mallat, Laurent Sifre

    Abstract: We introduce a two-layer wavelet scattering network, for object classification. This scattering transform computes a spatial wavelet transform on the first layer and a new joint wavelet transform along spatial, angular and scale variables in the second layer. Numerical experiments demonstrate that this two-layer convolution network, which involves no learning and no max pooling, performs efficient…

    Submitted 10 March, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: Workshop, 3 pages, prepared for ICLR 2014
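Entry 8 ("Can Forward Gradient Match Backpropagation?") refers to forward gradients: a directional derivative computed in forward differentiation mode, multiplied by the direction it was taken along. The sketch below illustrates only the standard estimator g = (∇f(x)·v) v with v ~ N(0, I), which is unbiased because E[v vᵀ] = I; it is not the paper's method, which the truncated abstract does not describe. Forward mode is emulated with hand-rolled dual numbers instead of an autodiff library, and the names `Dual`, `toy_loss` and `forward_gradient` are hypothetical, introduced for illustration only.

```python
import numpy as np


class Dual:
    """Dual number val + tan*eps with eps**2 == 0 (minimal forward-mode AD)."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    @staticmethod
    def _lift(o):
        # Plain numbers are constants: zero tangent.
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = Dual._lift(o)
        return Dual(self.val + o.val, self.tan + o.tan)

    __radd__ = __add__

    def __sub__(self, o):
        o = Dual._lift(o)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __rsub__(self, o):
        return Dual._lift(o) - self

    def __mul__(self, o):
        o = Dual._lift(o)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a b' + a' b) eps
        return Dual(self.val * o.val, self.val * o.tan + self.tan * o.val)

    __rmul__ = __mul__


def toy_loss(x):
    """Hypothetical objective; its exact gradient is [2(x0 - 1) + x2, 6(x1 + 2), x0]."""
    a = x[0] - 1
    b = x[1] + 2
    return a * a + 3 * (b * b) + x[0] * x[2]


def forward_gradient(f, x, v):
    """One forward pass seeded with tangent v yields grad(f)·v; scale v by it."""
    out = f([Dual(xi, vi) for xi, vi in zip(x, v)])
    return out.tan * np.asarray(v, dtype=float)


x = np.array([0.5, -1.0, 2.0])
true_grad = np.array([2 * (x[0] - 1) + x[2], 6 * (x[1] + 2), x[0]])

# Average many single-direction estimates; no backward pass is ever run.
rng = np.random.default_rng(0)
samples = [forward_gradient(toy_loss, x, rng.standard_normal(3)) for _ in range(20000)]
estimate = np.mean(samples, axis=0)
print("true gradient:        ", true_grad)
print("mean forward gradient:", estimate)
```

A single estimate is noisy (its variance grows with the dimension), which is the difficulty of guessing the step direction that the abstract mentions; averaging many of them only recovers the true gradient in expectation.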
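Entries 26 and 27 describe Regularized Nonlinear Acceleration (RNA), which post-processes the iterates of an algorithm such as fixed-step gradient descent and can be seen as a regularized Anderson acceleration. The sketch below assumes the usual form of the RNA extrapolation step: stack the differences r_i = x_{i+1} - x_i as columns of R, solve (RᵀR + λI) z = 1, normalize c = z / Σ z, and return Σ c_i x_i. It is an illustration under assumptions, not the papers' code: the function name `rna_extrapolate`, the relative regularization `lam`, and the diagonal quadratic test problem are hypothetical choices.

```python
import numpy as np


def rna_extrapolate(iterates, lam=1e-10):
    """Extrapolate from iterates x_0, ..., x_k (rows of `iterates`).

    Returns sum_i c_i x_i over x_0..x_{k-1}, where c minimizes
    ||R c||^2 + lam_abs ||c||^2 subject to sum(c) == 1, and R has
    columns r_i = x_{i+1} - x_i.
    """
    X = np.asarray(iterates, dtype=float)
    R = np.diff(X, axis=0).T                  # shape (d, k): difference columns
    RR = R.T @ R
    # Assumption: scale the regularization by the magnitude of R^T R.
    lam_abs = lam * np.linalg.norm(RR, 2)
    k = RR.shape[0]
    z = np.linalg.solve(RR + lam_abs * np.eye(k), np.ones(k))
    c = z / z.sum()                           # affine constraint: weights sum to 1
    return c @ X[:-1]


# Toy problem: f(x) = 0.5 x^T A x - b^T x, minimized at x* = A^{-1} b.
A = np.diag([1.0, 2.0, 10.0])
b = np.ones(3)
x_star = np.linalg.solve(A, b)

# Fixed-step gradient descent with step 1/L, where L = 10 is the largest eigenvalue of A.
step = 0.1
xs = [np.zeros(3)]
for _ in range(4):
    xs.append(xs[-1] - step * (A @ xs[-1] - b))

x_rna = rna_extrapolate(xs)
err_last = np.linalg.norm(xs[-1] - x_star)
err_rna = np.linalg.norm(x_rna - x_star)
print("last iterate error:", err_last)
print("RNA error:         ", err_rna)
```

On this quadratic the iteration matrix I - step·A has only three distinct eigenvalues, so four weights are enough for the combination of iterates to cancel almost all of the error; λ keeps the small linear system solvable even though RᵀR is rank-deficient here.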