Showing 1–50 of 69 results for author: Fleuret, F

Searching in archive cs.
  1. arXiv:2406.19320

    cs.LG cs.AI cs.CV

    Efficient World Models with Context-Aware Tokenization

    Authors: Vincent Micheli, Eloi Alonso, François Fleuret

    Abstract: Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge. Following developments in generative modelling, model-based RL positions itself as a strong contender. Recent advances in sequence modelling have led to effective transformer-based world models, albeit at the price of heavy computations due to the long sequences of tokens required to accurately simulate environme…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  2. arXiv:2406.02313

    cond-mat.stat-mech cs.LG

    Neural Thermodynamic Integration: Free Energies from Energy-based Diffusion Models

    Authors: Bálint Máté, François Fleuret, Tristan Bereau

    Abstract: Thermodynamic integration (TI) offers a rigorous method for estimating free-energy differences by integrating over a sequence of interpolating conformational ensembles. However, TI calculations are computationally expensive and typically limited to coupling a small number of degrees of freedom due to the need to sample numerous intermediate ensembles with sufficient conformational-space overlap. I…

    Submitted 12 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2405.12399

    cs.LG cs.AI cs.CV

    Diffusion for World Modeling: Visual Details Matter in Atari

    Authors: Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret

    Abstract: World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffus…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 25 pages, 11 figures, 10 tables

  4. arXiv:2405.07813

    cs.LG cs.CV

    Localizing Task Information for Improved Model Merging and Compression

    Authors: Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez, François Fleuret, Pascal Frossard

    Abstract: Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints to one multi-task model, but their applicability is reduced by significant performance loss. Previous works have linked these drops to interference in the weight space and erasure of important task-specific features. Instead, in this work we show that the information required t…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted ICML 2024; The first two authors contributed equally to this work; Project website: https://tall-masks.github.io

  5. arXiv:2404.09562

    cs.LG cs.AI

    σ-GPTs: A New Approach to Autoregressive Models

    Authors: Arnaud Pannatier, Evann Courdier, François Fleuret

    Abstract: Autoregressive models, such as the GPT family, use a fixed order, usually left-to-right, to generate sequences. However, this is not a necessity. In this paper, we challenge this assumption and show that by simply adding a positional encoding for the output, this order can be modulated on-the-fly per-sample which offers key advantageous properties. It allows for the sampling of and conditioning on…

    Submitted 1 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 23 pages, 7 figures, accepted at ECML/PKDD 2024

  6. arXiv:2402.02622

    cs.CL cs.LG

    DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

    Authors: Matteo Pagliardini, Amirkeivan Mohtashami, Francois Fleuret, Martin Jaggi

    Abstract: The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding. We propose DenseFormer, a simple modification to the standard architecture that improves the perplexity of the model without increasing its size -- adding a few thousand parameters for large-scale models in the 100B param…

    Submitted 21 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  7. arXiv:2401.00828

    cs.LG hep-lat stat.ML

    Multi-Lattice Sampling of Quantum Field Theories via Neural Operator-based Flows

    Authors: Bálint Máté, François Fleuret

    Abstract: We consider the problem of sampling discrete field configurations $φ$ from the Boltzmann distribution $[dφ] Z^{-1} e^{-S[φ]}$, where $S$ is the lattice-discretization of the continuous Euclidean action $\mathcal S$ of some quantum field theory. Since such densities arise as the approximation of the underlying functional density $[\mathcal Dφ(x)] \mathcal Z^{-1} e^{-\mathcal S[φ(x)]}$, we frame the…

    Submitted 17 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  8. arXiv:2311.09998

    cs.LG cs.CV

    DeepEMD: A Transformer-based Fast Estimation of the Earth Mover's Distance

    Authors: Atul Kumar Sinha, Francois Fleuret

    Abstract: The Earth Mover's Distance (EMD) is the measure of choice between point clouds. However the computational cost to compute it makes it prohibitive as a training loss, and the standard approach is to use a surrogate such as the Chamfer distance. We propose an attention-based model to compute an accurate approximation of the EMD that can be used as a training loss for generative models. To get the ne…

    Submitted 16 November, 2023; originally announced November 2023.

  9. arXiv:2311.00586

    cs.CV

    PAUMER: Patch Pausing Transformer for Semantic Segmentation

    Authors: Evann Courdier, Prabhu Teja Sivaprasad, François Fleuret

    Abstract: We study the problem of improving the efficiency of segmentation transformers by using disparate amounts of computation for different parts of the image. Our method, PAUMER, accomplishes this by pausing computation for patches that are deemed to not need any more computation before the final decoder. We use the entropy of predictions computed from intermediate activations as the pausing criterion,…

    Submitted 1 November, 2023; originally announced November 2023.

  10. arXiv:2306.01160

    cs.LG cs.AI cs.CL

    Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

    Authors: Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret

    Abstract: Transformer-based language models have found many diverse applications requiring them to process sequences of increasing length. For these applications, the causal self-attention -- which is the only component scaling quadratically w.r.t. the sequence length -- becomes a central concern. While many works have proposed schemes to sparsify the attention patterns and reduce the computational overhead…

    Submitted 1 June, 2023; originally announced June 2023.

  11. arXiv:2304.10857

    cs.LG

    SequeL: A Continual Learning Library in PyTorch and JAX

    Authors: Nikolaos Dimitriadis, Francois Fleuret, Pascal Frossard

    Abstract: Continual Learning is an important and challenging problem in machine learning, where models must adapt to a continuous stream of new data without forgetting previously acquired knowledge. While existing frameworks are built on PyTorch, the rising popularity of JAX might lead to divergent codebases, ultimately hindering reproducibility and progress. To address this problem, we introduce SequeL, a…

    Submitted 21 April, 2023; originally announced April 2023.

    Comments: 7 pages, 1 figure, 4 code listings

  12. arXiv:2302.05282

    cs.LG cs.AI

    Graph Neural Networks Go Forward-Forward

    Authors: Daniele Paliotta, Mathieu Alain, Bálint Máté, François Fleuret

    Abstract: We present the Graph Forward-Forward (GFF) algorithm, an extension of the Forward-Forward procedure to graphs, able to handle features distributed over a graph's nodes. This allows training graph neural networks with forward passes only, without backpropagation. Our method is agnostic to the message-passing scheme, and provides a more biologically plausible learning scheme than backpropagation, wh…

    Submitted 10 February, 2023; originally announced February 2023.

  13. arXiv:2301.07388

    stat.ML cs.LG

    Learning Interpolations between Boltzmann Densities

    Authors: Bálint Máté, François Fleuret

    Abstract: We introduce a training objective for continuous normalizing flows that can be used in the absence of samples but in the presence of an energy function. Our method relies on either a prescribed or a learnt interpolation $f_t$ of energy functions between the target energy $f_1$ and the energy function of a generalized Gaussian $f_0(x) = ||x/σ||_p^p$. The interpolation of energy functions induces an…

    Submitted 30 May, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: TMLR

  14. arXiv:2211.11704

    cs.CV

    ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields

    Authors: Mohammad Mahdi Johari, Camilla Carta, François Fleuret

    Abstract: We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM). ESLAM reads RGB-D frames with unknown camera poses in a sequential manner and incrementally reconstructs the scene representation while estimating the current camera position in the scene. We incorporate the latest advances in Neural Radiance Fields (NeRF) into a SLAM system, resu…

    Submitted 3 April, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: CVPR 2023 Highlight. Project page: https://www.idiap.ch/paper/eslam/

  15. arXiv:2210.13772

    hep-lat cond-mat.stat-mech cs.LG

    Deformations of Boltzmann Distributions

    Authors: Bálint Máté, François Fleuret

    Abstract: Consider a one-parameter family of Boltzmann distributions $p_t(x) = \tfrac{1}{Z_t}e^{-S_t(x)}$. This work studies the problem of sampling from $p_{t_0}$ by first sampling from $p_{t_1}$ and then applying a transformation $Ψ_{t_1}^{t_0}$ so that the transformed samples follow $p_{t_0}$. We derive an equation relating $Ψ$ and the corresponding family of unnormalized log-likelihoods $S_t$. The utili…

    Submitted 14 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: Machine Learning for the Physical Sciences Workshop at NeurIPS '22

  16. arXiv:2210.11269

    cs.LG physics.ao-ph physics.flu-dyn

    Inference from Real-World Sparse Measurements

    Authors: Arnaud Pannatier, Kyle Matoba, François Fleuret

    Abstract: Real-world problems often involve complex and unstructured sets of measurements, which occur when sensors are sparsely placed in either space or time. Being able to model this irregular spatiotemporal data and extract meaningful forecasts is crucial. Deep learning architectures capable of processing sets of measurements with positions varying from set to set, and extracting readouts anywhere are m…

    Submitted 15 April, 2024; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 27 pages, 12 figures, Published at TMLR https://openreview.net/forum?id=y9IDfODRns

  17. arXiv:2210.09759

    cs.LG

    Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models

    Authors: Nikolaos Dimitriadis, Pascal Frossard, François Fleuret

    Abstract: In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution, superior to all its single-task trained counterparts. Since there is often not a unique solution optimal for all tasks, practitioners have to balance tradeoffs between tasks' performance, and resort to optimality in the Pareto sense. Most MTL methodo…

    Submitted 14 June, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted ICML 2023

  18. arXiv:2209.00588

    cs.LG cs.AI cs.CV

    Transformers are Sample-Efficient World Models

    Authors: Vincent Micheli, Eloi Alonso, François Fleuret

    Abstract: Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appeali…

    Submitted 1 March, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

    Comments: ICLR 2023 (notable top 5%)

  19. arXiv:2206.07144

    cs.LG

    Efficiently Training Low-Curvature Neural Networks

    Authors: Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, Francois Fleuret

    Abstract: The highly non-linear nature of deep neural networks causes them to be susceptible to adversarial examples and have unstable gradients which hinders interpretability. However, existing methods to solve these issues, such as adversarial training, are expensive and often sacrifice predictive accuracy. In this work, we consider curvature, which is a mathematical quantity which encodes the degree of…

    Submitted 10 January, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  20. arXiv:2205.15209

    cs.LG stat.ML

    Flowification: Everything is a Normalizing Flow

    Authors: Bálint Máté, Samuel Klein, Tobias Golling, François Fleuret

    Abstract: The two key characteristics of a normalizing flow is that it is invertible (in particular, dimension preserving) and that it monitors the amount by which it changes the likelihood of data points as samples are propagated along the network. Recently, multiple generalizations of normalizing flows have been introduced that relax these two conditions. On the other hand, neural networks only perform a…

    Submitted 26 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

  21. arXiv:2203.03691

    cs.CL cs.AI cs.LG

    HyperMixer: An MLP-based Low Cost Alternative to Transformers

    Authors: Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, Francois Marelli, Francois Fleuret, James Henderson

    Abstract: Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token m…

    Submitted 13 November, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: Published at ACL 2023

    Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

  22. arXiv:2203.01016

    cs.LG

    The Theoretical Expressiveness of Maxpooling

    Authors: Kyle Matoba, Nikolaos Dimitriadis, François Fleuret

    Abstract: Over the decade since deep neural networks became state of the art image classifiers there has been a tendency towards less use of max pooling: the function that takes the largest of nearby pixels in an image. Since max pooling featured prominently in earlier generations of image classifiers, we wish to understand this trend, and whether it is justified. We develop a theoretical framework analyzin…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: 31 pages, 6 figures

  23. arXiv:2202.10583

    cs.LG cs.AI

    MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

    Authors: Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman

    Abstract: Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these restri…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: Under review for PMLR volume on NeurIPS 2021 competitions

  24. arXiv:2202.05748

    cs.CV

    Borrowing from yourself: Faster future video segmentation with partial channel update

    Authors: Evann Courdier, François Fleuret

    Abstract: Semantic segmentation is a well-addressed topic in the computer vision literature, but the design of fast and accurate video processing networks remains challenging. In addition, to run on embedded hardware, computer vision models often have to make compromises on accuracy to run at the required speed, so that a latency/accuracy trade-off is usually at the heart of these real-time systems' design.…

    Submitted 17 June, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

  25. arXiv:2202.05012

    physics.data-an astro-ph.IM cs.LG hep-ex physics.acc-ph

    SUPA: A Lightweight Diagnostic Simulator for Machine Learning in Particle Physics

    Authors: Atul Kumar Sinha, Daniele Paliotta, Bálint Máté, Sebastian Pina-Otey, John A. Raine, Tobias Golling, François Fleuret

    Abstract: Deep learning methods have gained popularity in high energy physics for fast modeling of particle showers in detectors. Detailed simulation frameworks such as the gold standard Geant4 are computationally intensive, and current deep generative architectures work on discretized, lower resolution versions of the detailed simulation. The development of models that work at higher spatial resolutions is…

    Submitted 21 October, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

  26. arXiv:2202.04414

    cs.LG

    Agree to Disagree: Diversity through Disagreement for Better Transferability

    Authors: Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy

    Abstract: Gradient-based learning algorithms have an implicit simplicity bias which in effect can limit the diversity of predictors being sampled by the learning procedure. This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) by only leveraging a small subset of p…

    Submitted 23 November, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: 23 pages, 17 figures

  27. arXiv:2112.10408

    cs.LG

    Efficient Wind Speed Nowcasting with GPU-Accelerated Nearest Neighbors Algorithm

    Authors: Arnaud Pannatier, Ricardo Picatoste, François Fleuret

    Abstract: This paper proposes a simple yet efficient high-altitude wind nowcasting pipeline. It processes efficiently a vast amount of live data recorded by airplanes over the whole airspace and reconstructs the wind field with good accuracy. It creates a unique context for each point in the dataset and then extrapolates from it. As creating such context is computationally intensive, this paper proposes a n…

    Submitted 22 February, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: 9 pages, 5 figures, accepted at SIAM Data Mining 2022 (SDM 2022)

  28. arXiv:2111.13539

    cs.CV

    GeoNeRF: Generalizing NeRF with Geometry Priors

    Authors: Mohammad Mahdi Johari, Yann Lepoittevin, François Fleuret

    Abstract: We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. Our approach consists of two main stages: a geometry reasoner and a renderer. To render a novel view, the geometry reasoner first constructs cascaded cost volumes for each nearby source view. Then, using a Transformer-based attention mechanism and the cascaded cost volumes, the renderer…

    Submitted 21 March, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: CVPR2022

  29. arXiv:2110.10232

    cs.LG cs.CV

    Test time Adaptation through Perturbation Robustness

    Authors: Prabhu Teja Sivaprasad, François Fleuret

    Abstract: Data samples generated by several real world processes are dynamic in nature, i.e., their characteristics vary with time. Thus it is not possible to train and tackle all possible distributional shifts between training and inference, using the host of transfer learning methods in literature. In this paper, we tackle this problem of adapting to domain shift at inference time, i.e., w…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: Under review

  30. arXiv:2109.03709

    cs.LG stat.ML

    Speeding up PCA with priming

    Authors: Bálint Máté, François Fleuret

    Abstract: We introduce primed-PCA (pPCA), a two-step algorithm for speeding up the approximation of principal components. This algorithm first runs any approximate-PCA method to get an initial estimate of the principal components (priming), and then applies an exact PCA in the subspace they span. Since this subspace is of small dimension in any practical use, the second step is extremely cheap computational…

    Submitted 20 May, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

  31. arXiv:2104.07972

    cs.CL cs.LG

    Language Models are Few-Shot Butlers

    Authors: Vincent Micheli, François Fleuret

    Abstract: Pretrained language models demonstrate strong performance in most NLP tasks when fine-tuned on small task-specific datasets. Hence, these autoregressive models constitute ideal agents to operate in text-based environments where language understanding and generative capabilities are essential. Nonetheless, collecting expert demonstrations in such environments is a time-consuming endeavour. We intro…

    Submitted 20 September, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021

  32. arXiv:2104.06045

    cs.CL cs.LG

    Structural analysis of an all-purpose question answering model

    Authors: Vincent Micheli, Quentin Heinrich, François Fleuret, Wacim Belblidia

    Abstract: Attention is a key component of the now ubiquitous pre-trained language models. By learning to focus on relevant pieces of information, these Transformer-based architectures have proven capable of tackling several tasks at once and sometimes even surpass their single-task counterparts. To better understand this phenomenon, we conduct a structural analysis of a new all-purpose question answering mo…

    Submitted 13 April, 2021; originally announced April 2021.

  33. arXiv:2101.10983

    cs.LG

    Unsupervised clustering of series using dynamic programming and neural processes

    Authors: Karthigan Sinnathamby, Chang-Yu Hou, Lalitha Venkataramanan, Vasileios-Marios Gkortsas, François Fleuret

    Abstract: Following the work of arXiv:2101.09512, we are interested in clustering a given multi-variate series in an unsupervised manner. We would like to segment and cluster the series such that the resulting blocks present in each cluster are coherent with respect to a predefined model structure (e.g. a physics model with a functional form defined by a number of parameters). However, such approach might h…

    Submitted 26 January, 2021; originally announced January 2021.

  34. arXiv:2101.09512

    cs.LG stat.ML

    Unsupervised clustering of series using dynamic programming

    Authors: Karthigan Sinnathamby, Chang-Yu Hou, Lalitha Venkataramanan, Vasileios-Marios Gkortsas, François Fleuret

    Abstract: We are interested in clustering parts of a given single multi-variate series in an unsupervised manner. We would like to segment and cluster the series such that the resulting blocks present in each cluster are coherent with respect to a known model (e.g. physics model). Data points are said to be coherent if they can be described using this model with the same parameters. We have designed an algo…

    Submitted 23 January, 2021; originally announced January 2021.

  35. arXiv:2010.03813

    cs.CL cs.LG

    On the importance of pre-training data volume for compact language models

    Authors: Vincent Micheli, Martin d'Hoffschmidt, François Fleuret

    Abstract: Recent advances in language modeling have led to computationally intensive and resource-demanding state-of-the-art models. In an effort towards sustainable practices, we study the impact of pre-training data volume on compact language models. Multiple BERT-based models are trained on gradually increasing amounts of French text. Through fine-tuning on the French Question Answering Dataset (FQuAD),…

    Submitted 9 October, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020; typo corrected

  36. arXiv:2007.04825

    cs.LG stat.ML

    Fast Transformers with Clustered Attention

    Authors: Apoorv Vyas, Angelos Katharopoulos, François Fleuret

    Abstract: Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, thus making them prohibitively expensive for large sequences. To address this, we propose clustered attention, which instead of computing the attention for every query, grou…

    Submitted 29 September, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

  37. arXiv:2006.16236

    cs.LG stat.ML

    Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

    Authors: Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret

    Abstract: Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from…

    Submitted 31 August, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML 2020, project at https://linear-transformers.com/

  38. arXiv:2006.14567

    stat.ML cs.LG

    Taming GANs with Lookahead-Minmax

    Authors: Tatjana Chavdarova, Matteo Pagliardini, Sebastian U. Stich, Francois Fleuret, Martin Jaggi

    Abstract: Generative Adversarial Networks are notoriously challenging to train. The underlying minmax optimization is highly susceptible to the variance of the stochastic gradient and the rotational component of the associated game vector field. To tackle these challenges, we propose the Lookahead algorithm for minmax optimization, originally developed for single objective minimization only. The backtrackin…

    Submitted 23 June, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Journal ref: ICLR 2021

  39. arXiv:2006.09128

    cs.LG cs.CV stat.ML

    Rethinking the Role of Gradient-Based Attribution Methods for Model Interpretability

    Authors: Suraj Srinivas, Francois Fleuret

    Abstract: Current methods for the interpretability of discriminative deep neural networks commonly rely on the model's input-gradients, i.e., the gradients of the output logits w.r.t. the inputs. The common assumption is that these input-gradients contain information regarding $p_θ ( y \mid x)$, the model's discriminative capabilities, thus justifying their use for interpretability. However, in this work we…

    Submitted 3 March, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Oral Presentation at ICLR 2021

  40. arXiv:2004.02574

    cs.CV

    Real-Time Segmentation Networks should be Latency Aware

    Authors: Evann Courdier, Francois Fleuret

    Abstract: As scene segmentation systems reach visually accurate results, many recent papers focus on making these network architectures faster, smaller and more efficient. In particular, studies often aim at designing 'real-time' systems. Achieving this goal is particularly relevant in the context of real-time video understanding for autonomous vehicles, and robots. In this paper, we argue that the commonly us…

    Submitted 20 April, 2022; v1 submitted 6 April, 2020; originally announced April 2020.

  41. arXiv:2002.03240

    cs.LG stat.ML

    Multi-task Reinforcement Learning with a Planning Quasi-Metric

    Authors: Vincent Micheli, Karthigan Sinnathamby, François Fleuret

    Abstract: We introduce a new reinforcement learning approach combining a planning quasi-metric (PQM) that estimates the number of steps required to go from any state to another, with task-specific "aimers" that compute a target state to reach a given goal. This decomposition allows the sharing across tasks of a task-agnostic model of the quasi-metric that captures the environment's dynamics and can be learn…

    Submitted 5 December, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: Deep RL Workshop, NeurIPS 2020

  42. arXiv:1910.11758

    cs.LG stat.ML

    Optimizer Benchmarking Needs to Account for Hyperparameter Tuning

    Authors: Prabhu Teja Sivaprasad, Florian Mai, Thijs Vogels, Martin Jaggi, François Fleuret

    Abstract: The performance of optimizers, particularly in deep learning, depends considerably on their chosen hyperparameter configuration. The efficacy of optimizers is often studied under near-optimal problem-specific hyperparameters, and finding these settings may be prohibitively costly for practitioners. In this work, we argue that a fair assessment of optimizers' performance must take the computational…

    Submitted 15 August, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: published at International Conference on Machine Learning (ICML 2020)

  43. arXiv:1905.03711

    cs.CV cs.LG stat.ML

    Processing Megapixel Images with Deep Attention-Sampling Models

    Authors: Angelos Katharopoulos, François Fleuret

    Abstract: Existing deep architectures cannot operate on very large signals such as megapixel images due to computational and memory constraints. To tackle this limitation, we propose a fully differentiable end-to-end trainable model that samples and processes only a fraction of the full resolution input image. The locations to process are sampled from an attention distribution computed from a low resolution…

    Submitted 17 July, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: Presented in ICML 2019. Code is available at https://github.com/idiap/attention-sampling

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3282-3291, 2019

  44. arXiv:1905.00780  [pdf, other]

    cs.LG cs.CV stat.ML

    Full-Gradient Representation for Neural Network Visualization

    Authors: Suraj Srinivas, Francois Fleuret

    Abstract: We introduce a new tool for interpreting neural net responses, namely full-gradients, which decomposes the neural net response into input sensitivity and per-neuron sensitivity components. This is the first proposed representation which satisfies two key properties: completeness and weak dependence, which provably cannot be satisfied by any saliency map-based interpretability method. For convoluti…

    Submitted 3 December, 2019; v1 submitted 2 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019

  45. arXiv:1904.08598  [pdf, other]

    stat.ML cs.LG math.OC

    Reducing Noise in GAN Training with Variance Reduced Extragradient

    Authors: Tatjana Chavdarova, Gauthier Gidel, François Fleuret, Simon Lacoste-Julien

    Abstract: We study the effect of the stochastic gradient noise on the training of generative adversarial networks (GANs) and show that it can prevent the convergence of standard game optimization methods, while the batch version converges. We address this issue with a novel stochastic variance-reduced extragradient (SVRE) optimization algorithm, which for a large class of games improves upon the previous co…

    Submitted 25 June, 2020; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: latest NeurIPS'19 version

  46. arXiv:1806.01677  [pdf, other]

    cs.CV cs.NE

    Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching

    Authors: Stepan Tulyakov, Anton Ivanov, Francois Fleuret

    Abstract: End-to-end deep-learning networks recently demonstrated extremely good performance for stereo matching. However, existing networks are difficult to use for practical applications since (1) they are memory-hungry and unable to process even modest-size images, (2) they have to be trained for a given disparity range. The Practical Deep Stereo (PDS) network that we propose addresses both issues: Fir…

    Submitted 5 June, 2018; originally announced June 2018.

  47. arXiv:1803.00942  [pdf, other]

    cs.LG

    Not All Samples Are Created Equal: Deep Learning with Importance Sampling

    Authors: Angelos Katharopoulos, François Fleuret

    Abstract: Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on "informative" examples, and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to…

    Submitted 28 October, 2019; v1 submitted 2 March, 2018; originally announced March 2018.

    Comments: Accepted at ICML 2018 (short oral)

  48. arXiv:1803.00443  [pdf, other]

    cs.LG cs.CV

    Knowledge Transfer with Jacobian Matching

    Authors: Suraj Srinivas, Francois Fleuret

    Abstract: Classical distillation methods transfer representations from a "teacher" neural network to a "student" network by matching their output activations. Recent methods also match the Jacobians, or the gradient of output activations with the input. However, this involves making some ad hoc decisions, in particular, the choice of the loss function. In this paper, we first establish an equivalence betw…

    Submitted 1 March, 2018; originally announced March 2018.

  49. arXiv:1802.04016  [pdf, other]

    cs.CE

    Geodesic Convolutional Shape Optimization

    Authors: Pierre Baqué, Edoardo Remelli, François Fleuret, Pascal Fua

    Abstract: Aerodynamic shape optimization has many industrial applications. Existing methods, however, are so computationally demanding that typical engineering practices are to either simply try a limited number of hand-designed shapes or restrict oneself to shapes that can be parameterized using only few degrees of freedom. In this work, we introduce a new way to optimize complex shapes fast and accurately…

    Submitted 12 February, 2018; originally announced February 2018.

  50. arXiv:1712.02330  [pdf, other]

    stat.ML cs.LG

    SGAN: An Alternative Training of Generative Adversarial Networks

    Authors: Tatjana Chavdarova, François Fleuret

    Abstract: The Generative Adversarial Networks (GANs) have demonstrated impressive performance for data synthesis, and are now used in a wide range of computer vision tasks. In spite of this success, they gained a reputation for being difficult to train, which results in a time-consuming and human-involved development process to use them. We consider an alternative training process, named SGAN, in which sev…

    Submitted 6 December, 2017; originally announced December 2017.