Showing 1–50 of 51 results for author: Belilovsky, E

Searching in archive cs.
  1. arXiv:2406.13653  [pdf, other]

    cs.LG

    Controlling Forgetting with Test-Time Data in Continual Learning

    Authors: Vaibhav Singh, Rahaf Aljundi, Eugene Belilovsky

    Abstract: Foundational vision-language models have shown impressive performance on various downstream tasks. Yet, there is still a pressing need to update these models later as new tasks or domains become available. Ongoing Continual Learning (CL) research provides techniques to overcome catastrophic forgetting of previous information when new knowledge is acquired. To date, CL techniques focus only on the…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures

  2. arXiv:2406.02613  [pdf, other]

    cs.LG cs.AI

    ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training

    Authors: Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, Edouard Oyallon

    Abstract: Training Large Language Models (LLMs) relies heavily on distributed implementations, employing multiple GPUs to compute stochastic gradients on model replicas in parallel. However, synchronizing gradients in data parallel settings induces a communication overhead increasing with the number of distributed workers, which can impede the efficiency gains of parallelization. To address this challenge,…

    Submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2406.02052  [pdf, other]

    cs.LG stat.ML

    PETRA: Parallel End-to-end Training with Reversible Architectures

    Authors: Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

    Abstract: Reversible architectures have been shown to be capable of performing on par with their non-reversible architectures, being applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2406.01365  [pdf, other]

    cs.CV cs.CR cs.LG

    From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation

    Authors: Geraldin Nanfack, Michael Eickenberg, Eugene Belilovsky

    Abstract: Understanding the inner working functionality of large-scale deep neural networks is challenging yet crucial in several high-stakes applications. Mechanistic interpretability is an emergent field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are usually interpreted by visual…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Under review

  5. arXiv:2406.00272  [pdf, other]

    cs.CV

    Temporally Consistent Object Editing in Videos using Extended Attention

    Authors: AmirHossein Zamani, Amir G. Aghdam, Tiberiu Popa, Eugene Belilovsky

    Abstract: Image generation and editing have seen a great deal of advancements with the rise of large-scale diffusion models that allow user control of different modalities such as text, mask, depth maps, etc. However, controlled editing of videos still lags behind. Prior work in this area has focused on using 2D diffusion models to globally change the style of an existing video. On the other hand, in many p…

    Submitted 31 May, 2024; originally announced June 2024.

  6. arXiv:2406.00153  [pdf, other]

    cs.LG

    $μ$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

    Authors: Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky

    Abstract: Learned optimizers (LOs) can significantly reduce the wall-clock training time of neural networks, substantially reducing training costs. However, they often suffer from poor meta-generalization, especially when training networks larger than those seen during meta-training. To address this, we use the recently proposed Maximal Update Parametrization ($μ$P), which allows zero-shot generalization of…

    Submitted 31 May, 2024; originally announced June 2024.

  7. arXiv:2405.17517  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

    Authors: Louis Fournier, Adel Nabli, Masih Aminbeidokhti, Marco Pedersoli, Eugene Belilovsky, Edouard Oyallon

    Abstract: The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as model…

    Submitted 27 May, 2024; originally announced May 2024.

  8. arXiv:2405.16397  [pdf, other]

    cs.LG math.OC

    AdaFisher: Adaptive Second Order Optimization via Fisher Information

    Authors: Damien Martins Gomes, Yanlei Zhang, Eugene Belilovsky, Guy Wolf, Mahdi S. Hosseini

    Abstract: First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by employing the diagonal matrix preconditioning of the stochastic gradient during the training. Despite their widespread, second-order optimization algorithms exhibit superior convergence properties compared to their first-order coun…

    Submitted 25 May, 2024; originally announced May 2024.

  9. arXiv:2403.08763  [pdf, other]

    cs.LG cs.AI cs.CL

    Simple and Scalable Strategies to Continually Pre-train Large Language Models

    Authors: Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptati…

    Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  10. arXiv:2402.04958  [pdf, other]

    cs.CV

    Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

    Authors: Pedro Vianna, Muawiz Chaudhary, Paria Mehrbod, An Tang, Guy Cloutier, Guy Wolf, Michael Eickenberg, Eugene Belilovsky

    Abstract: Deep neural networks have useful applications in many different tasks, however their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time a…

    Submitted 29 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2024

  11. arXiv:2312.06795  [pdf, other]

    cs.LG

    Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

    Authors: MohammadReza Davari, Eugene Belilovsky

    Abstract: The rapid development of AI systems has been greatly influenced by the emergence of foundation models. A common approach for targeted problems involves fine-tuning these pre-trained foundation models for specific target tasks, resulting in a rapid spread of models fine-tuned across a diverse array of tasks. This work focuses on the problem of merging multiple fine-tunings of the same foundation mo…

    Submitted 11 December, 2023; originally announced December 2023.

  12. arXiv:2312.02204  [pdf, other]

    cs.LG

    Can We Learn Communication-Efficient Optimizers?

    Authors: Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

    Abstract: Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they c…

    Submitted 2 December, 2023; originally announced December 2023.

  13. arXiv:2310.04561  [pdf, other]

    cs.GR cs.LG

    DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors

    Authors: Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa

    Abstract: Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline. Direct mesh editing methods are typically framed as optimization problems combining user-specified vertex constraints with a regularizer that determines the position of the rest of the vertices. The choice of the regularizer is key to the realism and authenticity of the final result. Physics an…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: 8 pages, 9 figures, project page: https://tianhaoxie.github.io/project/DragD3D/

  14. arXiv:2308.04014  [pdf, other]

    cs.CL cs.LG

    Continual Pre-Training of Large Language Models: How to (re)warm your model?

    Authors: Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

    Abstract: Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data t…

    Submitted 6 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  15. arXiv:2306.08289  [pdf, other]

    cs.LG cs.AI cs.DC

    $\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning

    Authors: Adel Nabli, Eugene Belilovsky, Edouard Oyallon

    Abstract: Distributed training of Deep Learning models has been critical to many recent successes in the field. Current standard methods primarily rely on synchronous centralized algorithms which induce major communication bottlenecks and synchronization locks at scale. Decentralized asynchronous algorithms are emerging as a potential alternative but their practical applicability still lags. In order to mit…

    Submitted 6 December, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems, Dec 2023, New Orleans, United States

  16. arXiv:2306.07397  [pdf, other]

    cs.LG cs.CV

    Adversarial Attacks on the Interpretation of Neuron Activation Maximization

    Authors: Geraldin Nanfack, Alexander Fulleringer, Jonathan Marty, Michael Eickenberg, Eugene Belilovsky

    Abstract: The internal functional behavior of trained Deep Neural Networks is notoriously difficult to interpret. Activation-maximization approaches are one set of techniques used to interpret and analyze trained deep-learning models. These consist in finding inputs that maximally activate a given neuron or feature map. These inputs can be selected from a data set or obtained by optimization. However, inter…

    Submitted 12 June, 2023; originally announced June 2023.

  17. arXiv:2306.06968  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Can Forward Gradient Match Backpropagation?

    Authors: Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

    Abstract: Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While c…

    Submitted 12 June, 2023; originally announced June 2023.

    Journal ref: Fortieth International Conference on Machine Learning, Jul 2023, Honolulu (Hawaii), USA, United States

  18. arXiv:2306.03937  [pdf, other]

    cs.LG cs.AI

    Guiding The Last Layer in Federated Learning with Pre-Trained Models

    Authors: Gwen Legate, Nicolas Bernier, Lucas Caccia, Edouard Oyallon, Eugene Belilovsky

    Abstract: Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data. Recent works have begun to consider the effects of using pre-trained models as an initialization point for existing FL algorithms; however, these approaches ignore the vast body of efficient transfer learning literature from the centralized learning setting. Here…

    Submitted 6 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  19. arXiv:2304.05260  [pdf, other]

    cs.LG cs.AI

    Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning

    Authors: Gwen Legate, Lucas Caccia, Eugene Belilovsky

    Abstract: In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes, to reduce communication costs multiple gradient steps are performed at each node prior to aggregation. A key challenge in this setting is data heterogeneity across clients resulting in differing local objectives which can lead clients to overly minimize their own local objec…

    Submitted 11 April, 2023; originally announced April 2023.

  20. arXiv:2304.04858  [pdf, other]

    cs.LG cs.CV

    Simulated Annealing in Early Layers Leads to Better Generalization

    Authors: Amirmohammad Sarfi, Zahra Karimpour, Muawiz Chaudhary, Nasir M. Khalid, Mirco Ravanelli, Sudhir Mudur, Eugene Belilovsky

    Abstract: Recently, a number of iterative learning methods have been introduced to improve generalization. These typically rely on training for longer periods of time in exchange for improved generalization. LLF (later-layer-forgetting) is a state-of-the-art method in this category. It strengthens learning in early layers by periodically re-initializing the last few layers of the network. Our principal inno…

    Submitted 10 April, 2023; originally announced April 2023.

  21. arXiv:2303.14771  [pdf, other]

    cs.LG

    Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning

    Authors: Nader Asadi, MohammadReza Davari, Sudhir Mudur, Rahaf Aljundi, Eugene Belilovsky

    Abstract: In Continual learning (CL) balancing effective adaptation while combating catastrophic forgetting is a central challenge. Many of the recent best-performing methods utilize various forms of prior task data, e.g. a replay buffer, to tackle the catastrophic forgetting problem. Having access to previous task data can be restrictive in many real-world scenarios, for example when task data is sensitive…

    Submitted 6 June, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

    Comments: Accepted at ICML 2023

  22. arXiv:2302.06540  [pdf, other]

    cs.AI cs.LG cs.MA

    Imitation from Observation With Bootstrapped Contrastive Learning

    Authors: Medric Sonwa, Johanna Hansen, Eugene Belilovsky

    Abstract: Imitation from observation (IfO) is a learning paradigm that consists of training autonomous agents in a Markov Decision Process (MDP) by observing expert demonstrations without access to its actions. These demonstrations could be sequences of environment states or raw visual observations of the environment. Recent work in IfO has focused on this problem in the case of observations of low-dimensio…

    Submitted 13 February, 2023; originally announced February 2023.

  23. arXiv:2301.07635  [pdf, other]

    cs.LG cs.NE

    Local Learning with Neuron Groups

    Authors: Adeetya Patel, Michael Eickenberg, Eugene Belilovsky

    Abstract: Traditional deep network training methods optimize a monolithic objective function jointly for all the components. This can lead to various inefficiencies in terms of potential parallelization. Local learning is an approach to model-parallelism that removes the standard end-to-end learning setup and utilizes local objective functions to permit parallel learning amongst model components in a deep n…

    Submitted 18 January, 2023; originally announced January 2023.

  24. arXiv:2210.16156  [pdf, other]

    cs.LG cs.AI cs.CV

    Reliability of CKA as a Similarity Measure in Deep Learning

    Authors: MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky

    Abstract: Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained…

    Submitted 16 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

  25. arXiv:2203.13381  [pdf, other]

    cs.LG cs.AI cs.CV

    Probing Representation Forgetting in Supervised and Unsupervised Continual Learning

    Authors: MohammadReza Davari, Nader Asadi, Sudhir Mudur, Rahaf Aljundi, Eugene Belilovsky

    Abstract: Continual Learning research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representat…

    Submitted 5 April, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  26. arXiv:2203.13333  [pdf, other]

    cs.CV cs.GR cs.LG

    CLIP-Mesh: Generating textured meshes from text using pretrained image-text models

    Authors: Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, Tiberiu Popa

    Abstract: We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D supervision our method deforms the control shape of a limit subdivided surface along with its texture map and normal map to obtain a 3D asset that corresponds to the input text prompt and can be easily deployed into games or modeling applications. We rely only on a pre-trained CLIP model t…

    Submitted 2 September, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: 8 pages, 8 figures, Accepted at SIGGRAPH ASIA 2022, Project Page at https://www.nasir.lol/clipmesh

  27. arXiv:2203.13307  [pdf, other]

    cs.LG cs.AI

    Tackling Online One-Class Incremental Learning by Removing Negative Contrasts

    Authors: Nader Asadi, Sudhir Mudur, Eugene Belilovsky

    Abstract: Recent work studies the supervised online continual learning setting where a learner receives a stream of data whose class distribution changes over time. Distinct from other continual learning settings the learner is presented new samples only once and must distinguish between all seen classes. A number of successful methods in this setting focus on storing and replaying a subset of samples along…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted at NeurIPS 2021 Workshop on Distribution Shifts

  28. arXiv:2203.03798   

    cs.LG cs.AI

    New Insights on Reducing Abrupt Representation Change in Online Continual Learning

    Authors: Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

    Abstract: In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously un…

    Submitted 25 April, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: This has been withdrawn as it is a new version of arXiv:2104.05025

  29. arXiv:2201.13415  [pdf, other]

    cs.NE

    Towards Scaling Difference Target Propagation by Learning Backprop Targets

    Authors: Maxence Ernoult, Fabrice Normandin, Abhinav Moudgil, Sean Spinney, Eugene Belilovsky, Irina Rish, Blake Richards, Yoshua Bengio

    Abstract: The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on c…

    Submitted 31 January, 2022; originally announced January 2022.

  30. arXiv:2201.11986  [pdf, other]

    cs.LG cs.AI

    Gradient Masked Averaging for Federated Learning

    Authors: Irene Tenison, Sai Aravind Sreeramadas, Vaikkunth Mugunthan, Edouard Oyallon, Irina Rish, Eugene Belilovsky

    Abstract: Federated learning (FL) is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a unified global model without the need to share data amongst each other. A major challenge in federated learning is the heterogeneity of data across client, which can degrade the performance of standard FL algorithms. Standard FL algorithms involve averaging of…

    Submitted 14 November, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

  31. arXiv:2107.09539  [pdf, other]

    cs.LG eess.SP

    Parametric Scattering Networks

    Authors: Shanel Gauthier, Benjamin Thérien, Laurent Alsène-Racicot, Muawiz Chaudhary, Irina Rish, Eugene Belilovsky, Michael Eickenberg, Guy Wolf

    Abstract: The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yield more discriminative representations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering tra…

    Submitted 15 August, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    ACM Class: F.2.2; I.2.7

  32. arXiv:2106.06440  [pdf, other]

    cs.CV cs.LG

    Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

    Authors: Mateusz Michalkiewicz, Stavros Tsogkas, Sarah Parisot, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky

    Abstract: The impressive performance of deep convolutional neural networks in single-view 3D reconstruction suggests that these models perform non-trivial reasoning about the 3D structure of the output space. Recent work has challenged this belief, showing that, on standard benchmarks, complex encoder-decoder architectures perform similarly to nearest-neighbor baselines or simple linear decoder models that…

    Submitted 16 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: 13 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2004.06302

  33. arXiv:2106.06401  [pdf, other]

    cs.LG cs.DC

    Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

    Authors: Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon

    Abstract: A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simple alternative based on minimal feedback, which we call Decoupled Greedy Learning (DGL). It…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:1901.08164

  34. arXiv:2104.05025  [pdf, other]

    cs.LG

    New Insights on Reducing Abrupt Representation Change in Online Continual Learning

    Authors: Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

    Abstract: In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously un…

    Submitted 2 May, 2022; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: Accepted at ICLR 2022. Code available at www.github.com/pclucas14/AML

  35. arXiv:2101.07528  [pdf, other]

    cs.CV cs.LG

    The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods

    Authors: Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

    Abstract: A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis. In this work, we highlight the importance of a data-dependent feature extraction step that is key to the obtain good performan…

    Submitted 19 January, 2021; originally announced January 2021.

    Journal ref: International Conference on Learning Representation (ICLR 2021), 2021, Vienna (online), Austria

  36. arXiv:2007.05756  [pdf, other]

    cs.CV cs.LG stat.ML

    Generative Compositional Augmentations for Scene Graph Prediction

    Authors: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

    Abstract: Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language. We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the…

    Submitted 1 October, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

    Comments: ICCV 2021 camera ready. Added more baselines, combining GANs with Neural Motifs and t-sne visualizations. Code is available at https://github.com/bknyaz/sgg

  37. arXiv:2005.08230  [pdf, other]

    cs.CV cs.LG

    Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

    Authors: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

    Abstract: Scene graph generation (SGG) aims to predict graph-structured descriptions of input images, in the form of objects and relationships between them. This task is becoming increasingly useful for progress at the interface of vision and language. Here, it is important - yet challenging - to perform well on novel (zero-shot) or rare (few-shot) compositions of objects and relationships. In this paper, w…

    Submitted 17 August, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: accepted at BMVC 2020, the code is available at https://github.com/bknyaz/sgg

  38. arXiv:2005.04623  [pdf, other]

    cs.CV

    A Simple and Scalable Shape Representation for 3D Reconstruction

    Authors: Mateusz Michalkiewicz, Eugene Belilovsky, Mahsa Baktashmotlagh, Anders Eriksson

    Abstract: Deep learning applied to the reconstruction of 3D shapes has seen growing interest. A popular approach to 3D reconstruction and generation in recent years has been the CNN encoder-decoder model usually applied in voxel space. However, this often scales very poorly with the resolution limiting the effectiveness of these models. Several sophisticated alternatives for decoding to 3D shapes have been…

    Submitted 10 May, 2020; originally announced May 2020.

    Comments: 9 pages plus 3 pages of references. 4 figures

    MSC Class: 65D19

  39. arXiv:2004.06302  [pdf, other]

    cs.CV cs.LG

    Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors

    Authors: Mateusz Michalkiewicz, Sarah Parisot, Stavros Tsogkas, Mahsa Baktashmotlagh, Anders Eriksson, Eugene Belilovsky

    Abstract: The impressive performance of deep convolutional neural networks in single-view 3D reconstruction suggests that these models perform non-trivial reasoning about the 3D structure of the output space. However, recent work has challenged this belief, showing that complex encoder-decoder architectures perform similarly to nearest-neighbor baselines or simple linear decoder models that exploit large am…

    Submitted 2 May, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

  40. arXiv:1911.08019  [pdf, other]

    cs.LG cs.CV stat.ML

    Online Learned Continual Compression with Adaptive Quantization Modules

    Authors: Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Joelle Pineau

    Abstract: We introduce and study the problem of Online Continual Compression, where one attempts to simultaneously learn to compress and store a representative dataset from a non i.i.d data stream, while only observing each sample once. A naive application of auto-encoders in this setting encounters a major challenge: representations derived from earlier encoder states must be usable by later decoder states…

    Submitted 20 August, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

  41. arXiv:1908.04950  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering

    Authors: Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville

    Abstract: Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initi…

    Submitted 14 August, 2019; originally announced August 2019.

    Comments: To appear at BMVC 2019. 15 pages, 5 figures

  42. arXiv:1908.04742  [pdf, other]

    cs.LG stat.ML

    Online Continual Learning with Maximally Interfered Retrieval

    Authors: Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, Tinne Tuytelaars

    Abstract: Continual learning, the setting where a learning agent is faced with a never-ending stream of data, continues to be a great challenge for modern machine learning systems. In particular, the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been show…

    Submitted 29 October, 2019; v1 submitted 11 August, 2019; originally announced August 2019.

    Journal ref: NeurIPS 2019

  43. arXiv:1901.08164  [pdf, other]

    cs.LG stat.ML

    Decoupled Greedy Learning of CNNs

    Authors: Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

    Abstract: A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate this issue have been proposed. In this context, we consider a simpler, but more effective, substitute that uses minimal feedback, which we call Decoupled Greedy…

    Submitted 19 June, 2020; v1 submitted 23 January, 2019; originally announced January 2019.

  44. arXiv:1812.11446  [pdf, other]

    cs.LG stat.ML

    Greedy Layerwise Learning Can Scale to ImageNet

    Authors: Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

    Abstract: Shallow supervised 1-hidden layer neural networks have a number of favorable properties that make them easier to interpret, analyze, and optimize than their deep counterparts, but lack their representational power. Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks. Contrary to previous approaches usin…

    Submitted 23 April, 2019; v1 submitted 29 December, 2018; originally announced December 2018.

  45. arXiv:1812.11214  [pdf, ps, other]

    cs.LG cs.CV cs.SD eess.AS stat.ML

    Kymatio: Scattering Transforms in Python

    Authors: Mathieu Andreux, Tomás Angles, Georgios Exarchakis, Roberto Leonarduzzi, Gaspar Rochette, Louis Thiry, John Zarka, Stéphane Mallat, Joakim Andén, Eugene Belilovsky, Joan Bruna, Vincent Lostanlen, Muawiz Chaudhary, Matthew J. Hirn, Edouard Oyallon, Sixin Zhang, Carmine Cella, Michael Eickenberg

    Abstract: The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks. All transforms may be executed on a GPU (in addition to CPU…

    Submitted 31 May, 2022; v1 submitted 28 December, 2018; originally announced December 2018.

  46. arXiv:1811.05013  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Blindfold Baselines for Embodied QA

    Authors: Ankesh Anand, Eugene Belilovsky, Kyle Kastner, Hugo Larochelle, Aaron Courville

    Abstract: We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate sol…

    Submitted 12 November, 2018; originally announced November 2018.

    Comments: NIPS 2018 Visually-Grounded Interaction and Language (ViGIL) Workshop

  47. Compressing the Input for CNNs with the First-Order Scattering Transform

    Authors: Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko

    Abstract: We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN). We show theoretical and empirical evidence that in the case of natural images and sufficiently small translation invariance, this transform preserves most of the signal information needed for classification while substantially reducing the spatial resolution and t…

    Submitted 27 September, 2018; originally announced September 2018.

    Journal ref: ECCV 2018

  48. arXiv:1809.06367  [pdf, other]

    cs.LG cs.CV stat.ML

    Scattering Networks for Hybrid Representation Learning

    Authors: Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky

    Abstract: Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we…

    Submitted 17 September, 2018; originally announced September 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1703.08961

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2018, pp.11

  49. arXiv:1703.08961  [pdf, ps, other]

    cs.CV cs.LG

    Scaling the Scattering Transform: Deep Hybrid Networks

    Authors: Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko

    Abstract: We use the scattering network as a generic and fixed initialization of the first layers of a supervised hybrid deep network. We show that early layers do not necessarily need to be learned, providing the best results to date with pre-defined representations while being competitive with Deep CNNs. Using a shallow cascade of 1 x 1 convolutions, which encodes scattering coefficients that correspond…

    Submitted 4 April, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

  50. arXiv:1611.05740  [pdf, other]

    cs.AI

    Fast Non-Parametric Tests of Relative Dependency and Similarity

    Authors: Wacha Bounliphone, Eugene Belilovsky, Arthur Tenenhaus, Ioannis Antonoglou, Arthur Gretton, Matthew B. Blaschko

    Abstract: We introduce two novel non-parametric statistical hypothesis tests. The first test, called the relative test of dependency, enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC). The second test, called the relative test of similarity, is used to determi…

    Submitted 17 November, 2016; originally announced November 2016.