Showing 1–50 of 86 results for author: Rabbat, M

Searching in archive cs.
  1. arXiv:2406.05183  [pdf, other]

    cs.LG cs.AI cs.CL

    The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

    Authors: Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

    Abstract: Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorizatio…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures

  2. arXiv:2404.16717  [pdf, other]

    cs.CV cs.AI cs.HC

    Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class

    Authors: Mazda Moayeri, Michael Rabbat, Mark Ibrahim, Diane Bouchacourt

    Abstract: Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit skewed performance when objects are dissimilar from their typical depiction. Real world objects such as pears appear in a variety of forms -- from diced to whole, on a table or in a bowl -- yet standard V…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted to FAccT 2024

  3. arXiv:2404.08471  [pdf, other]

    cs.CV cs.AI cs.LG

    Revisiting Feature Prediction for Learning Visual Representations from Video

    Authors: Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, Nicolas Ballas

    Abstract: This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datase…

    Submitted 15 February, 2024; originally announced April 2024.

  4. arXiv:2403.14421  [pdf, other]

    cs.LG cs.CR cs.CV

    DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

    Authors: Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo

    Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replica of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guara…

    Submitted 13 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  5. arXiv:2402.14083  [pdf, other]

    cs.AI

    Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

    Authors: Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, Yuandong Tian

    Abstract: While Transformers have enabled tremendous progress in various application settings, such architectures still trail behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the $A^*$ se…

    Submitted 26 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  6. arXiv:2309.06497  [pdf, other]

    cs.LG cs.DC cs.MS math.OC

    A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

    Authors: Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat

    Abstract: Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the perform…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 38 pages, 8 figures, 5 tables

  7. arXiv:2306.07179  [pdf, other]

    cs.LG stat.ML

    Benchmarking Neural Network Training Algorithms

    Authors: George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson

    Abstract: Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a communi…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 102 pages, 8 figures, 41 tables

  8. arXiv:2304.07193  [pdf, other]

    cs.CV

    DINOv2: Learning Robust Visual Features without Supervision

    Authors: Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, et al. (1 additional author not shown)

    Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pr…

    Submitted 2 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  9. arXiv:2303.14604  [pdf, other]

    cs.LG

    Green Federated Learning

    Authors: Ashkan Yousefpour, Shen Guo, Ashish Shenoy, Sayan Ghosh, Pierre Stock, Kiwan Maeng, Schalk-Willem Krüger, Michael Rabbat, Carole-Jean Wu, Ilya Mironov

    Abstract: The rapid progress of AI is fueled by increasingly large and computationally intensive machine learning models and datasets. As a consequence, the amount of compute used in training state-of-the-art models is exponentially increasing (doubling every 10 months between 2015 and 2022), resulting in a large carbon footprint. Federated Learning (FL) - a collaborative machine learning technique for trai…

    Submitted 1 August, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

  10. arXiv:2301.08243  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

    Authors: Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas

    Abstract: This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target block…

    Submitted 13 April, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: 2023 IEEE/CVF International Conference on Computer Vision

  11. arXiv:2211.03942  [pdf, other]

    cs.LG cs.CR

    Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design

    Authors: Chuan Guo, Kamalika Chaudhuri, Pierre Stock, Mike Rabbat

    Abstract: In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model. The main challenge in this setting is balancing privacy with both classification accuracy of the learnt model as well as the number of bits communicated between the clients and server. Prior work has achieved a good trade-off by designing…

    Submitted 9 August, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  12. arXiv:2210.11948  [pdf, other]

    cs.LG

    lo-fi: distributed fine-tuning without communication

    Authors: Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

    Abstract: When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base…

    Submitted 12 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

  13. arXiv:2210.08090  [pdf, other]

    cs.LG cs.AI

    Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning

    Authors: John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat

    Abstract: An oft-cited challenge of federated learning is the presence of heterogeneity. Data heterogeneity refers to the fact that data from different clients may follow very different distributions. System heterogeneity refers to the fact that client devices have different system capabilities. A considerable number of federated optimization methods address this challenge. In the literature,…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: v2. arXiv admin note: substantial text overlap with arXiv:2206.15387

  14. arXiv:2210.07277  [pdf, other]

    cs.LG cs.AI cs.CV

    The Hidden Uniform Cluster Prior in Self-Supervised Learning

    Authors: Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas

    Abstract: A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that in the formulation of all these methods is an overlooked prior to learn features that enable uniform clustering of the data. While this prior has led to remarkably semantic representations when pretraining on class-bal…

    Submitted 13 October, 2022; originally announced October 2022.

  15. arXiv:2206.15387  [pdf, other]

    cs.LG cs.AI

    Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning

    Authors: John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat

    Abstract: An oft-cited challenge of federated learning is the presence of heterogeneity. Data heterogeneity refers to the fact that data from different clients may follow very different distributions. System heterogeneity refers to client devices having different system capabilities. A considerable number of federated optimization methods address this challenge. In the literature, empirical ev…

    Submitted 24 March, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: Accepted at ICLR

    Journal ref: International Conference on Learning Representations 2023

  16. arXiv:2206.02633  [pdf, other]

    cs.IR cs.LG

    Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

    Authors: Kiwan Maeng, Haiyu Lu, Luca Melis, John Nguyen, Mike Rabbat, Carole-Jean Wu

    Abstract: Federated learning (FL) is an effective mechanism for data privacy in recommender systems by running machine learning model training on-device. While prior FL optimizations tackled the data and system heterogeneity challenges faced by FL, they assume the two are independent of each other. This fundamental assumption is not reflective of real-world, large-scale recommender systems -- data and syste…

    Submitted 30 May, 2022; originally announced June 2022.

  17. arXiv:2206.01206  [pdf, other]

    cs.LG cs.AI

    Positive Unlabeled Contrastive Learning

    Authors: Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Dhruv Choudhary, Michael Rabbat, Inderjit Dhillon

    Abstract: Self-supervised pretraining on unlabeled data followed by supervised fine-tuning on labeled data is a popular paradigm for learning from limited labeled examples. We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples, and (often) a large amount of unlabeled samples (which could be positive…

    Submitted 28 March, 2024; v1 submitted 1 June, 2022; originally announced June 2022.

  18. arXiv:2204.13169  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    FedShuffle: Recipes for Better Use of Local Work in Federated Learning

    Authors: Samuel Horváth, Maziar Sanjabi, Lin Xiao, Peter Richtárik, Michael Rabbat

    Abstract: The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). Such methods are usually implemented by having clients perform one or more epochs of local training per round while randomly reshuffling their finite dataset in each epoch. Data imbalance, wher…

    Submitted 27 September, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

    Comments: Published in Transactions on Machine Learning Research (09/2022)

  19. arXiv:2204.07141  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV

    Masked Siamese Networks for Label-Efficient Learning

    Authors: Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas

    Abstract: We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are…

    Submitted 14 April, 2022; originally announced April 2022.

  20. arXiv:2204.03809  [pdf, other]

    cs.LG cs.DC math.OC

    Federated Learning with Partial Model Personalization

    Authors: Krishna Pillutla, Kshitiz Malik, Abdelrahman Mohamed, Michael Rabbat, Maziar Sanjabi, Lin Xiao

    Abstract: We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the literature, but their convergence properties are not fully understood, especially for the alternating variant. We provide convergence analyses of both algorithms…

    Submitted 15 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Journal ref: ICML 2022: 17716-17758

  21. arXiv:2203.08134  [pdf, other]

    cs.LG cs.CR

    Privacy-Aware Compression for Federated Data Analysis

    Authors: Kamalika Chaudhuri, Chuan Guo, Mike Rabbat

    Abstract: Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed low-bandwidth user devices to estimate aggregate statistics. Two major challenges in this framework are privacy, since user data is often sensitive, and compression, since the user devices have low network bandwidth. Prior work has addressed these challenges sep…

    Submitted 9 June, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

  22. arXiv:2111.04877  [pdf, other]

    cs.LG cs.DC

    Papaya: Practical, Private, and Scalable Federated Learning

    Authors: Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek

    Abstract: Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning, variability in the system characteristics on each device, and millions of clients coordinating with a central server being primary ones. Most FL systems described in the literature are synchronous - they perform a synchronized aggregation of m…

    Submitted 25 April, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  23. arXiv:2111.00364  [pdf, other]

    cs.LG cs.AI cs.AR

    Sustainable AI: Environmental Implications, Challenges and Opportunities

    Authors: Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, James Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks, Geeta Chauhan, Benjamin Lee, Hsien-Hsin S. Lee, Bugra Akyildiz, Maximilian Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, Kim Hazelwood

    Abstract: This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, w…

    Submitted 9 January, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

  24. arXiv:2110.08133  [pdf, other]

    cs.LG cs.CV

    Trade-offs of Local SGD at Scale: An Empirical Study

    Authors: Jose Javier Gonzalez Ortiz, Jonathan Frankle, Mike Rabbat, Ari Morcos, Nicolas Ballas

    Abstract: As datasets and models become increasingly large, distributed training has become a necessary component to allow deep neural networks to train in reasonable amounts of time. However, distributed training can have substantial communication overhead that hinders its scalability. One strategy for reducing this overhead is to perform multiple unsynchronized SGD steps independently on each worker betwe…

    Submitted 15 October, 2021; originally announced October 2021.

  25. arXiv:2106.11851  [pdf, other]

    cs.LG math.OC

    Stochastic Polyak Stepsize with a Moving Target

    Authors: Robert M. Gower, Aaron Defazio, Michael Rabbat

    Abstract: We propose a new stochastic gradient method called MOTAPS (Moving Targetted Polyak Stepsize) that uses recorded past loss values to compute adaptive stepsizes. MOTAPS can be seen as a variant of the Stochastic Polyak (SP) which is also a method that also uses loss values to adjust the stepsize. The downside to the SP method is that it only converges when the interpolation condition holds. MOTAPS i…

    Submitted 23 September, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: 49 pages, 13 figures, 1 table

    MSC Class: 90C53; 74S60; 90C06; 62L20; 68W20; 15B52; 65Y20; 68W40

    ACM Class: G.1.6

  26. arXiv:2106.06639  [pdf, other]

    cs.LG

    Federated Learning with Buffered Asynchronous Aggregation

    Authors: John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael Rabbat, Mani Malek, Dzmitry Huba

    Abstract: Scalability and privacy are two critical concerns for cross-device federated learning (FL) systems. In this work, we identify that synchronous FL - synchronized aggregation of client updates in FL - cannot scale efficiently beyond a few hundred clients training in parallel. It leads to diminishing returns in model performance and training speed, analogous to large-batch training. On the other hand…

    Submitted 7 March, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted at AISTATS 2022. Previously accepted at FL-ICML 2021

  27. arXiv:2104.13963  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples

    Authors: Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat

    Abstract: This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS). The method trains a model to minimize a consistency loss, which ensures that different views of the same unlabeled instance are assigned similar pseudo-labels. The pseudo-labels are generated non-parametrically, by comparing the representations of the image views to those of a set of randomly…

    Submitted 30 July, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Journal ref: ICCV 2021

  28. arXiv:2101.04968  [pdf, other]

    stat.ML cs.LG math.ST

    Learning with Gradient Descent and Weakly Convex Losses

    Authors: Dominic Richards, Mike Rabbat

    Abstract: We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, the smallest negative eigenvalue of the empirical risk's Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, generalisation error bounds are proven that hold under a wider range of step sizes compared to previous work. Out of sample gua…

    Submitted 1 June, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: Updated References

  29. arXiv:2011.02999  [pdf, other]

    cs.LG cs.DC

    CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

    Authors: Kiwan Maeng, Shivam Bharuka, Isabel Gao, Mark C. Jeffrey, Vikram Saraph, Bor-Yiing Su, Caroline Trippel, Jiyan Yang, Mike Rabbat, Brandon Lucia, Carole-Jean Wu

    Abstract: The paper proposes and optimizes a partial recovery training system, CPR, for recommendation models. CPR relaxes the consistency requirement by enabling non-failed nodes to proceed without loading checkpoints when a node fails during training, improving failure-related overheads. The paper is the first to the extent of our knowledge to perform a data-driven, in-depth analysis of applying partial r…

    Submitted 5 November, 2020; originally announced November 2020.

  30. arXiv:2010.02838  [pdf, other]

    cs.LG cs.DC math.OC

    A Closer Look at Codistillation for Distributed Training

    Authors: Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat

    Abstract: Codistillation has been proposed as a mechanism to share knowledge among concurrently trained models by encouraging them to represent the same function through an auxiliary loss. This contrasts with the more commonly used fully-synchronous data-parallel stochastic gradient descent methods, where different model replicas average their gradients (or parameters) at every iteration and thus maintain i…

    Submitted 25 July, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Under review

  31. arXiv:2009.05445  [pdf, other]

    math.OC cs.MA

    Stability of Decentralized Gradient Descent in Open Multi-Agent Systems

    Authors: Julien M. Hendrickx, Michael G. Rabbat

    Abstract: The aim of decentralized gradient descent (DGD) is to minimize a sum of $n$ functions held by interconnected agents. We study the stability of DGD in open contexts where agents can join or leave the system, resulting each time in the addition or the removal of their function from the global objective. Assuming all functions are smooth, strongly convex, and their minimizers all lie in a given ball,…

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: 8 pages, 2 figures, 3 pdf files for the figures

  32. arXiv:2006.13838  [pdf, other]

    cs.LG math.OC stat.ML

    Advances in Asynchronous Parallel and Distributed Optimization

    Authors: Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat

    Abstract: Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they generally can make more efficient use of compu…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: 33 pages, 4 figures

  33. arXiv:2006.10803  [pdf, other]

    cs.LG cs.CV stat.ML

    Supervision Accelerates Pre-training in Contrastive Semi-Supervised Learning of Visual Representations

    Authors: Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat

    Abstract: We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training. We propose a semi-supervised loss, SuNCEt, based on noise-contrastive estimation and neighbourhood component analysis, that aims to distinguish examples of different classes in addition to the self-supervised instance-w…

    Submitted 1 December, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  34. arXiv:2002.12414  [pdf, other]

    cs.LG math.OC stat.ML

    On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings

    Authors: Mahmoud Assran, Michael Rabbat

    Abstract: We study Nesterov's accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness is due to sampling mini-batches). To build better insight into the behavior of Nesterov's method in stochastic settings, we focus throughout on objectives that are smooth, stro…

    Submitted 27 June, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

    Journal ref: International Conference on Machine Learning (ICML 2020)

  35. arXiv:2001.02518  [pdf, other]

    eess.IV cs.CV

    Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge

    Authors: Florian Knoll, Tullie Murrell, Anuroop Sriram, Nafissa Yakubova, Jure Zbontar, Michael Rabbat, Aaron Defazio, Matthew J. Muckley, Daniel K. Sodickson, C. Lawrence Zitnick, Michael P. Recht

    Abstract: Purpose: To advance research in the field of machine learning for MR image reconstruction with an open challenge. Methods: We provided participants with a dataset of raw k-space data from 1,594 consecutive clinical exams of the knee. The goal of the challenge was to reconstruct images from these data. In order to strike a balance between realistic data and a shallow learning curve for those not al…

    Submitted 6 January, 2020; originally announced January 2020.

  36. arXiv:1910.04054  [pdf, other]

    cs.LG cs.DC cs.NI stat.ML

    MVFST-RL: An Asynchronous RL Framework for Congestion Control with Delayed Actions

    Authors: Viswanath Sivakumar, Olivier Delalleau, Tim Rocktäschel, Alexander H. Miller, Heinrich Küttler, Nantas Nardelli, Mike Rabbat, Joelle Pineau, Sebastian Riedel

    Abstract: Effective network congestion control strategies are key to keeping the Internet (or any large computer network) operational. Network congestion control has been dominated by hand-crafted heuristics for decades. Recently, Reinforcement Learning (RL) has emerged as an alternative to automatically optimize such control strategies. Research so far has primarily considered RL interfaces which block the…

    Submitted 26 May, 2021; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: Workshop on ML for Systems at NeurIPS 2019

  37. arXiv:1910.00643  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

    Authors: Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat

    Abstract: Distributed optimization is essential for training large models on large datasets. Multiple approaches have been proposed to reduce the communication overhead in distributed training, such as synchronizing only after performing multiple local SGD steps, and decentralized methods (e.g., using gossip algorithms) to decouple communications among workers. Although these methods run faster than AllRedu…

    Submitted 19 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted to ICLR 2020

  38. arXiv:1906.04585  [pdf, other]

    cs.LG cs.AI cs.MA math.OC stat.ML

    Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

    Authors: Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat

    Abstract: Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take a…

    Submitted 21 April, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems (2019) 13299-13309

  39. arXiv:1812.03096  [pdf, other]

    cs.SI physics.soc-ph

    Effectiveness of Alter Sampling in Social Networks

    Authors: Naghmeh Momeni, Michael G. Rabbat

    Abstract: Social networks play a key role in studying various individual and social behaviors. To use social networks in a study, their structural properties must be measured. For offline social networks, the conventional procedure is surveying/interviewing a set of randomly-selected respondents. In many practical applications, inferring the network structure via sampling is too prohibitively costly. There…

    Submitted 14 December, 2018; v1 submitted 7 December, 2018; originally announced December 2018.

  40. arXiv:1812.01711  [pdf, other]

    cs.CV cs.LG stat.ML

    A Graph-CNN for 3D Point Cloud Classification

    Authors: Yingxue Zhang, Michael Rabbat

    Abstract: Graph convolutional neural networks (Graph-CNNs) extend traditional CNNs to handle data that is supported on a graph. Major challenges when working with data on graphs are that the support set (the vertices of the graph) do not typically have a natural ordering, and in general, the topology of the graph is not regular (i.e., vertices do not all have the same number of neighbors). Thus, Graph-CNNs…

    Submitted 28 November, 2018; originally announced December 2018.

    Comments: Published as a conference paper at ICASSP 2018

  41. arXiv:1811.10792  [pdf, other]

    cs.LG cs.AI cs.DC cs.MA math.OC stat.ML

    Stochastic Gradient Push for Distributed Deep Learning

    Authors: Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael Rabbat

    Abstract: Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes. Approaches that synchronize nodes using exact distributed averaging (e.g., via AllReduce) are sensitive to stragglers and communication delays. The PushSum gossip algorithm is robust to these issues, but only perfor…

    Submitted 14 May, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: ICML 2019

    Journal ref: International Conference on Machine Learning 97 (2019) 344-353

  42. arXiv:1811.08839  [pdf, other]

    cs.CV cs.LG eess.SP physics.med-ph stat.ML

    fastMRI: An Open Dataset and Benchmarks for Accelerated MRI

    Authors: Jure Zbontar, Florian Knoll, Anuroop Sriram, Tullie Murrell, Zhengnan Huang, Matthew J. Muckley, Aaron Defazio, Ruben Stern, Patricia Johnson, Mary Bruno, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chandarana, Zizhao Zhang, Michal Drozdzal, Adriana Romero, Michael Rabbat, Pascal Vincent, Nafissa Yakubova, James Pinkerton, Duo Wang, Erich Owens, C. Lawrence Zitnick, Michael P. Recht, et al. (2 additional authors not shown)

    Abstract: Accelerating Magnetic Resonance Imaging (MRI) by taking fewer measurements has the potential to reduce medical costs, minimize stress to patients and make MRI possible in applications where it is currently prohibitively slow or expensive. We introduce the fastMRI dataset, a large-scale collection of both raw MR measurements and clinical MR images, that can be used for training and evaluation of ma…

    Submitted 11 December, 2019; v1 submitted 21 November, 2018; originally announced November 2018.

    Comments: 35 pages, 10 figures

  43. arXiv:1810.13084  [pdf, other]

    math.OC cs.DC cs.LG cs.MA eess.SY

    Provably Accelerated Randomized Gossip Algorithms

    Authors: Nicolas Loizou, Michael Rabbat, Peter Richtárik

    Abstract: In this work we present novel provably accelerated gossip algorithms for solving the average consensus problem. The proposed protocols are inspired by the recently developed accelerated variants of the randomized Kaczmarz method - a popular method for solving linear systems. In each gossip iteration all nodes of the network update their values but only a pair of them exchange their private infor…

    Submitted 30 October, 2018; originally announced October 2018.

  44. arXiv:1810.11187  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    TarMAC: Targeted Multi-Agent Communication

    Authors: Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau

    Abstract: We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments. This targeting behavior is learnt solely from downstream task-specific reward without any communication supervision. We additionally augment this with a multi-round…

    Submitted 21 February, 2020; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: ICML 2019

  45. arXiv:1806.00848  [pdf, other]

    cs.LG cs.SI stat.ML

    Learning graphs from data: A signal representation perspective

    Authors: Xiaowen Dong, Dorina Thanou, Michael Rabbat, Pascal Frossard

    Abstract: The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the data sets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, includi…

    Submitted 20 May, 2019; v1 submitted 3 June, 2018; originally announced June 2018.

    Comments: corrected several imprecise statements in previous versions of the manuscript as well as in the article of the same title in the May 2019 issue of IEEE Signal Processing Magazine (vol. 36, no. 3, pp. 44-63, May 2019)

  46. arXiv:1803.08950  [pdf, other]

    cs.MA eess.SY math.OC

    Asynchronous Gradient-Push

    Authors: Mahmoud Assran, Michael Rabbat

    Abstract: We consider a multi-agent framework for distributed optimization where each agent has access to a local smooth strongly convex function, and the collective goal is to achieve consensus on the parameters that minimize the sum of the agents' local functions. We propose an algorithm wherein each agent operates asynchronously and independently of the other agents. When the local functions are strongly…

    Submitted 2 March, 2020; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: 33 pages, 9 figures, accepted to IEEE Transactions on Automatic Control

    Journal ref: IEEE Transactions on Automatic Control (2020)

  47. arXiv:1709.08765  [pdf, other]

    math.OC cs.DC cs.MA

    Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization

    Authors: Angelia Nedić, Alex Olshevsky, Michael G. Rabbat

    Abstract: In decentralized optimization, nodes cooperate to minimize an overall objective function that is the sum (or average) of per-node private objective functions. Algorithms interleave local computations with communication among all or a subset of the nodes. Motivated by a variety of applications---distributed estimation in sensor networks, fitting models to massive data sets, and distributed control…

    Submitted 15 January, 2018; v1 submitted 25 September, 2017; originally announced September 2017.

    Comments: 32 pages, 3 figures

  48. arXiv:1706.07828  [pdf, other]

    cs.SI physics.soc-ph

    Inferring Structural Characteristics of Networks with Strong and Weak Ties from Fixed-Choice Surveys

    Authors: Naghmeh Momeni, Michael Rabbat

    Abstract: Knowing the structure of an offline social network facilitates a variety of analyses, including studying the rate at which infectious diseases may spread and identifying a subset of actors to immunize in order to reduce, as much as possible, the rate of spread. Offline social network topologies are typically estimated by surveying actors and asking them to list their neighbours. While identifying…

    Submitted 23 June, 2017; originally announced June 2017.

    Comments: 24 pages, 13 figures

  49. arXiv:1605.05251  [pdf, ps, other]

    cs.IT math.SP

    Graph reconstruction from the observation of diffused signals

    Authors: Bastien Pasdeloup, Michael Rabbat, Vincent Gripon, Dominique Pastor, Grégoire Mercier

    Abstract: Signal processing on graphs has received a lot of attention in recent years. Many techniques have arisen, inspired by classical signal processing ones, to allow studying signals on any kind of graph. A common aspect of these techniques is that they require a graph correctly modeling the studied support to explain the signals that are observed on it. However, in many cases, such a graph is u…

    Submitted 27 April, 2016; originally announced May 2016.

    Comments: Allerton 2015: 53rd Annual Allerton Conference on Communication, Control and Computing, 30 September - 02 October 2015, Allerton, United States, 2015

  50. arXiv:1605.02569  [pdf, other]

    cs.DS

    Characterization and Inference of Graph Diffusion Processes from Observations of Stationary Signals

    Authors: Bastien Pasdeloup, Vincent Gripon, Grégoire Mercier, Dominique Pastor, Michael G. Rabbat

    Abstract: Many tools from the field of graph signal processing exploit knowledge of the underlying graph's structure (e.g., as encoded in the Laplacian matrix) to process signals on the graph. Therefore, in the case when no graph is available, graph signal processing tools cannot be used anymore. Researchers have proposed approaches to infer a graph topology from observations of signals on its nodes. Since…

    Submitted 6 June, 2017; v1 submitted 9 May, 2016; originally announced May 2016.

    Comments: Submitted to IEEE Transactions on Signal and Information Processing over Networks