Showing 1–11 of 11 results for author: Vogels, T

Searching in archive cs.
  1. arXiv:2310.13033  [pdf, other]

    cs.NE cs.AI cs.IT cs.LG

    LASER: Linear Compression in Wireless Distributed Optimization

    Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar

    Abstract: Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineA…

    Submitted 6 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  2. arXiv:2309.14118  [pdf, other]

    cs.LG

    MultiModN- Multimodal, Multi-Task, Interpretable Modular Networks

    Authors: Vinitra Swamy, Malika Satayeva, Jibril Frej, Thierry Bossy, Thijs Vogels, Martin Jaggi, Tanja Käser, Mary-Anne Hartley

    Abstract: Predicting multiple real-world tasks in a single model often requires a particularly diverse feature space. Multimodal (MM) models aim to extract the synergistic predictive potential of multiple data types to create a shared feature space with aligned semantic meaning across inputs of drastically varying sizes (i.e. images, text, sound). Most current MM architectures fuse these representations in…

    Submitted 6 November, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted as a full paper at NeurIPS 2023 in New Orleans, USA

  3. arXiv:2301.02151  [pdf, other]

    cs.LG cs.DC math.OC

    Beyond spectral gap (extended): The role of the topology in decentralized learning

    Authors: Thijs Vogels, Hadrien Hendrikx, Martin Jaggi

    Abstract: In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. In the decentralized setting, in which workers communicate over a sparse graph, current theory fails to capture important aspects of real-world behavior. First, the 'spectral gap' of the communica…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: Extended version of the other paper (with the same name), that includes (among other things) theory for the heterogeneous case. arXiv admin note: substantial text overlap with arXiv:2206.03093

  4. arXiv:2211.06637  [pdf, other]

    cs.LG

    Modular Clinical Decision Support Networks (MoDN) -- Updatable, Interpretable, and Portable Predictions for Evolving Clinical Environments

    Authors: Cécile Trottet, Thijs Vogels, Martin Jaggi, Mary-Anne Hartley

    Abstract: Data-driven Clinical Decision Support Systems (CDSS) have the potential to improve and standardise care with personalised probabilistic guidance. However, the size of data required necessitates collaborative learning from analogous CDSS's, which are often unsharable or imperfectly interoperable (IIO), meaning their feature sets are not perfectly overlapping. We propose Modular Clinical Decision Su…

    Submitted 12 November, 2022; originally announced November 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 9 pages

  5. arXiv:2206.03093  [pdf, other]

    cs.LG math.OC stat.ML

    Beyond spectral gap: The role of the topology in decentralized learning

    Authors: Thijs Vogels, Hadrien Hendrikx, Martin Jaggi

    Abstract: In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. We consider the setting in which all workers sample from the same dataset, and communicate over a sparse graph (decentralized). In this setting, current theory fails to capture important aspects o…

    Submitted 8 November, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  6. arXiv:2110.04175  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    RelaySum for Decentralized Deep Learning on Heterogeneous Data

    Authors: Thijs Vogels, Lie He, Anastasia Koloskova, Tao Lin, Sai Praneeth Karimireddy, Sebastian U. Stich, Martin Jaggi

    Abstract: In decentralized machine learning, workers compute model updates on their local data. Because the workers only communicate with few neighbors without central coordination, these updates propagate progressively over the network. This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distr…

    Submitted 31 January, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Presented at NeurIPS 2021

    Journal ref: Advances in Neural Information Processing Systems 34, 2021

  7. arXiv:2008.01425  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning

    Authors: Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

    Abstract: Lossy gradient compression has become a practical tool to overcome the communication bottleneck in centrally coordinated distributed training of machine learning models. However, algorithms for decentralized training with compressed communication over arbitrary connected networks have been more complicated, requiring additional memory and hyperparameters. We introduce a simple algorithm that direc…

    Submitted 19 October, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: To appear in NeurIPS 2020

  8. arXiv:1910.11758  [pdf, other]

    cs.LG stat.ML

    Optimizer Benchmarking Needs to Account for Hyperparameter Tuning

    Authors: Prabhu Teja Sivaprasad, Florian Mai, Thijs Vogels, Martin Jaggi, François Fleuret

    Abstract: The performance of optimizers, particularly in deep learning, depends considerably on their chosen hyperparameter configuration. The efficacy of optimizers is often studied under near-optimal problem-specific hyperparameters, and finding these settings may be prohibitively costly for practitioners. In this work, we argue that a fair assessment of optimizers' performance must take the computational…

    Submitted 15 August, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: published at International Conference on Machine Learning (ICML 2020)

  9. arXiv:1905.13727  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization

    Authors: Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

    Abstract: We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve the target test accuracy. We propose a new low-rank gradient compressor based on power iteration that can i) compress gradients rapidly, ii) efficiently aggregat…

    Submitted 18 February, 2020; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: Presented at NeurIPS 2019

    ACM Class: I.2.6; I.5.1

    Journal ref: NeurIPS 2019

  10. arXiv:1801.02607  [pdf, other]

    cs.IR

    Web2Text: Deep Structured Boilerplate Removal

    Authors: Thijs Vogels, Octavian-Eugen Ganea, Carsten Eickhoff

    Abstract: Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is essential for the performance of derived applications. To address this issue, we introduce a novel model that performs sequence labeling to collectively classify all text blocks in an HTML page as either boilerplate or main content…

    Submitted 27 March, 2018; v1 submitted 8 January, 2018; originally announced January 2018.

    Comments: To appear in ECIR 2018

  11. arXiv:1711.02448  [pdf, other]

    q-bio.NC cs.NE stat.ML

    Cortical microcircuits as gated-recurrent neural networks

    Authors: Rui Ponte Costa, Yannis M. Assael, Brendan Shillingford, Nando de Freitas, Tim P. Vogels

    Abstract: Cortical circuits exhibit intricate recurrent architectures that are remarkably similar across different brain areas. Such stereotyped structure suggests the existence of common computational principles. However, such principles have remained largely elusive. Inspired by gated-memory networks, namely long short-term memory networks (LSTMs), we introduce a recurrent neural network in which informat…

    Submitted 3 January, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: To appear in Advances in Neural Information Processing Systems 30 (NIPS 2017). 13 pages, 2 figures (and 1 supp. figure)