Showing 1–8 of 8 results for author: Marfoq, O

Searching in archive cs.
  1. arXiv:2405.12034  [pdf, ps, other]

    cs.DS cs.PF

    Count-Min Sketch with Conservative Updates: Worst-Case Analysis

    Authors: Younes Ben Mazziane, Othmane Marfoq

    Abstract: Count-Min Sketch with Conservative Updates (CMS-CU) is a memory-efficient hash-based data structure used to estimate the occurrences of items within a data stream. CMS-CU stores $m$ counters and employs $d$ hash functions to map items to these counters. We first argue that the estimation error in CMS-CU is maximal when each item appears at most once in the stream. Next, we study CMS-CU in this set…

    Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  2. arXiv:2310.12112  [pdf, other]

    cs.CR cs.AI cs.LG

    A Cautionary Tale: On the Role of Reference Data in Empirical Privacy Defenses

    Authors: Caelin G. Kaplan, Chuan Xu, Othmane Marfoq, Giovanni Neglia, Anderson Santana de Oliveira

    Abstract: Within the realm of privacy-preserving machine learning, empirical privacy defenses have been proposed as a solution to achieve satisfactory levels of training data privacy without a significant drop in model utility. Most existing defenses against membership inference attacks assume access to reference data, defined as an additional dataset coming from the same (or a similar) underlying distribut…

    Submitted 18 October, 2023; originally announced October 2023.

  3. arXiv:2301.04632  [pdf, other]

    cs.LG cs.AI cs.DC

    Federated Learning under Heterogeneous and Correlated Client Availability

    Authors: Angelo Rodio, Francescomaria Faticanti, Othmane Marfoq, Giovanni Neglia, Emilio Leonardi

    Abstract: The enormous amount of data produced by mobile and IoT devices has motivated the development of federated learning (FL), a framework allowing such devices (or clients) to collaboratively train machine learning models without sharing their local data. FL algorithms (like FedAvg) iteratively aggregate model updates computed by clients on their own datasets. Clients may exhibit different levels of pa…

    Submitted 11 January, 2023; originally announced January 2023.

    Comments: 11 pages, accepted as conference paper at IEEE INFOCOM 2023

  4. arXiv:2301.01542  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Federated Learning for Data Streams

    Authors: Othmane Marfoq, Giovanni Neglia, Laetitia Kameni, Richard Vidal

    Abstract: Federated learning (FL) is an effective solution to train machine learning models on the increasing amount of data generated by IoT devices and smartphones while keeping such data localized. Most previous work on federated learning assumes that clients operate on static datasets collected before training starts. This approach may be inefficient because 1) it ignores new samples clients collect dur…

    Submitted 4 January, 2023; originally announced January 2023.

    Comments: 34 pages

  5. arXiv:2210.04620  [pdf, other]

    cs.LG cs.CV

    FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings

    Authors: Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko, Santiago Silva, Maria Teleńczuk, Shadi Albarqouni, Salman Avestimehr, Aurélien Bellet, Aymeric Dieuleveut, Martin Jaggi, Sai Praneeth Karimireddy, Marco Lorenzi, Giovanni Neglia, Marc Tommasi, Mathieu Andreux

    Abstract: Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models, without centralizing data. The cross-silo FL setting corresponds to the case of few ($2$--$50$) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works hav…

    Submitted 5 May, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS, Datasets and Benchmarks Track, this version fixes typos in the datasets' table and the appendix

  6. arXiv:2111.09360  [pdf, other]

    cs.LG stat.ML

    Personalized Federated Learning through Local Memorization

    Authors: Othmane Marfoq, Giovanni Neglia, Laetitia Kameni, Richard Vidal

    Abstract: Federated learning allows clients to collaboratively learn statistical models while keeping their data local. Federated learning was originally used to train a unique global model to be served to all clients, but this approach might be sub-optimal when clients' local data distributions are heterogeneous. In order to tackle this limitation, recent personalized federated learning methods train a sep…

    Submitted 17 June, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: 23 pages, ICML 2022

  7. arXiv:2108.10252  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Federated Multi-Task Learning under a Mixture of Distributions

    Authors: Othmane Marfoq, Giovanni Neglia, Aurélien Bellet, Laetitia Kameni, Richard Vidal

    Abstract: The increasing size of data generated by smartphones and IoT devices motivated the development of Federated Learning (FL), a framework for on-device collaborative training of machine learning models. First efforts in FL focused on learning a single global model with good average performance across clients, but the global model may be arbitrarily bad for a given client, due to the inherent heteroge…

    Submitted 7 November, 2022; v1 submitted 23 August, 2021; originally announced August 2021.

    Comments: 77 pages, NeurIPS 2021

  8. arXiv:2010.12229  [pdf, other]

    cs.LG cs.DC cs.NI math.OC

    Throughput-Optimal Topology Design for Cross-Silo Federated Learning

    Authors: Othmane Marfoq, Chuan Xu, Giovanni Neglia, Richard Vidal

    Abstract: Federated learning usually employs a client-server architecture where an orchestrator iteratively aggregates model updates from remote clients and pushes them back a refined model. This approach may be inefficient in cross-silo settings, as close-by data silos with high-speed access links may exchange information faster than with the orchestrator, and the orchestrator may become a communication bo…

    Submitted 17 November, 2020; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: 41 pages, NeurIPS 2020