Showing 1–10 of 10 results for author: van Dalen, R

  1. arXiv:2405.06368  [pdf, other]

    cs.LG cs.CR cs.DC

    DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation

    Authors: Jie Xu, Karthikeyan Saravanan, Rogier van Dalen, Haaris Mehmood, David Tuckey, Mete Ozay

    Abstract: Federated learning (FL) allows clients in an Internet of Things (IoT) system to collaboratively train a global model without sharing their local data with a server. However, clients' contributions to the server can still leak sensitive information. Differential privacy (DP) addresses such leakage by providing formal privacy guarantees, with mechanisms that add randomness to the clients' contributi…

    Submitted 28 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: 16 pages, 10 figures, 5 tables

  2. arXiv:2404.06430  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    pfl-research: simulation framework for accelerating research in Private Federated Learning

    Authors: Filip Granqvist, Congzheng Song, Áine Cahill, Rogier van Dalen, Martin Pelikan, Yi Sheng Chan, Xiaojun Feng, Natarajan Krishnaswami, Vojta **a, Mona Chitnis

    Abstract: Federated learning (FL) is an emerging machine learning (ML) training paradigm where clients own their data and collaborate to train a global model, without revealing any data to the server and other participants. Researchers commonly perform experiments in a simulation environment to quickly iterate on ideas. However, existing open-source tools do not offer the efficiency required to simulate FL…

    Submitted 9 April, 2024; originally announced April 2024.

  3. arXiv:2307.10975  [pdf, other]

    eess.AS cs.LG cs.SD

    Globally Normalising the Transducer for Streaming Speech Recognition

    Authors: Rogier van Dalen

    Abstract: The Transducer (e.g. RNN-Transducer or Conformer-Transducer) generates an output label sequence as it traverses the input sequence. It is straightforward to use in streaming mode, where it generates partial hypotheses before the complete input has been seen. This makes it popular in speech recognition. However, in streaming mode the Transducer has a mathematical flaw which, simply put, restricts t…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 9 pages plus references and appendices

    MSC Class: 68T10

  4. arXiv:2307.07421  [pdf, other]

    cs.CL cs.SD eess.AS

    SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding

    Authors: Titouan Parcollet, Rogier van Dalen, Shucong Zhang, Sourav Bhattacharya

    Abstract: Modern speech processing systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down inference as well as training and increasing memory consumption. Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. This paper, therefore, pr…

    Submitted 17 January, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

  5. arXiv:2207.08988  [pdf, other]

    cs.LG cs.CL cs.CR

    Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices

    Authors: Mingbin Xu, Congzheng Song, Ye Tian, Neha Agrawal, Filip Granqvist, Rogier van Dalen, Xiao Zhang, Arturo Argueta, Shiyi Han, Yaqiao Deng, Leo Liu, Anmol Walia, Alex **

    Abstract: Federated Learning (FL) is a technique to train models using data distributed across devices. Differential Privacy (DP) provides a formal privacy guarantee for sensitive data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP-noise introduced to the model increases as the model size grows, whic…

    Submitted 18 July, 2022; originally announced July 2022.

  6. arXiv:2203.09943  [pdf, other]

    cs.CR cs.CL cs.LG

    Training a Tokenizer for Free with Private Federated Learning

    Authors: Eugene Bagdasaryan, Congzheng Song, Rogier van Dalen, Matt Seigel, Áine Cahill

    Abstract: Federated learning with differential privacy, i.e. private federated learning (PFL), makes it possible to train models on private data distributed across users' devices without harming privacy. PFL is efficient for models, such as neural networks, that have a fixed number of parameters, and thus a fixed-dimensional gradient vector. Such models include neural-net language models, but not tokenizers…

    Submitted 15 March, 2022; originally announced March 2022.

  7. arXiv:2109.08604  [pdf, other]

    cs.LG stat.ML

    Enforcing fairness in private federated learning via the modified method of differential multipliers

    Authors: Borja Rodríguez-Gálvez, Filip Granqvist, Rogier van Dalen, Matt Seigel

    Abstract: Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing…

    Submitted 15 April, 2022; v1 submitted 17 September, 2021; originally announced September 2021.

    Comments: Presented at PriML workshop at NeurIPS 2021. 20 pages: 11 of main content, 3 of references, and 6 of supplementary material

  8. arXiv:2107.05396  [pdf, other]

    cs.SE

    Data-Driven Extract Method Recommendations: A Study at ING

    Authors: David van der Leij, Jasper Binda, Robbert van Dalen, Pieter Vallen, Ya** Luo, Maurício Aniche

    Abstract: The sound identification of refactoring opportunities is still an open problem in software engineering. Recent studies have shown the effectiveness of machine learning models in recommending methods that should undergo different refactoring operations. In this work, we experiment with such approaches to identify methods that should undergo an Extract Method refactoring, in the context of ING, a la…

    Submitted 22 July, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

  9. arXiv:2102.08503  [pdf, other]

    cs.LG

    Federated Evaluation and Tuning for On-Device Personalization: System Design & Applications

    Authors: Matthias Paulik, Matt Seigel, Henry Mason, Dominic Telaar, Joris Kluivers, Rogier van Dalen, Chi Wai Lau, Luke Carlson, Filip Granqvist, Chris Vandevelde, Sudeep Agarwal, Julien Freudiger, Andrew Byde, Abhishek Bhowmick, Gaurav Kapoor, Si Beaumont, Áine Cahill, Dominic Hughes, Omid Javidbakht, Fei Dong, Rehan Rishi, Stanley Hung

    Abstract: We describe the design of our federated task processing system. Originally, the system was created to support two specific federated tasks: evaluation and tuning of on-device ML systems, primarily for the purpose of personalizing these systems. In recent years, support for an additional federated task has been added: federated learning (FL) of deep neural networks. To our knowledge, only one other…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: 11 pages, 1 figure

  10. arXiv:2008.02651  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Improving on-device speaker verification using federated learning with privacy

    Authors: Filip Granqvist, Matt Seigel, Rogier van Dalen, Áine Cahill, Stephen Shum, Matthias Paulik

    Abstract: Information on speaker characteristics can be useful as side information in improving speaker recognition accuracy. However, such information is often private. This paper investigates how privacy-preserving learning can improve a speaker verification system, by enabling the use of privacy-sensitive speaker data to train an auxiliary classification model that predicts vocal characteristics of speak…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: To appear in proceedings of INTERSPEECH 2020