Showing 1–24 of 24 results for author: Sablayrolles, A

Searching in archive cs.

  1. arXiv:2401.04088

    cs.LG cs.CL

    Mixtral of Experts

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, et al. (1 additional author not shown)

    Abstract: We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected e…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: See more details at https://mistral.ai/news/mixtral-of-experts/

  2. arXiv:2310.06825

    cs.CL cs.AI cs.LG

    Mistral 7B

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

    Abstract: We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences o…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Models and code are available at https://mistral.ai/news/announcing-mistral-7b/

  3. arXiv:2306.04803

    cs.LG cs.CL cs.CR

    Privately generating tabular data using language models

    Authors: Alexandre Sablayrolles, Yue Wang, Brian Karrer

    Abstract: Privately generating synthetic data from a table is an important brick of a privacy-first world. We propose and investigate a simple approach of treating each row in a table as a sentence and training a language model with differential privacy. We show this approach obtains competitive results in modelling tabular data across multiple datasets, even at small scales that favor alternative methods b…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 9 pages, 3 figures

  4. arXiv:2305.12997

    cs.LG cs.AI cs.CR

    Evaluating Privacy Leakage in Split Learning

    Authors: Xinchi Qiu, Ilias Leontiadis, Luca Melis, Alex Sablayrolles, Pierre Stock

    Abstract: Privacy-Preserving machine learning (PPML) can help us train and deploy models that utilize private information. In particular, on-device machine learning allows us to avoid sharing raw data with a third-party server during inference. On-device models are typically less accurate when compared to their server counterparts due to the fact that (1) they typically only rely on a small set of on-device…

    Submitted 19 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 10 pages

  5. arXiv:2210.13662

    cs.LG cs.CR cs.IT

    Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano

    Authors: Chuan Guo, Alexandre Sablayrolles, Maziar Sanjabi

    Abstract: Differential privacy (DP) is by far the most widely accepted framework for mitigating privacy risks in machine learning. However, exactly how small the privacy parameter $ε$ needs to be to protect against certain privacy risks in practice is still not well-understood. In this work, we study data reconstruction attacks for discrete data and analyze it under the framework of multiple hypothesis test…

    Submitted 9 August, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

  6. arXiv:2210.03403

    cs.LG cs.CR stat.ML

    TAN Without a Burn: Scaling Laws of DP-SGD

    Authors: Tom Sander, Pierre Stock, Alexandre Sablayrolles

    Abstract: Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of training steps. These techniques require much more computing resources than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off…

    Submitted 24 May, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

  7. arXiv:2210.02912

    cs.LG cs.CR

    CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning

    Authors: Samuel Maddock, Alexandre Sablayrolles, Pierre Stock

    Abstract: Federated Learning (FL) is a setting for training machine learning models in distributed environments where the clients do not share their raw data but instead send model updates to a server. However, model updates can be subject to attacks and leak private information. Differential Privacy (DP) is a leading mitigation strategy which involves adding noise to clipped model updates, trading off perf…

    Submitted 1 March, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: Accepted to ICLR 2023

  8. arXiv:2204.06106

    cs.CR cs.LG

    Optimal Membership Inference Bounds for Adaptive Composition of Sampled Gaussian Mechanisms

    Authors: Saeed Mahloujifar, Alexandre Sablayrolles, Graham Cormode, Somesh Jha

    Abstract: Given a trained model and a data sample, membership-inference (MI) attacks predict whether the sample was in the model's training set. A common countermeasure against MI attacks is to utilize differential privacy (DP) during model training to mask the presence of individual examples. While this use of DP is a principled approach to limit the efficacy of MI attacks, there is a gap between the bound…

    Submitted 12 April, 2022; originally announced April 2022.

  9. arXiv:2202.07623

    cs.LG cs.AI cs.CR stat.ML

    Defending against Reconstruction Attacks with Rényi Differential Privacy

    Authors: Pierre Stock, Igor Shilov, Ilya Mironov, Alexandre Sablayrolles

    Abstract: Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model. It has been recently shown that simple heuristics can reconstruct data samples from language models, making this threat scenario an important aspect of model release. Differential privacy is a known solution to such attacks, but is often used with a relatively large privac…

    Submitted 15 February, 2022; originally announced February 2022.

  10. arXiv:2112.09581

    cs.CV cs.LG

    Watermarking Images in Self-Supervised Latent Spaces

    Authors: Pierre Fernandez, Alexandre Sablayrolles, Teddy Furon, Hervé Jégou, Matthijs Douze

    Abstract: We revisit watermarking techniques based on pre-trained deep networks, in the light of self-supervised approaches. We present a way to embed both marks and binary messages into their latent spaces, leveraging data augmentation at marking time. Our method can operate at any resolution and creates watermarks robust to a broad range of transformations (rotations, crops, JPEG, contrast, etc). It signi…

    Submitted 23 March, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  11. arXiv:2112.09568

    cs.CV cs.LG

    Nearest neighbor search with compact codes: A decoder perspective

    Authors: Kenza Amara, Matthijs Douze, Alexandre Sablayrolles, Hervé Jégou

    Abstract: Modern approaches for fast retrieval of similar vectors on billion-scaled datasets rely on compressed-domain approaches such as binary sketches or product quantization. These methods minimize a certain loss, typically the mean squared error or other objective functions tailored to the retrieval problem. In this paper, we re-interpret popular methods such as binary hashing or product quantizers as…

    Submitted 21 February, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  12. arXiv:2111.08440

    cs.CR cs.LG

    On the Importance of Difficulty Calibration in Membership Inference Attacks

    Authors: Lauren Watson, Chuan Guo, Graham Cormode, Alex Sablayrolles

    Abstract: The vulnerability of machine learning models to membership inference attacks has received much attention in recent years. However, existing attacks mostly remain impractical due to having high false positive rates, where non-member samples are often erroneously predicted as members. This type of error makes the predicted membership signal unreliable, especially since most samples are non-members i…

    Submitted 11 April, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 16 pages

  13. arXiv:2109.12298

    cs.LG cs.CR

    Opacus: User-Friendly Differential Privacy Library in PyTorch

    Authors: Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, Graham Cormode, Ilya Mironov

    Abstract: We introduce Opacus, a free, open-source PyTorch library for training deep learning models with differential privacy (hosted at opacus.ai). Opacus is designed for simplicity, flexibility, and speed. It provides a simple and user-friendly API, and enables machine learning practitioners to make a training pipeline private by adding as little as two lines to their code. It supports a wide variety of…

    Submitted 22 August, 2022; v1 submitted 25 September, 2021; originally announced September 2021.

    Comments: Privacy in Machine Learning (PriML) workshop, NeurIPS 2021

  14. arXiv:2104.13733

    cs.CL cs.AI cs.CR cs.LG

    Gradient-based Adversarial Attacks against Text Transformers

    Authors: Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela

    Abstract: We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of nat…

    Submitted 15 April, 2021; originally announced April 2021.

  15. arXiv:2103.17239

    cs.CV

    Going deeper with Image Transformers

    Authors: Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou

    Abstract: Transformers have been recently adapted for large scale image classification, achieving high scores shaking up the long supremacy of convolutional neural networks. However the optimization of image transformers has been little studied so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and opt…

    Submitted 7 April, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

  16. arXiv:2012.12877

    cs.CV

    Training data-efficient image transformers & distillation through attention

    Authors: Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou

    Abstract: Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on Imagenet only. We train them o…

    Submitted 15 January, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

  17. arXiv:2011.12982

    cs.CV

    Grafit: Learning fine-grained image representations with coarse labels

    Authors: Hugo Touvron, Alexandre Sablayrolles, Matthijs Douze, Matthieu Cord, Hervé Jégou

    Abstract: This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlyi…

    Submitted 25 November, 2020; originally announced November 2020.

  18. arXiv:2002.00937

    stat.ML cs.CR cs.CV cs.LG

    Radioactive data: tracing through training

    Authors: Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

    Abstract: We want to detect whether a particular image dataset has been used to train a model. We propose a new technique, "radioactive data", that makes imperceptible changes to this dataset such that any model trained on it will bear an identifiable mark. The mark is robust to strong variations such as different architectures or optimization methods. Given a trained model, our technique detects the u…

    Submitted 3 February, 2020; originally announced February 2020.

  19. arXiv:1908.11229

    stat.ML cs.CR cs.LG

    White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

    Authors: Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, Hervé Jégou

    Abstract: Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set. In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. We show that optimal attacks only depend on the loss function, and thus black-box attacks are as good as white-box att…

    Submitted 29 August, 2019; originally announced August 2019.

  20. arXiv:1907.05242

    cs.CL cs.LG

    Large Memory Layers with Product Keys

    Authors: Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

    Abstract: This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase th…

    Submitted 15 December, 2019; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Advances in Neural Information Processing Systems, 2019

  21. arXiv:1809.06396

    cs.CV

    Déjà Vu: an empirical evaluation of the memorization properties of ConvNets

    Authors: Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

    Abstract: Convolutional neural networks memorize part of their training data, which is why strategies such as data augmentation and drop-out are employed to mitigate overfitting. This paper considers the related question of "membership inference", where the goal is to determine if an image was used during training. We consider it under three complementary angles. We show how to detect which dataset was used…

    Submitted 17 September, 2018; originally announced September 2018.

  22. arXiv:1806.03198

    stat.ML cs.LG

    Spreading vectors for similarity search

    Authors: Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

    Abstract: Discretizing multi-dimensional data distributions is a fundamental step of modern indexing methods. State-of-the-art techniques learn parameters of quantizers on training data for optimal performance, thus adapting quantizers to the data. In this work, we propose to reverse this paradigm and adapt the data to the quantizer: we train a neural net which last layer forms a fixed parameter-free quanti…

    Submitted 30 August, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

    Comments: Published at ICLR 2019

  23. arXiv:1804.09996

    cs.CV cs.DB cs.DS cs.IR

    Link and code: Fast indexing with graphs and compact regression codes

    Authors: Matthijs Douze, Alexandre Sablayrolles, Hervé Jégou

    Abstract: Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, taking aside the memory requirements. In this paper, we revisit these approaches by considering, additionally, the memory constraint required to index billions of images on a single server. This leads us to propose a method based both on graph traversal and compact representations. We en…

    Submitted 27 April, 2018; v1 submitted 26 April, 2018; originally announced April 2018.

  24. arXiv:1609.06753

    cs.CV

    How should we evaluate supervised hashing?

    Authors: Alexandre Sablayrolles, Matthijs Douze, Hervé Jégou, Nicolas Usunier

    Abstract: Hashing produces compact representations for documents, to perform tasks like classification or retrieval based on these short codes. When hashing is supervised, the codes are trained using labels on the training data. This paper first shows that the evaluation protocols used in the literature for supervised hashing are not satisfactory: we show that a trivial solution that encodes the output of a…

    Submitted 10 August, 2017; v1 submitted 21 September, 2016; originally announced September 2016.