Showing 1–17 of 17 results for author: Stock, P

Searching in archive cs.
  1. arXiv:2401.04088

    cs.LG cs.CL

    Mixtral of Experts

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, et al. (1 additional author not shown)

    Abstract: We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected e…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: See more details at https://mistral.ai/news/mixtral-of-experts/

  2. arXiv:2310.06825

    cs.CL cs.AI cs.LG

    Mistral 7B

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

    Abstract: We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences o…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Models and code are available at https://mistral.ai/news/announcing-mistral-7b/

  3. arXiv:2305.17888

    cs.CL

    LLM-QAT: Data-Free Quantization Aware Training for Large Language Models

    Authors: Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: Several post-training quantization methods have been applied to large language models (LLMs), and have been shown to perform well down to 8-bits. We find that these methods break down at lower bit precision, and investigate quantization aware training for LLMs (LLM-QAT) to push quantization levels even further. We propose a data-free distillation method that leverages generations produced by the p…

    Submitted 29 May, 2023; originally announced May 2023.

  4. arXiv:2305.12997

    cs.LG cs.AI cs.CR

    Evaluating Privacy Leakage in Split Learning

    Authors: Xinchi Qiu, Ilias Leontiadis, Luca Melis, Alex Sablayrolles, Pierre Stock

    Abstract: Privacy-Preserving machine learning (PPML) can help us train and deploy models that utilize private information. In particular, on-device machine learning allows us to avoid sharing raw data with a third-party server during inference. On-device models are typically less accurate when compared to their server counterparts due to the fact that (1) they typically only rely on a small set of on-device…

    Submitted 19 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 10 pages

  5. arXiv:2303.14604

    cs.LG

    Green Federated Learning

    Authors: Ashkan Yousefpour, Shen Guo, Ashish Shenoy, Sayan Ghosh, Pierre Stock, Kiwan Maeng, Schalk-Willem Krüger, Michael Rabbat, Carole-Jean Wu, Ilya Mironov

    Abstract: The rapid progress of AI is fueled by increasingly large and computationally intensive machine learning models and datasets. As a consequence, the amount of compute used in training state-of-the-art models is exponentially increasing (doubling every 10 months between 2015 and 2022), resulting in a large carbon footprint. Federated Learning (FL) - a collaborative machine learning technique for trai…

    Submitted 1 August, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

  6. arXiv:2211.03942

    cs.LG cs.CR

    Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design

    Authors: Chuan Guo, Kamalika Chaudhuri, Pierre Stock, Mike Rabbat

    Abstract: In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model. The main challenge in this setting is balancing privacy with both classification accuracy of the learnt model as well as the number of bits communicated between the clients and server. Prior work has achieved a good trade-off by designing…

    Submitted 9 August, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  7. arXiv:2210.03403

    cs.LG cs.CR stat.ML

    TAN Without a Burn: Scaling Laws of DP-SGD

    Authors: Tom Sander, Pierre Stock, Alexandre Sablayrolles

    Abstract: Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of training steps. These techniques require much more computing resources than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off…

    Submitted 24 May, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

  8. arXiv:2210.02912

    cs.LG cs.CR

    CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning

    Authors: Samuel Maddock, Alexandre Sablayrolles, Pierre Stock

    Abstract: Federated Learning (FL) is a setting for training machine learning models in distributed environments where the clients do not share their raw data but instead send model updates to a server. However, model updates can be subject to attacks and leak private information. Differential Privacy (DP) is a leading mitigation strategy which involves adding noise to clipped model updates, trading off perf…

    Submitted 1 March, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: Accepted to ICLR 2023

  9. arXiv:2207.12779

    cs.LG cs.AI cs.DC

    Reconciling Security and Communication Efficiency in Federated Learning

    Authors: Karthik Prasad, Sayan Ghosh, Graham Cormode, Ilya Mironov, Ashkan Yousefpour, Pierre Stock

    Abstract: Cross-device Federated Learning is an increasingly popular machine learning setting to train a model by leveraging a large population of client devices with high privacy and security guarantees. However, communication efficiency remains a major bottleneck when scaling federated learning to production environments, particularly due to bandwidth constraints during uplink communication. In this paper…

    Submitted 26 July, 2022; originally announced July 2022.

  10. arXiv:2202.07623

    cs.LG cs.AI cs.CR stat.ML

    Defending against Reconstruction Attacks with Rényi Differential Privacy

    Authors: Pierre Stock, Igor Shilov, Ilya Mironov, Alexandre Sablayrolles

    Abstract: Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model. It has been recently shown that simple heuristics can reconstruct data samples from language models, making this threat scenario an important aspect of model release. Differential privacy is a known solution to such attacks, but is often used with a relatively large privac…

    Submitted 15 February, 2022; originally announced February 2022.

  11. An Embedding of ReLU Networks and an Analysis of their Identifiability

    Authors: Pierre Stock, Rémi Gribonval

    Abstract: Neural networks with the Rectified Linear Unit (ReLU) nonlinearity are described by a vector of parameters $θ$, and realized as a piecewise linear continuous function $R_θ: x \in \mathbb R^{d} \mapsto R_θ(x) \in \mathbb R^{k}$. Natural scalings and permutations operations on the parameters $θ$ leave the realization unchanged, leading to equivalence classes of parameters that yield the same realiza…

    Submitted 7 June, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: Constructive Approximation camera-ready

  12. arXiv:2104.01136

    cs.CV

    LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

    Authors: Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze

    Abstract: We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular…

    Submitted 6 May, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  13. arXiv:2012.00328

    cs.CV cs.LG

    Low Bandwidth Video-Chat Compression using Deep Generative Models

    Authors: Maxime Oquab, Pierre Stock, Oran Gafni, Daniel Haziza, Tao Xu, Peizhao Zhang, Onur Celebi, Yana Hasson, Patrick Labatut, Bobo Bose-Kolanu, Thibault Peyronel, Camille Couprie

    Abstract: To unlock video chat for hundreds of millions of people hindered by poor connectivity or unaffordable data costs, we propose to authentically reconstruct faces on the receiver's device using facial landmarks extracted at the sender's side and transmitted over the network. In this context, we discuss and evaluate the benefits and disadvantages of several deep adversarial approaches. In particular,…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 11 pages

  14. arXiv:2004.07320

    cs.LG stat.ML

    Training with Quantization Noise for Extreme Model Compression

    Authors: Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin

    Abstract: We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression…

    Submitted 28 February, 2021; v1 submitted 15 April, 2020; originally announced April 2020.

  15. arXiv:1907.05686

    cs.CV

    And the Bit Goes Down: Revisiting the Quantization of Neural Networks

    Authors: Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou

    Abstract: In this paper, we address the problem of reducing the memory footprint of convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather than its weights. The principle of our approach is that it minimizes the loss reconstruction error for in-domain inputs. Our method only requires a set of unla…

    Submitted 9 November, 2020; v1 submitted 12 July, 2019; originally announced July 2019.

    Comments: ICLR 2020 camera-ready

  16. arXiv:1902.10416

    cs.CV cs.LG

    Equi-normalization of Neural Networks

    Authors: Pierre Stock, Benjamin Graham, Rémi Gribonval, Hervé Jégou

    Abstract: Modern neural networks are over-parametrized. In particular, each rectified linear hidden unit can be modified by a multiplicative factor by adjusting input and output weights, without changing the rest of the network. Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the L2 norm of the weights, equivalently the weight decay regularizer. It provably conv…

    Submitted 27 February, 2019; originally announced February 2019.

    Comments: ICLR 2019 camera-ready

  17. arXiv:1711.11443

    cs.LG cs.AI cs.CV cs.CY stat.ML

    ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases

    Authors: Pierre Stock, Moustapha Cisse

    Abstract: ConvNets and Imagenet have driven the recent success of deep learning for image classification. However, the marked slowdown in performance improvement combined with the lack of robustness of neural networks to adversarial examples and their tendency to exhibit undesirable biases question the reliability of these methods. This work investigates these questions from the perspective of the end-user…

    Submitted 20 July, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

    Comments: ECCV 2018 camera-ready