Showing 1–36 of 36 results for author: Blondel, M

  1. arXiv:2405.14574  [pdf, other]

    stat.ML cs.LG

    Learning with Fitzpatrick Losses

    Authors: Seta Rakotomandimby, Jean-Philippe Chancelier, Michel de Lara, Mathieu Blondel

    Abstract: Fenchel-Young losses are a family of convex loss functions, encompassing the squared, logistic and sparsemax losses, among others. Each Fenchel-Young loss is implicitly associated with a link function, for mapping model outputs to predictions. For instance, the logistic loss is associated with the soft argmax link function. Can we build new loss functions associated with the same link function as…

    Submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2403.14606  [pdf, other]

    cs.LG cs.AI cs.PL

    The Elements of Differentiable Programming

    Authors: Mathieu Blondel, Vincent Roulet

    Abstract: Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization o…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Draft version 1

  3. arXiv:2402.05787  [pdf, other]

    stat.ML cs.LG

    How do Transformers perform In-Context Autoregressive Learning?

    Authors: Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré

    Abstract: Transformers have achieved state-of-the-art performance in language modeling tasks. However, the reasons behind their tremendous success are still unclear. In this paper, towards a better understanding, we train a Transformer model on a simple next token prediction task, where sequences are generated as a first-order autoregressive process $s_{t+1} = W s_t$. We show how a trained Transformer predi…

    Submitted 5 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 20 pages ICML 2024

  4. arXiv:2402.05468  [pdf, other]

    cs.LG

    Implicit Diffusion: Efficient Optimization through Stochastic Sampling

    Authors: Pierre Marion, Anna Korba, Peter Bartlett, Mathieu Blondel, Valentin De Bortoli, Arnaud Doucet, Felipe Llinares-López, Courtney Paquette, Quentin Berthet

    Abstract: We present a new algorithm to optimize distributions defined implicitly by parameterized stochastic diffusions. Doing so allows us to modify the outcome distribution of sampling processes by optimizing over their parameters. We introduce a general framework for first-order optimization of these processes, that performs jointly, in a single loop, optimization and sampling steps. This approach is in…

    Submitted 22 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 38 pages, 16 figures. Updated with additional experiments

  5. arXiv:2402.04792  [pdf, other]

    cs.AI cs.CL cs.HC

    Direct Language Model Alignment from Online AI Feedback

    Authors: Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Rame, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel

    Abstract: Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datasets used in DAP methods are usually collected ahead of training and never updated, thus the feedback is purely offline. Moreover, responses in these datasets are…

    Submitted 29 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 18 pages, 9 figures, 4 tables

  6. arXiv:2402.02992  [pdf, other]

    cs.LG cs.AI cs.CL

    Decoding-time Realignment of Language Models

    Authors: Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel

    Abstract: Aligning language models with human preferences is crucial for reducing errors and biases in these models. Alignment techniques, such as reinforcement learning from human feedback (RLHF), are typically cast as optimizing a tradeoff between human preference rewards and a proximity regularization term that encourages staying close to the unaligned model. Selecting an appropriate level of regularizat…

    Submitted 24 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: In Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  7. arXiv:2401.15969  [pdf, other]

    cs.CV cs.AI cs.LG

    Routers in Vision Mixture of Experts: An Empirical Study

    Authors: Tianlin Liu, Mathieu Blondel, Carlos Riquelme, Joan Puigcerver

    Abstract: Mixture-of-Experts (MoE) models are a promising way to scale up model capacity without significantly increasing computational cost. A key component of MoEs is the router, which decides which subset of parameters (experts) process which feature embeddings (tokens). In this paper, we present a comprehensive study of routers in MoEs for computer vision tasks. We introduce a unified MoE formulation th…

    Submitted 18 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  8. arXiv:2308.08886  [pdf, other]

    cs.LG math.OC

    Dual Gauss-Newton Directions for Deep Learning

    Authors: Vincent Roulet, Mathieu Blondel

    Abstract: Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely, the composition of a convex loss function and of a nonlinear network, in order to derive better direction oracles than stochastic gradients, based on the idea of partial linearization. In a departure from previous works, we propose to compute such direction oracles via their…

    Submitted 26 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Presented at the Duality Principles for Modern Machine Learning Workshop at ICML 2023

  9. arXiv:2302.01425  [pdf, other]

    cs.LG stat.ML

    Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective

    Authors: Michael E. Sander, Joan Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel

    Abstract: The top-k operator returns a sparse vector, where the non-zero values correspond to the k largest values of the input. Unfortunately, because it is a discontinuous function, it is difficult to incorporate in neural networks trained end-to-end with backpropagation. Recent works have considered differentiable relaxations, based either on regularization or perturbation techniques. However, to date, n…

    Submitted 4 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: ICML 2023 18 pages

  10. arXiv:2209.15466  [pdf, other]

    stat.ML cs.LG

    Sparsity-Constrained Optimal Transport

    Authors: Tianlin Liu, Joan Puigcerver, Mathieu Blondel

    Abstract: Regularized optimal transport (OT) is now increasingly used as a loss or as a matching layer in neural networks. Entropy-regularized OT can be computed using the Sinkhorn algorithm but it leads to fully-dense transportation plans, meaning that all sources are (fractionally) matched with all targets. To address this issue, several works have investigated quadratic regularization instead. This regul…

    Submitted 14 April, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Camera-ready ICLR 2023

  11. arXiv:2205.09589  [pdf, other]

    cs.LG stat.ML

    Learning Energy Networks with Generalized Fenchel-Young Losses

    Authors: Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist

    Abstract: Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function, typically parametrized by a neural network. This allows one to capture potentially complex relationships between inputs and outputs. To learn the parameters of the energy function, the solution to that optimization problem is typically fed into a loss function. The key challenge for training energy net…

    Submitted 12 October, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

  12. arXiv:2202.12328  [pdf, other]

    cs.LG

    Cutting Some Slack for SGD with Adaptive Polyak Stepsizes

    Authors: Robert M. Gower, Mathieu Blondel, Nidham Gazagnadou, Fabian Pedregosa

    Abstract: Tuning the step size of stochastic gradient descent is tedious and error prone. This has motivated the development of methods that automatically adapt the step size using readily available information. In this paper, we consider the family of SPS (Stochastic gradient with a Polyak Stepsize) adaptive methods. These are methods that make use of gradient and loss value at the sampled points to adapti…

    Submitted 20 May, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: 48 pages, 7 figures

    MSC Class: 90C53; 74S60; 90C06; 62L20; 68W20; 15B52; 65Y20; 68W40 ACM Class: G.1.6

  13. arXiv:2110.11773  [pdf, other]

    cs.LG stat.ML

    Sinkformers: Transformers with Doubly Stochastic Attention

    Authors: Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

    Abstract: Attention based models such as Transformers involve pairwise interactions between data points, modeled with a learnable attention matrix. Importantly, this attention matrix is normalized with the SoftMax operator, which makes it row-wise stochastic. In this paper, we propose instead to use Sinkhorn's algorithm to make attention matrices doubly stochastic. We call the resulting model a Sinkformer.…

    Submitted 24 January, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: Accepted at AISTATS

  14. arXiv:2108.01988  [pdf, other]

    cs.LG cs.AI stat.ML

    Sparse Continuous Distributions and Fenchel-Young Losses

    Authors: André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

    Abstract: Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $α$-entmax, and fused…

    Submitted 4 August, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

    Comments: JMLR 2022 camera ready version. arXiv admin note: text overlap with arXiv:2006.07214

  15. arXiv:2105.15183  [pdf, other]

    cs.LG math.NA stat.ML

    Efficient and Modular Implicit Differentiation

    Authors: Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

    Abstract: Automatic differentiation (autodiff) has revolutionized machine learning. It allows to express complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization layers, and in bi-level problems suc…

    Submitted 12 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: V3: added more related work and Jacobian precision figure

  16. arXiv:2105.01637  [pdf, other]

    stat.ML cs.LG math.OC

    Implicit differentiation for fast hyperparameter selection in non-smooth convex learning

    Authors: Quentin Bertrand, Quentin Klopfenstein, Mathurin Massias, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

    Abstract: Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward th…

    Submitted 8 August, 2022; v1 submitted 4 May, 2021; originally announced May 2021.

  17. arXiv:2103.09879  [pdf, other]

    cs.SD cs.AI eess.AS

    Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking

    Authors: Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

    Abstract: Self-supervised pre-training using so-called "pretext" tasks has recently shown impressive performance across a wide range of modalities. In this work, we advance self-supervised learning from permutations, by pre-training a model to reorder shuffled parts of the spectrogram of an audio signal, to improve downstream classification performance. We make two main contributions. First, we overcome the…

    Submitted 17 March, 2021; originally announced March 2021.

  18. arXiv:2102.07870  [pdf, other]

    cs.LG cs.AI stat.ML

    Momentum Residual Neural Networks

    Authors: Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

    Abstract: The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network. A way to circumvent this issue is to use reversible architectures. In this paper, we propose to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (Momentum ResNets), ar…

    Submitted 22 July, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: 24 pages

  19. arXiv:2010.08354  [pdf, other]

    cs.LG stat.ML

    Differentiable Divergences Between Time Series

    Authors: Mathieu Blondel, Arthur Mensch, Jean-Philippe Vert

    Abstract: Computing the discrepancy between time series of variable sizes is notoriously challenging. While dynamic time warping (DTW) is popularly used for this purpose, it is not differentiable everywhere and is known to lead to bad local optima when used as a "loss". Soft-DTW addresses these issues, but it is not a positive definite divergence: due to the bias introduced by entropic regularization, it ca…

    Submitted 25 February, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: V3: AISTATS 2021 camera-ready

  20. arXiv:2002.08943  [pdf, other]

    stat.ML cs.LG

    Implicit differentiation of Lasso-type models for hyperparameter optimization

    Authors: Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

    Abstract: Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search however requires to choose a predefined grid for each parameter, which scales exponentially in the number of parameters. Another approach is to cast hyperparameter optimizat…

    Submitted 3 September, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

  21. arXiv:2002.08871  [pdf, other]

    stat.ML cs.LG

    Fast Differentiable Sorting and Ranking

    Authors: Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga

    Abstract: The sorting operation is one of the most commonly used building blocks in computer programming. In machine learning, it is often used for robust statistics. However, seen as a function, it is piecewise linear and as a result includes many kinks where it is non-differentiable. More problematic is the related ranking operator, often used for order statistics and ranking metrics. It is a piecewise co…

    Submitted 29 June, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: In proceedings of ICML 2020

  22. arXiv:2002.08676  [pdf, other]

    cs.LG math.OC stat.ML

    Learning with Differentiable Perturbed Optimizers

    Authors: Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

    Abstract: Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths). Although these discrete decisions are easily computed, they break the back-propagation of computational graphs. In order to expand the scope of learning problems that can be solved in an end-to-end fashion, we propose a systematic method to tran…

    Submitted 9 June, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

  23. arXiv:1910.11369  [pdf, other]

    stat.ML cs.LG

    Structured Prediction with Projection Oracles

    Authors: Mathieu Blondel

    Abstract: We propose in this paper a general framework for deriving loss functions for structured prediction. In our framework, the user chooses a convex set including the output space and provides an oracle for projecting onto that set. Given that oracle, our framework automatically generates a corresponding convex and smooth loss function. As we show, adding a projection as output layer provably makes the…

    Submitted 26 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: In proceedings of NeurIPS 2019 (v2: minor modifications in Appendix A)

  24. arXiv:1905.06005  [pdf, other]

    stat.ML cs.LG math.OC

    Geometric Losses for Distributional Learning

    Authors: Arthur Mensch, Mathieu Blondel, Gabriel Peyré

    Abstract: Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes. Unlike previous attempts to use optimal transport distances for learning, our loss results in unconstrained convex objective functions, supports infinite (or…

    Submitted 15 May, 2019; originally announced May 2019.

    Journal ref: Proceedings of the International Conference on Machine Learning, 2019, Long Beach, United States

  25. arXiv:1901.02324  [pdf, other]

    stat.ML cs.LG

    Learning with Fenchel-Young Losses

    Authors: Mathieu Blondel, André F. T. Martins, Vlad Niculae

    Abstract: Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choose the right loss for the right problem, as well as to create new losses which combine their st…

    Submitted 2 March, 2020; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: In Journal of Machine Learning Research, volume 21

  26. arXiv:1805.09717  [pdf, other]

    stat.ML cs.LG

    Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms

    Authors: Mathieu Blondel, André F. T. Martins, Vlad Niculae

    Abstract: This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function. We analyze their properties in depth, showing that they unify many well-known loss functions and allow to create useful new ones easily. Fenchel-Young losses constructed from a generalized entropy, including the Shannon and Tsallis entropies, induce predictive probability distr…

    Submitted 22 February, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: In proceedings of AISTATS 2019

  27. arXiv:1802.05429  [pdf, ps, other]

    cs.SD eess.AS stat.ML

    Blind Source Separation with Optimal Transport Non-negative Matrix Factorization

    Authors: Antoine Rolet, Vivien Seguy, Mathieu Blondel, Hiroshi Sawada

    Abstract: Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention. Building upon recent advances in computational optimal transport, we develop an optimal transport non-negative matrix factorization (NMF) algorithm for supervised speech blind source separation (BSS). Optimal transport allows us to design and leverage a cost between short-time Fourier tra…

    Submitted 15 February, 2018; originally announced February 2018.

    Comments: 22 pages, 7 figures, 2 additional files

  28. arXiv:1802.04223  [pdf, other]

    stat.ML cs.CL cs.LG

    SparseMAP: Differentiable Sparse Structured Inference

    Authors: Vlad Niculae, André F. T. Martins, Mathieu Blondel, Claire Cardie

    Abstract: Structured prediction requires searching over a combinatorial number of structures. To tackle it, we introduce SparseMAP: a new method for sparse structured inference, and its natural loss function. SparseMAP automatically selects only a few global structures: it is situated between MAP inference, which picks a single structure, and marginal inference, which assigns probability mass to all structu…

    Submitted 20 June, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: Published in ICML 2018. 14 pages, including appendix

    MSC Class: 68T50 ACM Class: I.2.6; I.2.6

  29. arXiv:1802.03676  [pdf, other]

    stat.ML cs.LG

    Differentiable Dynamic Programming for Structured Prediction and Attention

    Authors: Arthur Mensch, Mathieu Blondel

    Abstract: Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, usi…

    Submitted 20 February, 2018; v1 submitted 10 February, 2018; originally announced February 2018.

  30. arXiv:1710.06276  [pdf, other]

    stat.ML cs.LG

    Smooth and Sparse Optimal Transport

    Authors: Mathieu Blondel, Vivien Seguy, Antoine Rolet

    Abstract: Entropic regularization is quickly emerging as a new standard in optimal transport (OT). It enables to cast the OT computation as a differentiable and unconstrained convex optimization problem, which can be efficiently solved using the Sinkhorn algorithm. However, entropy keeps the transportation plan strictly positive and therefore completely dense, unlike unregularized OT. This lack of sparsity…

    Submitted 20 February, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

    Comments: Accepted to AISTATS 2018

  31. arXiv:1705.07704  [pdf, other]

    stat.ML cs.CL cs.LG

    A Regularized Framework for Sparse and Structured Neural Attention

    Authors: Vlad Niculae, Mathieu Blondel

    Abstract: Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes…

    Submitted 22 February, 2019; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: In proceedings of NeurIPS 2017; added errata

  32. arXiv:1705.07603  [pdf, other]

    stat.ML cs.LG

    Multi-output Polynomial Networks and Factorization Machines

    Authors: Mathieu Blondel, Vlad Niculae, Takuma Otsuka, Naonori Ueda

    Abstract: Factorization machines and polynomial networks are supervised polynomial models based on an efficient low-rank decomposition. We extend these models to the multi-output setting, i.e., for learning vector-valued functions, with application to multi-class or multi-task problems. We cast this as the problem of learning a 3-way tensor whose slices share a common basis and propose a convex formulation…

    Submitted 4 November, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: Published at NIPS 2017. 17 pages, including appendix

  33. arXiv:1607.08810  [pdf, other]

    stat.ML cs.LG

    Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms

    Authors: Mathieu Blondel, Masakazu Ishihata, Akinori Fujino, Naonori Ueda

    Abstract: Polynomial networks and factorization machines are two recently-proposed models that can efficiently use feature interactions in classification and regression tasks. In this paper, we revisit both models from a unified perspective. Based on this new view, we study the properties of both models and propose new efficient training algorithms. Key to our approach is to cast parameter learning as a low…

    Submitted 29 July, 2016; originally announced July 2016.

  34. arXiv:1607.07195  [pdf, other]

    stat.ML cs.LG

    Higher-Order Factorization Machines

    Authors: Mathieu Blondel, Akinori Fujino, Naonori Ueda, Masakazu Ishihata

    Abstract: Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional. Unfortunately, despite increasing interest in FMs, there exists to date no efficient training algorithm for higher-order FMs (HOFMs). In this paper, we present the first generic yet efficient algorithms for training arbitrary-order HOFMs. We al…

    Submitted 14 October, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

  35. arXiv:1309.0238  [pdf, ps, other]

    cs.LG cs.MS

    API design for machine learning software: experiences from the scikit-learn project

    Authors: Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, Gaël Varoquaux

    Abstract: Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and p…

    Submitted 1 September, 2013; originally announced September 2013.

    Journal ref: European Conference on Machine Learning and Principles and Practices of Knowledge Discovery in Databases (2013)

  36. arXiv:1201.0490  [pdf, ps, other]

    cs.LG cs.MS

    Scikit-learn: Machine Learning in Python

    Authors: Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

    Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distribute…

    Submitted 5 June, 2018; v1 submitted 2 January, 2012; originally announced January 2012.

    Comments: Update authors list and URLs

    Journal ref: Journal of Machine Learning Research (2011)