
Showing 1–42 of 42 results for author: Bottou, L

Searching in archive cs.
  1. arXiv:2405.06394  [pdf, other]

    cs.LG cs.AI cs.NE

    Memory Mosaics

    Authors: Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan, Beidi Chen, Léon Bottou

    Abstract: Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in comparatively transparent ways. We demonstrate these capabilities on toy examples and we also show that memory mos…

    Submitted 13 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  2. arXiv:2403.00946  [pdf, other]

    cs.LG cs.CV

    Fine-tuning with Very Large Dropout

    Authors: Jianyu Zhang, Léon Bottou

    Abstract: It is impossible today to pretend that the practice of machine learning is compatible with the idea that training and testing data follow the same distribution. Several authors have recently used ensemble techniques to show how scenarios involving multiple data distributions are best served by representations that are both richer than those obtained by regularizing for the best in-distribution per…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 13 pages

  3. arXiv:2310.01425  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Borges and AI

    Authors: Léon Bottou, Bernhard Schölkopf

    Abstract: Many believe that Large Language Models (LLMs) open the era of Artificial Intelligence (AI). Some see opportunities while others see dangers. Yet both proponents and opponents grasp AI through the imagery popularised by science fiction. Will the machine become sentient and rebel against its creators? Will we experience a paperclip apocalypse? Before answering such questions, we should first ask wh…

    Submitted 4 October, 2023; v1 submitted 27 September, 2023; originally announced October 2023.

  4. arXiv:2306.00802  [pdf, other]

    stat.ML cs.CL cs.LG

    Birth of a Transformer: A Memory Viewpoint

    Authors: Alberto Bietti, Vivien Cabannes, Diane Bouchacourt, Herve Jegou, Leon Bottou

    Abstract: Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We stu…

    Submitted 6 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  5. arXiv:2303.15256  [pdf, other]

    cs.LG cs.AI cs.HC

    Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need

    Authors: Vivien Cabannes, Leon Bottou, Yann LeCun, Randall Balestriero

    Abstract: Self-Supervised Learning (SSL) has emerged as the solution of choice to learn transferable representations from unlabeled data. However, SSL requires to build samples that are known to be semantically akin, i.e. positive views. Requiring such knowledge is the main limitation of SSL and is often tackled by ad-hoc strategies e.g. applying known data-augmentations to the same input. In this work, we…

    Submitted 29 September, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 8 main pages, 20 totals, 10 figures

    ACM Class: I.2.6

  6. arXiv:2212.10445  [pdf, other]

    cs.LG cs.AI cs.CV

    Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization

    Authors: Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz

    Abstract: Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: from a pre-trained foundation model, they fine-tune the weights on the target task of interest. So, the Internet is swarmed by a handful of foundation models fine-tuned on many diverse tasks: these individual fine-tunings exist in isolation without ben…

    Submitted 9 August, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: 24 pages, 10 tables, 21 figures

  7. arXiv:2212.07346  [pdf, other]

    cs.LG cs.CV

    Learning useful representations for shifting tasks and distributions

    Authors: Jianyu Zhang, Léon Bottou

    Abstract: Does the dominant approach to learn representations (as a side effect of optimizing an expected cost for a single training distribution) remain a good approach when we are dealing with multiple distributions? Our thesis is that such scenarios are better served by representations that are richer than those obtained with a single optimization episode. We support this thesis with simple theoretical a…

    Submitted 31 July, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: Published at ICML 2023. Blog post available at https://www.jianyuzhang.com/blog/rich-representation-learning

  8. arXiv:2204.03632  [pdf, other]

    cs.LG cs.CV stat.ML

    The Effects of Regularization and Data Augmentation are Class Dependent

    Authors: Randall Balestriero, Leon Bottou, Yann LeCun

    Abstract: Regularization is a fundamental technique to prevent over-fitting and to improve generalization performances by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that…

    Submitted 8 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  9. arXiv:2203.15516  [pdf, other]

    cs.LG cs.AI

    Rich Feature Construction for the Optimization-Generalization Dilemma

    Authors: Jianyu Zhang, David Lopez-Paz, Léon Bottou

    Abstract: There often is a dilemma between ease of optimization and robust out-of-distribution (OoD) generalization. For instance, many OoD methods rely on penalty terms whose optimization is challenging. They are either too strong to optimize reliably or too weak to achieve their goals. We propose to initialize the networks with a rich representation containing a palette of potentially useful features, r…

    Submitted 8 July, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: 15 pages, ICML2022

  10. arXiv:2106.09671  [pdf, other]

    cs.SI cs.AI

    Pseudo-Euclidean Attract-Repel Embeddings for Undirected Graphs

    Authors: Alexander Peysakhovich, Anna Klimovskaia Susmel, Leon Bottou

    Abstract: Dot product embeddings take a graph and construct vectors for nodes such that dot products between two vectors give the strength of the edge. Dot products make a strong transitivity assumption, however, many important forces generating graphs in the real world lead to non-transitive relationships. We remove the transitivity assumption by embedding nodes into a pseudo-Euclidean space - giving each…

    Submitted 23 March, 2023; v1 submitted 17 June, 2021; originally announced June 2021.

  11. arXiv:2106.09467  [pdf, other]

    cs.LG stat.ML

    Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation

    Authors: Agnieszka Słowik, Léon Bottou

    Abstract: Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and economic applications, where data represent people, this can lead to discrimination of underrepresented gender and ethnic groups. Given the importance of bias mitigati…

    Submitted 17 June, 2021; originally announced June 2021.

  12. arXiv:2102.10867  [pdf, other]

    cs.LG cs.AI

    Linear unit-tests for invariance discovery

    Authors: Benjamin Aubin, Agnieszka Słowik, Martin Arjovsky, Leon Bottou, David Lopez-Paz

    Abstract: There is an increasing interest in algorithms to learn invariant correlations across training environments. A big share of the current proposals find theoretical support in the causality literature but, how useful are they in practice? The purpose of this note is to propose six linear low-dimensional problems -- unit tests -- to evaluate different types of out-of-distribution generalization in a p…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: 5 pages, Causal Discovery & Causality-Inspired Machine Learning Workshop at Neural Information Processing Systems

  13. arXiv:2003.02395  [pdf, other]

    stat.ML cs.LG

    A Simple Convergence Proof of Adam and Adagrad

    Authors: Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier

    Abstract: We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients. We show that in expectation, the squared norm of the objective gradient averaged over the trajectory has an upper-bound which is explicit in the constants of the problem, parameters of the optimizer, th…

    Submitted 17 October, 2022; v1 submitted 4 March, 2020; originally announced March 2020.

    Comments: final TMLR version

  14. arXiv:1911.13254  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Music Source Separation in the Waveform Domain

    Authors: Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. Contrarily to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source…

    Submitted 28 April, 2021; v1 submitted 27 November, 2019; originally announced November 2019.

  15. arXiv:1909.13334  [pdf, other]

    cs.LG stat.ML

    Symplectic Recurrent Neural Networks

    Authors: Zhengdao Chen, Jianyu Zhang, Martin Arjovsky, Léon Bottou

    Abstract: We propose Symplectic Recurrent Neural Networks (SRNNs) as learning algorithms that capture the dynamics of physical systems from observed trajectories. An SRNN models the Hamiltonian function of the system by a neural network and furthermore leverages symplectic integration, multiple-step training and initial state optimization to address the challenging numerical issues associated with Hamiltoni…

    Submitted 25 April, 2020; v1 submitted 29 September, 2019; originally announced September 2019.

    Comments: Added link to GitHub repository

    Journal ref: 8th International Conference on Learning Representations (ICLR 2020)

  16. arXiv:1909.01174  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

    Authors: Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments. State-of-the-art approaches predict soft masks over mixture spectrograms while methods working on the waveform are lagging behind as measured on the standard MusDB benchmark. Our contribution is two fold. (i) We introduce a simple convolutional and recurren…

    Submitted 3 September, 2019; originally announced September 2019.

  17. arXiv:1907.02893  [pdf, other]

    stat.ML cs.AI cs.LG

    Invariant Risk Minimization

    Authors: Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz

    Abstract: We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the cau…

    Submitted 27 March, 2020; v1 submitted 5 July, 2019; originally announced July 2019.

  18. Beyond Folklore: A Scaling Calculus for the Design and Initialization of ReLU Networks

    Authors: Aaron Defazio, Léon Bottou

    Abstract: We propose a system for calculating a "scaling constant" for layers and weights of neural networks. We relate this scaling constant to two important quantities that relate to the optimizability of neural networks, and argue that a network that is "preconditioned" via scaling, in the sense that all weights have the same scaling constant, will be easier to train. This scaling calculus results in a n…

    Submitted 11 February, 2021; v1 submitted 10 June, 2019; originally announced June 2019.

    Journal ref: Neural Comput & Applic (2022)

  19. arXiv:1905.10498  [pdf, other]

    cs.LG cs.CV stat.ML

    Cold Case: The Lost MNIST Digits

    Authors: Chhavi Yadav, Léon Bottou

    Abstract: Although the popular MNIST dataset [LeCun et al., 1994] is derived from the NIST database [Grother and Hanaoka, 1995], the precise processing steps for this derivation have been lost to time. We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy. We trace each MNIST digit to its NIST source and its rich metadata…

    Submitted 4 November, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: Final NeurIPS version

  20. arXiv:1812.04549  [pdf, other]

    cs.LG stat.ML

    Controlling Covariate Shift using Balanced Normalization of Weights

    Authors: Aaron Defazio, Léon Bottou

    Abstract: We introduce a new normalization technique that exhibits the fast convergence properties of batch normalization using a transformation of layer weights instead of layer outputs. The proposed technique keeps the contribution of positive and negative weights to the layer output balanced. We validate our method on a set of standard benchmarks including CIFAR-10/100, SVHN and ILSVRC 2012 ImageNet.

    Submitted 10 May, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

  21. arXiv:1812.04529  [pdf, other]

    cs.LG stat.ML

    On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

    Authors: Aaron Defazio, Léon Bottou

    Abstract: The application of stochastic variance reduction to optimization has shown remarkable recent theoretical and practical success. The applicability of these techniques to the hard non-convex optimization problems encountered during training of modern deep neural networks is an open problem. We show that naive application of the SVRG technique and related approaches fail, and explore why.

    Submitted 20 November, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

  22. arXiv:1810.09785  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    SING: Symbol-to-Instrument Neural Generator

    Authors: Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because…

    Submitted 23 October, 2018; originally announced October 2018.

    Journal ref: Conference on Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada

  23. arXiv:1806.01811  [pdf, ps, other]

    stat.ML cs.LG

    AdaGrad stepsizes: Sharp convergence over nonconvex landscapes

    Authors: Rachel Ward, Xiaoxia Wu, Leon Bottou

    Abstract: Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ability to converge robustly, without the need to fine-tune the stepsize schedule. Yet, the theoretical guarantees to date for AdaGrad are for online…

    Submitted 18 April, 2021; v1 submitted 5 June, 2018; originally announced June 2018.

    Journal ref: Journal of Machine Learning Research, 21(219):1-30, 2020. http://jmlr.org/papers/v21/18-352.html

  24. arXiv:1803.02865  [pdf, ps, other]

    stat.ML cs.AI cs.LG math.NA math.OC

    WNGrad: Learn the Learning Rate in Gradient Descent

    Authors: Xiaoxia Wu, Rachel Ward, Léon Bottou

    Abstract: Adjusting the learning rate schedule in stochastic gradient methods is an important unresolved problem which requires tuning in practice. If certain parameters of the loss function such as smoothness or strong convexity constants are known, theoretical learning rate schedules can be applied. However, in practice, such parameters are not known, and the loss function of interest is not convex in any…

    Submitted 19 November, 2020; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: 10 pages, 3 figures, conference

    MSC Class: 80M50; 90C15; 90C26; 90C30; 68T05

  25. arXiv:1802.01421  [pdf, other]

    stat.ML cs.CV cs.LG

    First-order Adversarial Vulnerability of Neural Networks and Input Dimension

    Authors: Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz

    Abstract: Over the past few years, neural networks were proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. Surprisingly, vulnerability does not depend on network topology: for many standard netwo…

    Submitted 16 June, 2019; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: Paper previously called: "Adversarial Vulnerability of Neural Networks Increases with Input Dimension". 9 pages main text and references, 11 pages appendix, 14 figures

    MSC Class: 68T45

    ACM Class: I.2.6

    Journal ref: Proceedings of ICML 2019

  26. arXiv:1712.07822  [pdf, other]

    stat.ML cs.AI cs.LG

    Geometrical Insights for Implicit Generative Modeling

    Authors: Leon Bottou, Martin Arjovsky, David Lopez-Paz, Maxime Oquab

    Abstract: Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences…

    Submitted 21 August, 2019; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: this version fixes a typo in a definition

  27. arXiv:1706.04454  [pdf, other]

    cs.LG

    Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

    Authors: Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

    Abstract: We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, (2) and outliers away from the bulk. We present numerical evidence and mathematical justifications to the following conjectures laid out by Sagun et al. (2016): F…

    Submitted 7 May, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Minor update for ICLR 2018 Workshop Track presentation

  28. arXiv:1705.09319  [pdf, other]

    cs.LG stat.ML

    Diagonal Rescaling For Neural Networks

    Authors: Jean Lafond, Nicolas Vasilache, Léon Bottou

    Abstract: We define a second-order neural network stochastic gradient training algorithm whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why this algorithm lacks in robustness then reveals two interesting insights. The first insight suggests a new way to scale the stepsizes, clarifying popular algorithms such as RMSProp as well as old neural network tric…

    Submitted 25 May, 2017; originally announced May 2017.

  29. arXiv:1701.07875  [pdf, other]

    stat.ML cs.LG

    Wasserstein GAN

    Authors: Martin Arjovsky, Soumith Chintala, Léon Bottou

    Abstract: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical wor…

    Submitted 6 December, 2017; v1 submitted 26 January, 2017; originally announced January 2017.

  30. arXiv:1701.04862  [pdf, other]

    stat.ML cs.LG

    Towards Principled Methods for Training Generative Adversarial Networks

    Authors: Martin Arjovsky, Léon Bottou

    Abstract: The goal of this paper is not to introduce a single algorithm or method, but to make theoretical steps towards fully understanding the training dynamics of generative adversarial networks. In order to substantiate our theoretical analysis, we perform targeted experiments to verify our assumptions, illustrate our claims, and quantify the phenomena. This paper is divided into three sections. The fir…

    Submitted 17 January, 2017; originally announced January 2017.

  31. arXiv:1611.07476  [pdf, other]

    cs.LG

    Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

    Authors: Levent Sagun, Leon Bottou, Yann LeCun

    Abstract: We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data.

    Submitted 5 October, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

    Comments: ICLR submission, 2016 - updated to match the openreview.net version

  32. arXiv:1606.04838  [pdf, other]

    stat.ML cs.LG math.OC

    Optimization Methods for Large-Scale Machine Learning

    Authors: Léon Bottou, Frank E. Curtis, Jorge Nocedal

    Abstract: This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine…

    Submitted 8 February, 2018; v1 submitted 15 June, 2016; originally announced June 2016.

  33. arXiv:1605.08179  [pdf, other]

    stat.ML cs.CV

    Discovering Causal Signals in Images

    Authors: David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, Léon Bottou

    Abstract: This paper establishes the existence of observable footprints that reveal the "causal dispositions" of the object categories appearing in collections of images. We achieve this goal in two steps. First, we take a learning approach to observational causal discovery, and build a classifier that achieves state-of-the-art performance on finding the causal direction between pairs of random variables, g…

    Submitted 31 October, 2017; v1 submitted 26 May, 2016; originally announced May 2016.

  34. arXiv:1511.03643  [pdf, other]

    stat.ML cs.LG

    Unifying distillation and privileged information

    Authors: David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik

    Abstract: Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies these two techniques into generalized distillation, a framework to learn from multiple machines and data representations. We provide theoretical and causal insight about the inner workings of generalized distillation, exten…

    Submitted 25 February, 2016; v1 submitted 11 November, 2015; originally announced November 2015.

    Journal ref: Proceedings of the International Conference on Learning Representations (2016) 1-10

  35. arXiv:1508.02933  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    No Regret Bound for Extreme Bandits

    Authors: Robert Nishihara, David Lopez-Paz, Léon Bottou

    Abstract: Algorithms for hyperparameter optimization abound, all of which work well under different and often unverifiable assumptions. Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization. This work is naturally framed in the extreme bandit setting, which deals with s…

    Submitted 11 April, 2016; v1 submitted 12 August, 2015; originally announced August 2015.

    Comments: 11 pages, International Conference on Artificial Intelligence and Statistics, 2016

  36. arXiv:1409.4814  [pdf]

    cs.AI cs.IR

    ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems

    Authors: Patrice Simard, David Chickering, Aparna Lakshmiratan, Denis Charles, Leon Bottou, Carlos Garcia Jurado Suarez, David Grangier, Saleema Amershi, Johan Verwey, Jina Suh

    Abstract: Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The learning machine leverages big data to find examples that maximize the training value of its interaction with the teacher. When the teacher is restricted to labeling examples…

    Submitted 16 September, 2014; originally announced September 2014.

  37. arXiv:1311.0636  [pdf, ps, other]

    cs.LG cs.DC

    A Parallel SGD method with Strong Convergence

    Authors: Dhruv Mahajan, S. Sathiya Keerthi, S. Sundararajan, Leon Bottou

    Abstract: This paper proposes a novel parallel stochastic gradient descent (SGD) method that is obtained by applying parallel sets of SGD iterations (each set operating on one node using the data residing in it) for finding the direction in each iteration of a batch descent method. The method has strong convergence properties. Experiments on datasets with high dimensional feature spaces show the value of th…

    Submitted 4 November, 2013; originally announced November 2013.

  38. arXiv:1310.8418  [pdf, ps, other]

    cs.LG

    An efficient distributed learning algorithm based on effective local functional approximations

    Authors: Dhruv Mahajan, Nikunj Agrawal, S. Sathiya Keerthi, S. Sundararajan, Leon Bottou

    Abstract: Scalable machine learning over big data is an important problem that is receiving a lot of attention in recent years. On popular distributed environments such as Hadoop running on a cluster of commodity machines, communication costs are substantial and algorithms need to be designed suitably considering those costs. In this paper we give a novel approach to the distributed training of linear class…

    Submitted 16 March, 2015; v1 submitted 31 October, 2013; originally announced October 2013.

  39. arXiv:1310.8243  [pdf, other]

    cs.LG stat.ML

    Para-active learning

    Authors: Alekh Agarwal, Leon Bottou, Miroslav Dudik, John Langford

    Abstract: Training examples are not all equally informative. Active learning strategies leverage this observation in order to massively reduce the number of examples that need to be labeled. We leverage the same observation to build a generic strategy for parallelizing learning algorithms. This strategy is effective because the search for informative examples is highly parallelizable and because we show tha…

    Submitted 30 October, 2013; originally announced October 2013.

  40. arXiv:1209.2355  [pdf, other]

    cs.LG cs.AI cs.IR math.ST

    Counterfactual Reasoning and Learning Systems

    Authors: Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson

    Abstract: This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve both the short-term and long-term performance of such systems. This work is illustrated by experiments carried out on the ad…

    Submitted 27 July, 2013; v1 submitted 11 September, 2012; originally announced September 2012.

    Comments: revised version

  41. arXiv:1103.0398  [pdf, other]

    cs.LG cs.CL

    Natural Language Processing (almost) from Scratch

    Authors: Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa

    Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input…

    Submitted 2 March, 2011; originally announced March 2011.

  42. arXiv:1102.1808  [pdf]

    cs.AI cs.LG

    From Machine Learning to Machine Reasoning

    Authors: Leon Bottou

    Abstract: A plausible definition of "reasoning" could be "algebraically manipulating previously acquired knowledge in order to answer a new question". This definition covers first-order logical inference or probabilistic inference. It also includes much simpler manipulations commonly used to build large learning systems. For instance, we can build an optical character recognition system by first training a…

    Submitted 11 February, 2011; v1 submitted 9 February, 2011; originally announced February 2011.

    Comments: 15 pages - fix broken pagination in v2

    Report number: tr-2011-02-08