Showing 1–6 of 6 results for author: Nocedal, J

Searching in archive cs.
  1. arXiv:1802.05374  [pdf, other]

    math.OC cs.LG stat.ML

    A Progressive Batching L-BFGS Method for Machine Learning

    Authors: Raghu Bollapragada, Dheevatsa Mudigere, Jorge Nocedal, Hao-Jun Michael Shi, Ping Tak Peter Tang

    Abstract: The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization pr…

    Submitted 30 May, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

    Comments: ICML 2018. 25 pages, 17 figures, 2 tables

  2. arXiv:1705.06211  [pdf, other]

    math.OC cs.LG stat.ML

    An Investigation of Newton-Sketch and Subsampled Newton Methods

    Authors: Albert S. Berahas, Raghu Bollapragada, Jorge Nocedal

    Abstract: Sketching, a dimensionality reduction technique, has received much attention in the statistics community. In this paper, we study sketching in the context of Newton's method for solving finite-sum optimization problems in which the number of variables and data points are both large. We study two forms of sketching that perform dimensionality reduction in data space: Hessian subsampling and randomi…

    Submitted 30 May, 2019; v1 submitted 17 May, 2017; originally announced May 2017.

    Comments: 36 pages, 22 figures

  3. arXiv:1609.04836  [pdf, other]

    cs.LG math.OC

    On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

    Authors: Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang

    Abstract: The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say $32$-$512$ data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the mod…

    Submitted 9 February, 2017; v1 submitted 15 September, 2016; originally announced September 2016.

    Comments: Accepted as a conference paper at ICLR 2017

  4. arXiv:1606.04838  [pdf, other]

    stat.ML cs.LG math.OC

    Optimization Methods for Large-Scale Machine Learning

    Authors: Léon Bottou, Frank E. Curtis, Jorge Nocedal

    Abstract: This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine…

    Submitted 8 February, 2018; v1 submitted 15 June, 2016; originally announced June 2016.

  5. arXiv:1605.06049  [pdf, other]

    math.OC cs.LG stat.ML

    A Multi-Batch L-BFGS Method for Machine Learning

    Authors: Albert S. Berahas, Jorge Nocedal, Martin Takáč

    Abstract: The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the…

    Submitted 23 October, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

    Comments: NIPS 2016. 31 pages, 22 figures

  6. arXiv:1401.7020  [pdf, other]

    math.OC cs.LG stat.ML

    A Stochastic Quasi-Newton Method for Large-Scale Optimization

    Authors: R. H. Byrd, S. L. Hansen, J. Nocedal, Y. Singer

    Abstract: The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi-Newton updating techniques for deterministic optimization leads to noisy curvature estimates that have harmful effects on the robustness of the iteration. In this paper, we propose a stochastic quasi-Newton method that is efficient, robust and scal…

    Submitted 18 February, 2015; v1 submitted 27 January, 2014; originally announced January 2014.