Showing 51–87 of 87 results for author: Lacoste-Julien, S

  1. arXiv:1810.11544  [pdf, other]

    cs.LG cs.AI stat.ML

    Quantifying Learning Guarantees for Convex but Inconsistent Surrogates

    Authors: Kirill Struminsky, Simon Lacoste-Julien, Anton Osokin

    Abstract: We study consistency properties of machine learning methods based on minimizing convex surrogates. We extend the recent framework of Osokin et al. (2017) for the quantitative analysis of consistency properties to the case of inconsistent surrogates. Our key technical contribution consists in a new lower bound on the calibration function for the quadratic surrogate, which is non-trivial (not always… ▽ More

    Submitted 9 January, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: Appears in: Advances in Neural Information Processing Systems 31 (NeurIPS 2018). 18 pages

  2. arXiv:1810.08591  [pdf, other]

    cs.LG stat.ML

    A Modern Take on the Bias-Variance Tradeoff in Neural Networks

    Authors: Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve. However, recent empirical results with over-parameterized neural networks are marked by a striking absence of the classic U-shaped test error curve: test error keeps decreasing in wider networks. This suggests that there might not be a bias-variance tr… ▽ More

    Submitted 18 December, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

    Journal ref: ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena

  3. arXiv:1809.06367  [pdf, other]

    cs.LG cs.CV stat.ML

    Scattering Networks for Hybrid Representation Learning

    Authors: Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky

    Abstract: Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we… ▽ More

    Submitted 17 September, 2018; originally announced September 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1703.08961

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2018, pp.11

  4. Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

    Authors: Eric Larsen, Sébastien Lachapelle, Yoshua Bengio, Emma Frejinger, Simon Lacoste-Julien, Andrea Lodi

    Abstract: This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming where the second stage is demanding computationally. We aim to predict at a high speed th… ▽ More

    Submitted 1 March, 2021; v1 submitted 31 July, 2018; originally announced July 2018.

    Journal ref: INFORMS Journal on Computing 34(1):227-242, 2021

  5. arXiv:1807.04740  [pdf, other]

    cs.LG stat.ML

    Negative Momentum for Improved Game Dynamics

    Authors: Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Remi Lepriol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas

    Abstract: Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optim… ▽ More

    Submitted 28 August, 2020; v1 submitted 12 July, 2018; originally announced July 2018.

    Comments: Appears in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). Minor changes with respect to the AISTATS version: typo corrected in Thm. 6 (squared condition number instead of condition number; and small change in constant) and dependence in $β$ changed in Theorem 5 for the formal statement; not changing the conclusions. 28 pages

    ACM Class: I.2.6; G.1.6

  6. arXiv:1804.03176  [pdf, other]

    math.OC cs.LG stat.ML

    Frank-Wolfe Splitting via Augmented Lagrangian Method

    Authors: Gauthier Gidel, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set,… ▽ More

    Submitted 9 April, 2018; originally announced April 2018.

    Comments: Appears in: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018). 30 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  7. arXiv:1802.10551  [pdf, other]

    cs.LG math.OC stat.ML

    A Variational Inequality Perspective on Generative Adversarial Networks

    Authors: Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien

    Abstract: Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notably difficult to train. One common way to tackle this issue has been to propose new formulations of the GAN objective. Yet, surprisingly few studies have looked at optimization methods designed for this adversarial training. In this work, we cast GAN optimization probl… ▽ More

    Submitted 28 August, 2020; v1 submitted 28 February, 2018; originally announced February 2018.

    Comments: Appears in: Proceedings of the Seventh International Conference on Learning Representations (ICLR 2019). Minor modifications with respect to the ICLR version (First paragraph of page 2 and section 3.3): New reference [Popov 1980] and discussion with regards to the novelty of extrapolation from the past. 38 pages

    ACM Class: I.2.6; G.1.6

  8. arXiv:1801.04055  [pdf, other]

    cs.LG stat.ML

    A3T: Adversarially Augmented Adversarial Training

    Authors: Akram Erraqabi, Aristide Baratin, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: Recent research showed that deep neural networks are highly sensitive to so-called adversarial perturbations, which are tiny perturbations of the input data purposely designed to fool a machine learning classifier. Most classification models, including deep learning models, are highly vulnerable to adversarial attacks. In this work, we investigate a procedure to improve adversarial robustness of d… ▽ More

    Submitted 11 January, 2018; originally announced January 2018.

    Comments: accepted for an oral presentation in Machine Deception Workshop, NIPS 2017

  9. arXiv:1801.03749  [pdf, other]

    math.OC cs.LG stat.ML

    Improved asynchronous parallel optimization analysis for stochastic incremental methods

    Authors: Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: As datasets continue to increase in size and multi-core computer architectures are developed, asynchronous parallel optimization algorithms become more and more essential to the field of Machine Learning. Unfortunately, conducting the theoretical analysis of asynchronous methods is difficult, notably due to the introduction of delay and inconsistency in inherently sequential algorithms. Handling thes… ▽ More

    Submitted 21 March, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

    Comments: 67 pages, published in JMLR, can be found online at http://jmlr.org/papers/v19/17-650.html. arXiv admin note: substantial text overlap with arXiv:1606.04809

  10. arXiv:1712.08577  [pdf, other]

    stat.ML cs.LG

    Adaptive Stochastic Dual Coordinate Ascent for Conditional Random Fields

    Authors: Rémi Le Priol, Alexandre Piché, Simon Lacoste-Julien

    Abstract: This work investigates the training of conditional random fields (CRFs) via the stochastic dual coordinate ascent (SDCA) algorithm of Shalev-Shwartz and Zhang (2016). SDCA enjoys a linear convergence rate and a strong empirical performance for binary classification problems. However, it has never been used to train CRFs. Yet it benefits from an 'exact' line search with a single marginalization ora… ▽ More

    Submitted 9 July, 2018; v1 submitted 22 December, 2017; originally announced December 2017.

    Comments: Published as a conference paper at UAI 2018. 22 pages

    MSC Class: 90C52; 90C90; 90C06; 68T05 ACM Class: G.1.6; I.2.6

  11. arXiv:1708.02511  [pdf, other]

    cs.LG stat.ML

    Parametric Adversarial Divergences are Good Losses for Generative Modeling

    Authors: Gabriel Huang, Hugo Berard, Ahmed Touati, Gauthier Gidel, Pascal Vincent, Simon Lacoste-Julien

    Abstract: Parametric adversarial divergences, which are a generalization of the losses used to train generative adversarial networks (GANs), have often been described as being approximations of their nonparametric counterparts, such as the Jensen-Shannon divergence, which can be derived under the so-called optimal discriminator assumption. In this position paper, we argue that despite being "non-optimal", p… ▽ More

    Submitted 21 October, 2021; v1 submitted 8 August, 2017; originally announced August 2017.

  12. arXiv:1707.06468  [pdf, other]

    math.OC cs.LG stat.ML

    Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

    Authors: Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien

    Abstract: Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as… ▽ More

    Submitted 5 November, 2017; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: Appears in Advances in Neural Information Processing Systems 30 (NIPS 2017), 28 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

    Journal ref: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  13. arXiv:1706.05394  [pdf, other]

    stat.ML cs.LG

    A Closer Look at Memorization in Deep Networks

    Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. r… ▽ More

    Submitted 1 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work

  14. arXiv:1706.04499  [pdf, other]

    cs.LG stat.ML

    SEARNN: Training RNNs with Global-Local Losses

    Authors: Rémi Leblond, Jean-Baptiste Alayrac, Anton Osokin, Simon Lacoste-Julien

    Abstract: We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an appropria… ▽ More

    Submitted 4 March, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Published as a conference paper at ICLR 2018, 16 pages

  15. arXiv:1703.02403  [pdf, other]

    cs.LG stat.ML

    On Structured Prediction Theory with Calibrated Convex Surrogate Losses

    Authors: Anton Osokin, Francis Bach, Simon Lacoste-Julien

    Abstract: We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prio… ▽ More

    Submitted 29 January, 2018; v1 submitted 7 March, 2017; originally announced March 2017.

    Comments: Appears in: Advances in Neural Information Processing Systems 30 (NIPS 2017). 30 pages

  16. arXiv:1702.02738  [pdf, other]

    cs.CV cs.LG

    Joint Discovery of Object States and Manipulation Actions

    Authors: Jean-Baptiste Alayrac, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien

    Abstract: Many human activities involve object manipulations aiming to modify the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identif… ▽ More

    Submitted 28 August, 2017; v1 submitted 9 February, 2017; originally announced February 2017.

    Comments: Appears in: International Conference on Computer Vision 2017 (ICCV 2017). 15 pages

    ACM Class: I.5.1; I.5.4; I.2

  17. arXiv:1610.07797  [pdf, other]

    math.OC cs.LG stat.ML

    Frank-Wolfe Algorithms for Saddle Point Problems

    Authors: Gauthier Gidel, Tony Jebara, Simon Lacoste-Julien

    Abstract: We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30 year-old conjecture. We also… ▽ More

    Submitted 3 March, 2017; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: Appears in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017). 39 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  18. arXiv:1607.00345  [pdf, other]

    math.OC cs.LG math.NA stat.ML

    Convergence Rate of Frank-Wolfe for Non-Convex Objectives

    Authors: Simon Lacoste-Julien

    Abstract: We give a simple proof that the Frank-Wolfe algorithm obtains a stationary point at a rate of $O(1/\sqrt{t})$ on non-convex objectives with a Lipschitz continuous gradient. Our analysis is affine invariant and is the first, to the best of our knowledge, giving a similar rate to what was already proven for projected gradient methods (though on slightly different measures of stationarity).

    Submitted 1 July, 2016; originally announced July 2016.

    Comments: 6 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  19. arXiv:1606.04809  [pdf, other]

    math.OC cs.LG stat.ML

    ASAGA: Asynchronous Parallel SAGA

    Authors: Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

    Abstract: We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently introduce… ▽ More

    Submitted 8 November, 2017; v1 submitted 15 June, 2016; originally announced June 2016.

    Comments: Appears in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), 37 pages

  20. arXiv:1605.09346  [pdf, other]

    cs.LG math.OC stat.ML

    Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs

    Authors: Anton Osokin, Jean-Baptiste Alayrac, Isabella Lukasewitz, Puneet K. Dokania, Simon Lacoste-Julien

    Abstract: In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the bl… ▽ More

    Submitted 30 May, 2016; originally announced May 2016.

    Comments: Appears in Proceedings of the 33rd International Conference on Machine Learning (ICML 2016). 31 pages

    MSC Class: 90C52; 90C90; 90C06; 68T05 ACM Class: G.1.6; I.2.6

  21. arXiv:1605.08636  [pdf, other]

    stat.ML cs.LG

    PAC-Bayesian Theory Meets Bayesian Inference

    Authors: Pascal Germain, Francis Bach, Alexandre Lacoste, Simon Lacoste-Julien

    Abstract: We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is… ▽ More

    Submitted 13 February, 2017; v1 submitted 27 May, 2016; originally announced May 2016.

    Comments: Published at NIPS 2016 (http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference)

    Journal ref: Advances in Neural Information Processing Systems 29 (NIPS 2016), p. 1884-1892

  22. arXiv:1602.09013  [pdf, other]

    stat.ML cs.LG

    Beyond CCA: Moment Matching for Multi-View Models

    Authors: Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien

    Abstract: We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees. We consider moment matching techniques for estimation in these models. For that, by drawing explicit links between the new models and a discrete version of independent component analysis (DICA), we first extend the DICA cumulant tensors to the new discrete version of… ▽ More

    Submitted 3 June, 2016; v1 submitted 29 February, 2016; originally announced February 2016.

    Comments: Appears in: Proceedings of the 33rd International Conference on Machine Learning (ICML 2016). 22 pages

    MSC Class: 62H12 62H20 62H25 62F10 68T05 68T50 90C26 90C90 ACM Class: I.2.6; I.2.7; G.1.3; G.1.6; G.3

  23. arXiv:1511.05932  [pdf, other]

    math.OC cs.LG stat.ML

    On the Global Linear Convergence of Frank-Wolfe Optimization Variants

    Authors: Simon Lacoste-Julien, Martin Jaggi

    Abstract: The Frank-Wolfe (FW) optimization algorithm has lately re-gained popularity thanks in particular to its ability to nicely handle the structured constraints appearing in machine learning applications. However, its convergence rate is known to be slow (sublinear) when the solution lies at the boundary. A simple less-known fix is to add the possibility to take 'away steps' during optimization, an ope… ▽ More

    Submitted 18 November, 2015; originally announced November 2015.

    Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 26 pages

    MSC Class: 90C52; 90C90; 68T05 ACM Class: G.1.6; I.2.6

  24. arXiv:1511.02124  [pdf, other]

    stat.ML cs.LG math.OC

    Barrier Frank-Wolfe for Marginal Inference

    Authors: Rahul G. Krishnan, Simon Lacoste-Julien, David Sontag

    Abstract: We introduce a globally-convergent algorithm for optimizing the tree-reweighted (TRW) variational objective over the marginal polytope. The algorithm is based on the conditional gradient method (Frank-Wolfe) and moves pseudomarginals within the marginal polytope through repeated maximum a posteriori (MAP) calls. This modular structure enables us to leverage black-box MAP solvers (both exact and ap… ▽ More

    Submitted 25 November, 2015; v1 submitted 6 November, 2015; originally announced November 2015.

    Comments: 25 pages, 12 figures, To appear in Neural Information Processing Systems (NIPS) 2015, Corrected reference and cleaned up bibliography

  25. arXiv:1507.01784  [pdf, ps, other]

    stat.ML cs.LG

    Rethinking LDA: moment matching for discrete ICA

    Authors: Anastasia Podosinnikova, Francis Bach, Simon Lacoste-Julien

    Abstract: We consider moment matching techniques for estimation in Latent Dirichlet Allocation (LDA). By drawing explicit links between LDA and discrete versions of independent component analysis (ICA), we first derive a new set of cumulant-based tensors, with an improved sample complexity. Moreover, we reuse standard ICA techniques such as joint diagonalization of tensors to improve over existing methods b… ▽ More

    Submitted 5 November, 2015; v1 submitted 7 July, 2015; originally announced July 2015.

    Comments: 30 pages; added plate diagrams and clarifications, changed style, corrected typos, updated figures. in Proceedings of the 29-th Conference on Neural Information Processing Systems (NIPS), 2015

  26. arXiv:1506.09215  [pdf, other]

    cs.CV cs.LG

    Unsupervised Learning from Narrated Instruction Videos

    Authors: Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien

    Abstract: We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering pr… ▽ More

    Submitted 28 June, 2016; v1 submitted 30 June, 2015; originally announced June 2015.

    Comments: Appears in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). 21 pages

    ACM Class: I.5.1; I.5.4; I.2

  27. arXiv:1506.03662  [pdf, other]

    cs.LG math.OC stat.ML

    Variance Reduced Stochastic Gradient Descent with Neighbors

    Authors: Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

    Abstract: Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per data point corrections in me… ▽ More

    Submitted 26 February, 2016; v1 submitted 11 June, 2015; originally announced June 2015.

    Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages

    MSC Class: 90C06; 90C25; 68T05 ACM Class: G.1.6; I.2.6

  28. arXiv:1501.02056  [pdf, other]

    stat.ML cs.LG

    Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering

    Authors: Simon Lacoste-Julien, Fredrik Lindsten, Francis Bach

    Abstract: Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS) with a potentially faster rate of convergence than Monte Carlo integration (and "kernel herding" was shown to be a special case of this procedure). In this paper, we propose to replace the random sampling step in a… ▽ More

    Submitted 10 February, 2015; v1 submitted 9 January, 2015; originally announced January 2015.

    Comments: in 18th International Conference on Artificial Intelligence and Statistics (AISTATS), May 2015, San Diego, United States. 38, JMLR Workshop and Conference Proceedings

  29. arXiv:1408.3304  [pdf, other]

    cs.CV math.OC

    On Pairwise Costs for Network Flow Multi-Object Tracking

    Authors: Visesh Chari, Simon Lacoste-Julien, Ivan Laptev, Josef Sivic

    Abstract: Multi-object tracking has been recently approached with the min-cost network flow optimization techniques. Such methods simultaneously resolve multiple object tracks in a video and enable modeling of dependencies among tracks. Min-cost network flow methods also fit well within the "tracking-by-detection" paradigm where object trajectories are obtained by connecting per-frame outputs of an object d… ▽ More

    Submitted 5 May, 2015; v1 submitted 14 August, 2014; originally announced August 2014.

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5537-5545

  30. arXiv:1407.0202  [pdf, other]

    cs.LG math.OC stat.ML

    SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

    Authors: Aaron Defazio, Francis Bach, Simon Lacoste-Julien

    Abstract: In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates. SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser. Unlike SDCA… ▽ More

    Submitted 16 December, 2014; v1 submitted 1 July, 2014; originally announced July 2014.

    Comments: Advances In Neural Information Processing Systems, Nov 2014, Montreal, Canada

  31. arXiv:1312.7864  [pdf, ps, other]

    math.OC

    An Affine Invariant Linear Convergence Analysis for Frank-Wolfe Algorithms

    Authors: Simon Lacoste-Julien, Martin Jaggi

    Abstract: We study the linear convergence of variants of the Frank-Wolfe algorithms for some classes of strongly convex problems, using only affine-invariant quantities. As in Guelat & Marcotte (1986), we show the linear convergence of the standard Frank-Wolfe algorithm when the solution is in the interior of the domain, but with affine invariant constants. We also show the linear convergence of the away-st… ▽ More

    Submitted 3 January, 2014; v1 submitted 30 December, 2013; originally announced December 2013.

    Comments: appeared at the NIPS 2013 Workshop on Greedy Algorithms, Frank-Wolfe and Friends (v2: added acknowledgements)

    MSC Class: 90C25 ACM Class: G.1.6

    Journal ref: NIPS 2013 Workshop on Greedy Algorithms, Frank-Wolfe and Friends

  32. arXiv:1212.2002  [pdf, other]

    cs.LG math.OC stat.ML

    A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method

    Authors: Simon Lacoste-Julien, Mark Schmidt, Francis Bach

    Abstract: In this note, we present a new averaging technique for the projected stochastic subgradient method. By using a weighted average with a weight of t+1 for each iterate w_t at iteration t, we obtain the convergence rate of O(1/t) with both an easy proof and an easy implementation. The new scheme is compared empirically to existing techniques, with similar performance behavior.

    Submitted 20 December, 2012; v1 submitted 10 December, 2012; originally announced December 2012.

    Comments: 8 pages, 6 figures. Changes with previous version: Added reference to concurrently submitted work arXiv:1212.1824v1; clarifications added; typos corrected; title changed to 'subgradient method' as 'subgradient descent' is misnomer

    MSC Class: 90C15; 68T05; 65K10 ACM Class: G.1.6; I.2.6

  33. arXiv:1207.4747  [pdf, other]

    cs.LG math.OC stat.ML

    Block-Coordinate Frank-Wolfe Optimization for Structural SVMs

    Authors: Simon Lacoste-Julien, Martin Jaggi, Mark Schmidt, Patrick Pletscher

    Abstract: We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online alg… ▽ More

    Submitted 14 January, 2013; v1 submitted 19 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the 30th International Conference on Machine Learning (ICML 2013). 9 pages main text + 22 pages appendix. Changes from v3 to v4: 1) Re-organized appendix; improved & clarified duality gap proofs; re-drew all plots; 2) Changed convention for Cf definition; 3) Added weighted averaging experiments + convergence results; 4) Clarified main text and relationship with appendix

    MSC Class: 90C52; 90C90; 90C06; 68T05 ACM Class: G.1.6; I.2.6

  34. arXiv:1207.4525  [pdf, other]

    cs.AI cs.DB cs.IR

    SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases

    Authors: Simon Lacoste-Julien, Konstantina Palla, Alex Davies, Gjergji Kasneci, Thore Graepel, Zoubin Ghahramani

    Abstract: The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challe… ▽ More

    Submitted 18 July, 2012; originally announced July 2012.

    Comments: 10 pages + 2 pages appendix; 5 figures -- initial preprint

    ACM Class: I.2.4; H.3.4; D.2.12

  35. arXiv:1203.4523  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On the Equivalence between Herding and Conditional Gradient Algorithms

    Authors: Francis Bach, Simon Lacoste-Julien, Guillaume Obozinski

    Abstract: We show that the herding procedure of Welling (2009) takes exactly the form of a standard convex optimization algorithm--namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We st… ▽ More

    Submitted 11 September, 2012; v1 submitted 20 March, 2012; originally announced March 2012.

    Journal ref: ICML 2012 International Conference on Machine Learning, Edinburgh, United Kingdom (2012)

  36. arXiv:1111.6832  [pdf, other]

    stat.ML

    Gaussian Probabilities and Expectation Propagation

    Authors: John P. Cunningham, Philipp Hennig, Simon Lacoste-Julien

    Abstract: While Gaussian probability densities are omnipresent in applied mathematics, Gaussian cumulative probabilities are hard to calculate in any but the univariate case. We study the utility of Expectation Propagation (EP) as an approximate integration method for this problem. For rectangular integration regions, the approximation is highly accurate. We also extend the derivations to the more general c… ▽ More

    Submitted 28 November, 2013; v1 submitted 29 November, 2011; originally announced November 2011.

  37. arXiv:1103.1761  [pdf, other]

    stat.ML stat.ME

    A Kernel Approach to Tractable Bayesian Nonparametrics

    Authors: Ferenc Huszár, Simon Lacoste-Julien

    Abstract: Inference in popular nonparametric Bayesian models typically relies on sampling or other approximations. This paper presents a general methodology for constructing novel tractable nonparametric Bayesian methods by applying the kernel trick to inference in a parametric Bayesian model. For example, Gaussian process regression can be derived this way from Bayesian linear regression. Despite the succe… ▽ More

    Submitted 12 August, 2011; v1 submitted 9 March, 2011; originally announced March 2011.

    Comments: acknowledgements added to previous version, content otherwise unchanged