Showing 1–48 of 48 results for author: Smola, A

  1. arXiv:2302.03020  [pdf, other]

    cs.LG cs.CV stat.ML

    RLSbench: Domain Adaptation Under Relaxed Label Shift

    Authors: Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton

    Abstract: Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class conditional distributions is precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with label proportions shifts. While several papers modify these heuristics in attempts to handle label proportions shifts, inconsistencies i…

    Submitted 5 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023. Paper website: https://sites.google.com/view/rlsbench/

  2. arXiv:2111.02705  [pdf, other]

    cs.LG cs.CL stat.ML

    Benchmarking Multimodal AutoML for Tabular Data with Text Fields

    Authors: Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola

    Abstract: We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well. Here we assemble 18 multimodal data tables that each contain some text fields and stem from a real business application. Our publicly-available benchmark enables researchers to comprehensively evaluate their own methods for supervised…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks 2021

  3. arXiv:2111.00980  [pdf, other]

    cs.LG stat.ML

    Mixture Proportion Estimation and PU Learning: A Modern Approach

    Authors: Saurabh Garg, Yifan Wu, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton

    Abstract: Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, lea…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: Spotlight at NeurIPS 2021

  4. arXiv:2103.00083  [pdf, other]

    stat.ML cs.LG

    Flexible Model Aggregation for Quantile Regression

    Authors: Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani

    Abstract: Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for…

    Submitted 15 April, 2023; v1 submitted 26 February, 2021; originally announced March 2021.

    Comments: Accepted at JMLR 2023
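
The quantile regression problem named in entry 4 is usually posed through the pinball (quantile) loss. The following is a minimal sketch of that standard loss only, not the aggregation method of the paper, whose abstract is truncated above. The names `pinball_loss` and `best_constant` and the toy data are hypothetical, chosen for illustration.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of predicting q for observation y at level tau."""
    diff = y - q
    # Under-prediction (diff > 0) costs tau per unit,
    # over-prediction (diff < 0) costs (1 - tau) per unit.
    return max(tau * diff, (tau - 1.0) * diff)


def best_constant(ys, tau):
    """Return the observed value that minimizes the average pinball loss."""
    def avg_loss(q):
        return sum(pinball_loss(y, q, tau) for y in ys) / len(ys)
    return min(ys, key=avg_loss)


# Toy data (an assumption for illustration only).
data = [1.0, 2.0, 3.0, 4.0, 5.0]
median = best_constant(data, 0.5)  # level 0.5 recovers the sample median
upper = best_constant(data, 0.9)   # a high level favors the upper tail
```

Minimizing this loss over a constant recovers an empirical quantile, which is why the same loss is used to fit models that predict conditional quantiles.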

  5. arXiv:2102.09225  [pdf, other]

    cs.LG stat.ML

    Continuous Doubly Constrained Batch Reinforcement Learning

    Authors: Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola

    Abstract: Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produc…

    Submitted 6 December, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 conference paper

  6. arXiv:2008.02641  [pdf, other]

    cs.LG cs.IT stat.ME stat.ML

    Bloom Origami Assays: Practical Group Testing

    Authors: Louis Abraham, Gary Becigneul, Benjamin Coleman, Bernhard Scholkopf, Anshumali Shrivastava, Alexander Smola

    Abstract: We study the problem usually referred to as group testing in the context of COVID-19. Given n samples collected from patients, how should we select and test mixtures of samples to maximize information and minimize the number of tests? Group testing is a well-studied problem with several appealing solutions, but recent biological studies impose practical constraints for COVID-19 that are incompatib…

    Submitted 21 July, 2020; originally announced August 2020.

    Comments: arXiv admin note: text overlap with arXiv:2005.06413

  7. arXiv:2006.15199  [pdf, other]

    cs.LG stat.ML

    DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

    Authors: Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

    Abstract: This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity. First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled. This is in contrast to existing literature which creates sophisticated off-policy techniques. Second, we pinpoint trai…

    Submitted 26 June, 2020; originally announced June 2020.

  8. arXiv:2006.14284  [pdf, other]

    cs.LG stat.ML

    Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

    Authors: Rasool Fakoor, Jonas Mueller, Nick Erickson, Pratik Chaudhari, Alexander J. Smola

    Abstract: Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators. While highly accurate, the resulting predictors are large, slow, and opaque as compared to their constituents. To improve the deployment of AutoML on tabular data, we propose FAST-DAD to distill arbitrarily com…

    Submitted 25 June, 2020; originally announced June 2020.

    Journal ref: NeurIPS 2020

  9. arXiv:2004.02441  [pdf, other]

    cs.LG stat.ML

    TraDE: Transformers for Density Estimation

    Authors: Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola

    Abstract: We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data. Our model is trained using a penalized maximum likelihood objective, which ensures that samples from the density estimate resemble the training data distribution. The use of self-attention means that the model need not retain conditional sufficient statistics durin…

    Submitted 14 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  10. arXiv:2003.06505  [pdf, other]

    stat.ML cs.LG

    AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

    Authors: Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, Alexander Smola

    Abstract: We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Exper…

    Submitted 13 March, 2020; originally announced March 2020.

  11. arXiv:1910.00125  [pdf, other]

    cs.LG stat.ML

    Meta-Q-Learning

    Authors: Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola

    Abstract: This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective to maximize the average reward across the tr…

    Submitted 4 April, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: ICLR 2020 conference paper

  12. arXiv:1909.04844  [pdf, other]

    cs.LG cs.DB stat.ML

    Recognizing Variables from their Data via Deep Embeddings of Distributions

    Authors: Jonas Mueller, Alex Smola

    Abstract: A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be more robustly addressed by leveraging the data values themselves rather than just relying on their arbitrarily selected variable names. Here, we present a comp…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: IEEE International Conference on Data Mining (ICDM), 2019

  13. arXiv:1905.12417  [pdf, other]

    stat.ML cs.LG

    Deep Factors for Forecasting

    Authors: Yuyang Wang, Alex Smola, Danielle C. Maddix, Jan Gasthaus, Dean Foster, Tim Januschowski

    Abstract: Producing probabilistic forecasts for large collections of similar and/or dependent time series is a practically relevant and challenging task. Classical time series models fail to capture complex patterns in the data, and multivariate techniques struggle to scale to large problem sizes. Their reliance on strong structural assumptions makes them data-efficient, and allows them to provide uncertain…

    Submitted 28 May, 2019; originally announced May 2019.

    Comments: http://proceedings.mlr.press/v97/wang19k/wang19k.pdf. arXiv admin note: substantial text overlap with arXiv:1812.00098

    Journal ref: Proceedings of Machine Learning Research, Volume 97: International Conference on Machine Learning, 2019

  14. arXiv:1905.01756  [pdf, other]

    cs.LG stat.ML

    P3O: Policy-on Policy-off Policy Optimization

    Authors: Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

    Abstract: On-policy reinforcement learning (RL) algorithms have high sample complexity while off-policy algorithms are difficult to tune. Merging the two holds the promise to develop efficient algorithms that generalize across diverse environments. It is however challenging in practice to find suitable hyper-parameters that govern this trade off. This paper develops a simple algorithm named P3O that interle…

    Submitted 15 July, 2019; v1 submitted 5 May, 2019; originally announced May 2019.

    Comments: UAI 2019 conference paper. Code: https://github.com/rasoolfa/P3O

  15. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  16. arXiv:1812.00098  [pdf, other]

    stat.ML cs.LG

    Deep Factors with Gaussian Processes for Forecasting

    Authors: Danielle C. Maddix, Yuyang Wang, Alex Smola

    Abstract: A large collection of time series poses significant challenges for classical and neural forecasting approaches. Classical time series models fail to fit data well and to scale to large problems, but succeed at providing uncertainty estimates. The converse is true for deep neural networks. In this paper, we propose a hybrid model that incorporates the benefits of both approaches. Our new method is…

    Submitted 30 November, 2018; originally announced December 2018.

    Comments: Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montreal, Canada

  17. arXiv:1806.01235  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Graphs

    Authors: Emmanouil Antonios Platanios, Alex Smola

    Abstract: We propose an algorithm for deep learning on networks and graphs. It relies on the notion that many graph algorithms, such as PageRank, Weisfeiler-Lehman, or Message Passing can be expressed as iterative vertex updates. Unlike previous methods which rely on the ingenuity of the designer, Deep Graphs are adaptive to the estimation problem. Training and deployment are both efficient, since the cost…

    Submitted 4 June, 2018; originally announced June 2018.

  18. arXiv:1802.03916  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Detecting and Correcting for Label Shift with Black Box Predictors

    Authors: Zachary C. Lipton, Yu-Xiang Wang, Alex Smola

    Abstract: Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal $p(y)$ changes but the conditional $p(x| y)$ does not. We propose Black Box Shift Estimation (BBSE) to…

    Submitted 26 July, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: Published at the International Conference on Machine Learning (ICML) 2018

  19. arXiv:1711.11179  [pdf, other]

    cs.LG stat.ML

    State Space LSTM Models with Particle MCMC Inference

    Authors: Xun Zheng, Manzil Zaheer, Amr Ahmed, Yuan Wang, Eric P Xing, Alexander J Smola

    Abstract: Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite its strong performance, however, it lacks the interpretability of state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models that generalize the earlier work \cite{zaheer2017latent} of combining topic models with LSTM. However, unlike…

    Submitted 29 November, 2017; originally announced November 2017.

  20. arXiv:1704.00003  [pdf, other]

    cs.LG stat.ML

    Spectral Methods for Nonparametric Models

    Authors: Hsiao-Yu Fish Tung, Chao-Yuan Wu, Manzil Zaheer, Alexander J. Smola

    Abstract: Nonparametric models are a versatile, albeit computationally expensive, tool for modeling mixture models. In this paper, we introduce spectral methods for the two most popular nonparametric models: the Indian Buffet Process (IBP) and the Hierarchical Dirichlet Process (HDP). We show that using spectral methods for the inference of nonparametric models is computationally and statistically efficient.…

    Submitted 30 March, 2017; originally announced April 2017.

    Comments: Keywords: Spectral Methods, Indian Buffet Process, Hierarchical Dirichlet Process

  21. arXiv:1703.06114  [pdf, other]

    cs.LG stat.ML

    Deep Sets

    Authors: Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola

    Abstract: We study the problem of designing models for machine learning tasks defined on sets. In contrast to the traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from estimation of population statistics \cite{poczos13aistats}, to anomaly detection in piezometer data of…

    Submitted 14 April, 2018; v1 submitted 10 March, 2017; originally announced March 2017.

    Comments: NIPS 2017

  22. arXiv:1702.08159  [pdf, other]

    cs.LG stat.ML

    McKernel: A Library for Approximate Kernel Expansions in Log-linear Time

    Authors: J. D. Curtó, I. C. Zarza, Feng Yang, Alex Smola, Fernando de la Torre, Chong Wah Ngo, Luc van Gool

    Abstract: McKernel introduces a framework to use kernel approximates in the mini-batch setting with Stochastic Gradient Descent (SGD) as an alternative to Deep Learning. Based on Random Kitchen Sinks [Rahimi and Recht 2007], we provide a C++ library for Large-scale Machine Learning. It contains a CPU optimized implementation of the algorithm in [Le et al. 2013], that allows the computation of approximated k…

    Submitted 17 April, 2020; v1 submitted 27 February, 2017; originally announced February 2017.

  23. arXiv:1611.06843  [pdf, other]

    stat.AP

    Joint Hacking and Latent Hazard Rate Estimation

    Authors: Ziqi Liu, Alexander J. Smola, Kyle Soska, Yu-Xiang Wang, Qinghua Zheng

    Abstract: In this paper we describe an algorithm for predicting the websites at risk in a long range hacking activity, while jointly inferring the provenance and evolution of vulnerabilities on websites over continuous time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are continuou…

    Submitted 21 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

  24. arXiv:1611.04488  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE stat.ME

    Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

    Authors: Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton

    Abstract: We propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning by generative adversarial networks (GAN), in which a model attempts to generate realistic samples, and a dis…

    Submitted 14 January, 2021; v1 submitted 14 November, 2016; originally announced November 2016.

    Comments: Published at ICLR 2017 (public comments: http://openreview.net/forum?id=HJWHIKqgl )
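
The maximum mean discrepancy (MMD) named in entry 24 can be estimated directly from two samples. The following is a minimal sketch of the standard unbiased estimator of the squared MMD with a Gaussian kernel on scalar data. It does not implement the test-power optimization that the paper proposes. The function names, the fixed bandwidth, and the toy samples are assumptions made for illustration.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    # Within-sample terms skip i == j, which is what makes the estimate unbiased.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


# Toy samples (assumed): one nearly identical to xs, one shifted far away.
xs = [i / 10.0 for i in range(20)]
near = [x + 0.05 for x in xs]
far = [x + 5.0 for x in xs]
mmd_near = mmd2_unbiased(xs, near)
mmd_far = mmd2_unbiased(xs, far)
```

The estimate is close to zero for the nearly identical samples and large for the shifted one. Because the diagonal terms are skipped, the estimate can come out slightly negative when the two samples are very similar.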

  25. arXiv:1611.03021  [pdf, other]

    cs.LG cs.CR stat.AP

    Attributing Hacks

    Authors: Ziqi Liu, Alexander J. Smola, Kyle Soska, Yu-Xiang Wang, Qinghua Zheng, Jun Zhou

    Abstract: In this paper we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as estimating the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard functi…

    Submitted 14 August, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: Appeared at AISTATS'17. Full version under review at the Electronic Journal of Statistics

  26. arXiv:1608.06879  [pdf, other]

    math.OC cs.LG stat.ML

    AIDE: Fast and Communication Efficient Distributed Optimization

    Authors: Sashank J. Reddi, Jakub Konečný, Peter Richtárik, Barnabás Póczós, Alex Smola

    Abstract: In this paper, we present two new communication-efficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach can be…

    Submitted 24 August, 2016; originally announced August 2016.

  27. arXiv:1607.08254  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Frank-Wolfe Methods for Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems. Frank-Wolfe methods (in the convex case) have gained tremendous recent interest in machine learning and optimization communities due to their projection-free property and their ability to exploit structured constraints. However, our understanding of these algorithms in the nonconvex setting is fairly limite…

    Submitted 29 July, 2016; v1 submitted 27 July, 2016; originally announced July 2016.

  28. arXiv:1605.06900  [pdf, other]

    math.OC cs.LG stat.ML

    Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tackle…

    Submitted 23 May, 2016; originally announced May 2016.

  29. arXiv:1603.06160  [pdf, other]

    math.OC cs.LG cs.NE stat.ML

    Stochastic Variance Reduction for Nonconvex Optimization

    Authors: Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary po…

    Submitted 4 April, 2016; v1 submitted 19 March, 2016; originally announced March 2016.

    Comments: Minor feedback changes

  30. arXiv:1603.06159  [pdf, other]

    math.OC cs.LG stat.ML

    Fast Incremental Method for Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We analyze a fast incremental aggregated gradient method for optimizing nonconvex problems of the form $\min_x \sum_i f_i(x)$. Specifically, we analyze the SAGA algorithm within an Incremental First-order Oracle framework, and show that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent. We also discuss Polyak's special class of nonconve…

    Submitted 19 March, 2016; originally announced March 2016.
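
The SAGA algorithm named in entry 30 keeps a table of the most recent gradient of each component $f_i$ and uses it to reduce the variance of stochastic steps. The following is a minimal sketch of the standard SAGA update on a toy scalar least-squares problem. This is a convex example chosen so the answer can be checked in closed form, and it is not the nonconvex setting analyzed in the paper. The function name, step size, and data are assumptions made for illustration.

```python
import random


def saga_least_squares(a, b, step, num_steps, seed=0):
    """Run SAGA on f(x) = (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 for scalar x."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    # Table holding the most recently evaluated gradient of each component f_i.
    table = [a[i] * (a[i] * x - b[i]) for i in range(n)]
    table_mean = sum(table) / n
    for _ in range(num_steps):
        i = rng.randrange(n)
        grad_i = a[i] * (a[i] * x - b[i])
        # Variance-reduced direction: fresh gradient minus stored one, plus table mean.
        x -= step * (grad_i - table[i] + table_mean)
        # Refresh the running mean and the table entry for component i.
        table_mean += (grad_i - table[i]) / n
        table[i] = grad_i
    return x


# Toy problem (assumed data); the step 1 / (3 L) uses L = max_i a_i^2.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.1, 5.9]
step = 1.0 / (3.0 * max(ai * ai for ai in a))
x_saga = saga_least_squares(a, b, step=step, num_steps=5000)
x_exact = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
```

On this toy problem `x_saga` should agree with the closed-form minimizer `x_exact` to high precision, while each step touches only one component gradient.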

  31. arXiv:1512.04848  [pdf, other]

    cs.LG cs.DS stat.ML

    Data Driven Resource Allocation for Distributed Learning

    Authors: Travis Dick, Mu Li, Venkata Krishna Pillutla, Colin White, Maria Florina Balcan, Alex Smola

    Abstract: In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data dependent dispatching that takes advantage of such structure. We present…

    Submitted 15 December, 2016; v1 submitted 15 December, 2015; originally announced December 2015.

  32. arXiv:1512.01845  [pdf, other]

    cs.LG stat.ML

    Explaining reviews and ratings with PACO: Poisson Additive Co-Clustering

    Authors: Chao-Yuan Wu, Alex Beutel, Amr Ahmed, Alexander J. Smola

    Abstract: Understanding a user's motivations provides valuable information beyond the ability to recommend items. Quite often this can be accomplished by perusing both ratings and review texts, since it is the latter where the reasoning for specific preferences is explicitly expressed. Unfortunately matrix factorization approaches to recommendation result in large, complex models that are difficult to int…

    Submitted 6 December, 2015; originally announced December 2015.

  33. arXiv:1508.05003  [pdf, other]

    stat.ML cs.LG math.OC

    AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

    Authors: Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

    Abstract: We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients. We discuss, analyze, and experiment with a setup motivated by the behavior of real-world distributed computation networks, where the machines are differently slow at different times. Therefore, we allow the parame…

    Submitted 20 August, 2015; originally announced August 2015.

    Comments: 19 pages

  34. arXiv:1506.06840  [pdf, other]

    cs.LG stat.ML

    On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

    Authors: Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

    Abstract: We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have been shown to outperform SGD, both theoretically and empirically. However, asynchronous versions of these algorithms---a crucial requirement for modern large-scale…

    Submitted 24 January, 2016; v1 submitted 22 June, 2015; originally announced June 2015.

  35. arXiv:1506.04448  [pdf, other]

    stat.ML cs.LG

    Fast and Guaranteed Tensor Decomposition via Sketching

    Authors: Yining Wang, Hsiao-Yu Tung, Alexander Smola, Animashree Anandkumar

    Abstract: Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor c…

    Submitted 20 October, 2015; v1 submitted 14 June, 2015; originally announced June 2015.

    Comments: 29 pages. Appeared in Proceedings of Advances in Neural Information Processing Systems (NIPS), held at Montreal, Canada in 2015

  36. arXiv:1502.07645  [pdf, other]

    stat.ML cs.LG

    Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo

    Authors: Yu-Xiang Wang, Stephen E. Fienberg, Alex Smola

    Abstract: We consider the problem of Bayesian learning on sensitive datasets and present two simple but somewhat surprising results that connect Bayesian learning to "differential privacy", a cryptographic approach to protect individual-level privacy while permitting database-level utility. Specifically, we show that under standard assumptions, getting one single sample from a posterior distribution is…

    Submitted 11 April, 2015; v1 submitted 26 February, 2015; originally announced February 2015.

  37. arXiv:1501.00199  [pdf, other]

    cs.LG stat.ML

    ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly

    Authors: Alex Beutel, Amr Ahmed, Alexander J. Smola

    Abstract: Matrix completion and approximation are popular tools to capture a user's preferences for recommendation and to approximate missing data. Instead of using low-rank factorization we take a drastically different approach, based on the simple insight that an additive model of co-clusterings allows one to approximate matrices efficiently. This allows us to build a concise model that, per bit of model…

    Submitted 31 December, 2014; originally announced January 2015.

    Comments: 22 pages, under review for conference publication

    ACM Class: H.2.8; H.3.3; I.2.6

  38. arXiv:1412.7149  [pdf, other]

    cs.LG cs.NE stat.ML

    Deep Fried Convnets

    Authors: Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang

    Abstract: The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters, and consume the majority of the memory required to store the network parameters. Reducing the number of parameters while preserving essentially the same predictive performance is critically important for operating deep neural networks in memory constrained environments such as GP…

    Submitted 17 July, 2015; v1 submitted 22 December, 2014; originally announced December 2014.

    Comments: svd experiments included

  39. arXiv:1412.6493  [pdf, other]

    cs.LG stat.ML

    A la Carte - Learning Fast Kernels

    Authors: Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson

    Abstract: Kernel methods have great promise for learning rich statistical representations of large modern datasets. However, compared to neural networks, kernel methods have been perceived as lacking in scalability and flexibility. We introduce a family of fast, flexible, lightly parametrized and general purpose kernel learning methods, derived from Fastfood basis function expansions. We provide mechanisms…

    Submitted 19 December, 2014; originally announced December 2014.

  40. arXiv:1410.7690  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Trend Filtering on Graphs

    Authors: Yu-Xiang Wang, James Sharpnack, Alex Smola, Ryan J. Tibshirani

    Abstract: We introduce a family of adaptive estimators on graphs, based on penalizing the $\ell_1$ norm of discrete graph differences. This generalizes the idea of trend filtering [Kim et al. (2009), Tibshirani (2014)], used for univariate nonparametric regression, to graphs. Analogous to the univariate case, graph trend filtering exhibits a level of local adaptivity unmatched by the usual $\ell_2$-based gr…

    Submitted 4 June, 2016; v1 submitted 28 October, 2014; originally announced October 2014.

    Comments: A short version appeared in AISTATS'2015

    MSC Class: 62G05

    Journal ref: Journal of Machine Learning Research, Volume 17 (2016), Article 15-147

  41. arXiv:1408.3060  [pdf, other]

    cs.LG stat.ML

    Fastfood: Approximate Kernel Expansions in Loglinear Time

    Authors: Quoc Viet Le, Tamas Sarlos, Alexander Johannes Smola

    Abstract: Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that storing and computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates such computation significantly. Key to Fastfood is the observation that Hadamard matric…

    Submitted 13 August, 2014; originally announced August 2014.

  42. arXiv:1405.0558  [pdf, other]

    stat.ML

    The Falling Factorial Basis and Its Statistical Applications

    Authors: Yu-Xiang Wang, Alex Smola, Ryan J. Tibshirani

    Abstract: We study a novel spline-like basis, which we name the "falling factorial basis", bearing many similarities to the classic truncated power basis. The advantage of the falling factorial basis is that it enables rapid, linear-time computations in basis matrix multiplication and basis matrix inversion. The falling factorial functions are not actually splines, but are close enough to splines that they…

    Submitted 27 October, 2014; v1 submitted 3 May, 2014; originally announced May 2014.

    Comments: Full version for the ICML paper with the same title

  43. arXiv:1402.0119  [pdf, other]

    stat.ML cs.LG

    Randomized Nonlinear Component Analysis

    Authors: David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schölkopf

    Abstract: Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques are only able to reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive in the large scale. In a separate strand of recent research, randomized methods have…

    Submitted 13 May, 2014; v1 submitted 1 February, 2014; originally announced February 2014.

    Comments: Appearing in ICML 2014

  44. arXiv:1207.4131  [pdf]

    cs.LG stat.ML

    Exponential Families for Conditional Random Fields

    Authors: Yasemin Altun, Alex Smola, Thomas Hofmann

    Abstract: In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be e…

    Submitted 11 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

    Report number: UAI-P-2004-PG-2-9

  45. arXiv:1206.6457  [pdf]

    cs.LG stat.ML

    Exponential Regret Bounds for Gaussian Process Bandits with Deterministic Observations

    Authors: Nando de Freitas, Alex Smola, Masrour Zoghi

    Abstract: This paper analyzes the problem of Gaussian process (GP) bandits with deterministic observations. The analysis uses a branch and bound algorithm that is related to the UCB algorithm of (Srinivas et al, 2010). For GPs with Gaussian observation noise, with variance strictly greater than zero, Srinivas et al proved that the regret vanishes at the approximate rate of $O(1/\sqrt{t})$, where t is the nu…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012). arXiv admin note: substantial text overlap with arXiv:1203.2177

  46. arXiv:1203.3472  [pdf]

    cs.LG stat.ML

    Super-Samples from Kernel Herding

    Authors: Yutian Chen, Max Welling, Alex Smola

    Abstract: We extend the herding algorithm to continuous spaces by using the kernel trick. The resulting "kernel herding" algorithm is an infinite memory deterministic process that learns to approximate a PDF with a collection of samples. We show that kernel herding decreases the error of expectations of functions in the Hilbert space at a rate $O(1/T)$ which is much faster than the usual $O(1/\sqrt{T})$ for iid rando…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-109-116

  47. arXiv:1203.2177  [pdf, other]

    cs.LG stat.ML

    Regret Bounds for Deterministic Gaussian Process Bandits

    Authors: Nando de Freitas, Alex Smola, Masrour Zoghi

    Abstract: This paper analyses the problem of Gaussian process (GP) bandits with deterministic observations. The analysis uses a branch and bound algorithm that is related to the UCB algorithm of (Srinivas et al., 2010). For GPs with Gaussian observation noise, with variance strictly greater than zero, (Srinivas et al., 2010) proved that the regret vanishes at the approximate rate of $O(\frac{1}{\sqrt{t}})$,…

    Submitted 9 March, 2012; originally announced March 2012.

    Comments: 17 pages, 5 figures

  48. arXiv:0911.0491  [pdf, ps, other]

    math.OC stat.ML

    Slow Learners are Fast

    Authors: John Langford, Alexander Smola, Martin Zinkevich

    Abstract: Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online lear…

    Submitted 3 November, 2009; originally announced November 2009.

    Comments: Extended version of conference paper - NIPS 2009

    MSC Class: 49M30; 80M50