Showing 1–34 of 34 results for author: Smola, A J

Searching in archive cs.
  1. arXiv:2210.01422

    cs.LG

    Time-Varying Propensity Score to Bridge the Gap between the Past and Present

    Authors: Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola

    Abstract: Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the…

    Submitted 2 May, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Published at ICLR 2024

  2. arXiv:2112.05848

    cs.LG cs.AI

    Faster Deep Reinforcement Learning with Slower Online Network

    Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael L. Littman, Alexander J. Smola

    Abstract: Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with u…

    Submitted 17 April, 2023; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  3. arXiv:2111.02705

    cs.LG cs.CL stat.ML

    Benchmarking Multimodal AutoML for Tabular Data with Text Fields

    Authors: Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola

    Abstract: We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well. Here we assemble 18 multimodal data tables that each contain some text fields and stem from a real business application. Our publicly-available benchmark enables researchers to comprehensively evaluate their own methods for supervised…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks 2021

  4. arXiv:2110.13878

    cs.LG

    Deep Explicit Duration Switching Models for Time Series

    Authors: Abdul Fatir Ansari, Konstantinos Benidis, Richard Kurle, Ali Caner Turkmen, Harold Soh, Alexander J. Smola, Yuyang Wang, Tim Januschowski

    Abstract: Many complex time series can be effectively subdivided into distinct regimes that exhibit persistent dynamics. Discovering the switching behavior and the statistical patterns in these regimes is important for understanding the underlying dynamical system. We propose the Recurrent Explicit Duration Switching Dynamical System (RED-SDS), a flexible model that is capable of identifying both state- and…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021

  5. arXiv:2106.11342

    cs.LG cs.AI cs.CL cs.CV

    Dive into Deep Learning

    Authors: Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola

    Abstract: This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition figures, math, and interactive examples with self-contained code. Our goal is to offer a resource that could (i) be freely available for everyone; (ii) offer sufficient technical…

    Submitted 22 August, 2023; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: (HTML) https://D2L.ai (GitHub) https://github.com/d2l-ai/d2l-en/

  6. arXiv:2103.09944

    cs.IR cs.LG

    IRLI: Iterative Re-partitioning for Learning to Index

    Authors: Gaurav Gupta, Tharun Medini, Anshumali Shrivastava, Alexander J Smola

    Abstract: Neural models have transformed the fundamental information retrieval problem of mapping a query to a giant set of items. However, the need for efficient and low-latency inference forces the community to reconsider efficient approximate near-neighbor search in the item space. To this end, learning to index is gaining much interest in recent times. Methods have to trade between obtaining high accura…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: 12 pages

  7. arXiv:2103.00083

    stat.ML cs.LG

    Flexible Model Aggregation for Quantile Regression

    Authors: Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani

    Abstract: Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for…

    Submitted 15 April, 2023; v1 submitted 26 February, 2021; originally announced March 2021.

    Comments: Accepted at JMLR 2023

  8. arXiv:2102.09225

    cs.LG stat.ML

    Continuous Doubly Constrained Batch Reinforcement Learning

    Authors: Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola

    Abstract: Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produc…

    Submitted 6 December, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 conference paper

  9. arXiv:2011.12683

    cs.IR

    GraphHINGE: Learning Interaction Models of Structured Neighborhood on Heterogeneous Information Network

    Authors: Jiarui Jin, Kounianhua Du, Weinan Zhang, Jiarui Qin, Yuchen Fang, Yong Yu, Zheng Zhang, Alexander J. Smola

    Abstract: Heterogeneous information network (HIN) has been widely used to characterize entities of various types and their complex relations. Recent attempts either rely on explicit path reachability to leverage path-based semantic relatedness or graph neighborhood to learn heterogeneous network representations before predictions. These weakly coupled manners overlook the rich interactions among neighbor no…

    Submitted 30 June, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

    Comments: TOIS (Special Issue on Graph Technologies for User Modeling and Recommendation). arXiv admin note: text overlap with arXiv:2007.00216

  10. arXiv:2007.00216

    cs.IR

    An Efficient Neighborhood-based Interaction Model for Recommendation on Heterogeneous Graph

    Authors: Jiarui Jin, Jiarui Qin, Yuchen Fang, Kounianhua Du, Weinan Zhang, Yong Yu, Zheng Zhang, Alexander J. Smola

    Abstract: There has been an influx of heterogeneous information network (HIN) based recommender systems in recent years, since HIN is capable of characterizing complex graphs and contains rich semantics. Although existing approaches have achieved performance improvements and are practical, they still face the following problems. On one hand, most existing HIN-based methods rely on explicit path reachability to l…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: KDD 2020

  11. arXiv:2006.15199

    cs.LG stat.ML

    DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

    Authors: Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

    Abstract: This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity. First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled. This is in contrast to existing literature which creates sophisticated off-policy techniques. Second, we pinpoint trai…

    Submitted 26 June, 2020; originally announced June 2020.

  12. arXiv:2006.14284

    cs.LG stat.ML

    Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

    Authors: Rasool Fakoor, Jonas Mueller, Nick Erickson, Pratik Chaudhari, Alexander J. Smola

    Abstract: Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators. While highly accurate, the resulting predictors are large, slow, and opaque as compared to their constituents. To improve the deployment of AutoML on tabular data, we propose FAST-DAD to distill arbitrarily com…

    Submitted 25 June, 2020; originally announced June 2020.

    Journal ref: NeurIPS 2020

  13. arXiv:2004.02441

    cs.LG stat.ML

    TraDE: Transformers for Density Estimation

    Authors: Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola

    Abstract: We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data. Our model is trained using a penalized maximum likelihood objective, which ensures that samples from the density estimate resemble the training data distribution. The use of self-attention means that the model need not retain conditional sufficient statistics durin…

    Submitted 14 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  14. arXiv:2002.06170

    cs.CL cs.LG

    Transformer on a Diet

    Authors: Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola

    Abstract: Transformer has been widely used thanks to its ability to capture sequence information in an efficient way. However, recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness. In this paper, we explore three carefully-designed light Transformer architectures to figure out whether the Transformer with less computation could produce competitive resu…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: 6 pages, 2 tables, 1 figure

  15. arXiv:1910.00125

    cs.LG stat.ML

    Meta-Q-Learning

    Authors: Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola

    Abstract: This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective to maximize the average reward across the tr…

    Submitted 4 April, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: ICLR 2020 conference paper

  16. arXiv:1905.01756

    cs.LG stat.ML

    P3O: Policy-on Policy-off Policy Optimization

    Authors: Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

    Abstract: On-policy reinforcement learning (RL) algorithms have high sample complexity while off-policy algorithms are difficult to tune. Merging the two holds the promise to develop efficient algorithms that generalize across diverse environments. It is however challenging in practice to find suitable hyper-parameters that govern this trade-off. This paper develops a simple algorithm named P3O that interle…

    Submitted 15 July, 2019; v1 submitted 5 May, 2019; originally announced May 2019.

    Comments: UAI 2019 conference paper. Code: https://github.com/rasoolfa/P3O

  17. arXiv:1904.09408

    cs.CL cs.AI cs.LG

    Language Models with Transformers

    Authors: Chenguang Wang, Mu Li, Alexander J. Smola

    Abstract: The Transformer architecture is superior to RNN-based models in computational efficiency. Recently, GPT and BERT demonstrated the efficacy of Transformer models on various NLP tasks using pre-trained language models on large-scale corpora. Surprisingly, these Transformer architectures are suboptimal for language modeling itself. Neither self-attention nor the positional encoding in the Transformer is…

    Submitted 17 October, 2019; v1 submitted 20 April, 2019; originally announced April 2019.

    Comments: 12 pages, 7 tables, 4 figures

  18. arXiv:1712.00636

    cs.CV

    Compressed Video Action Recognition

    Authors: Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl

    Abstract: Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and the high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that superfluous information can be reduced by up to two orders of magnitude by video…

    Submitted 29 March, 2018; v1 submitted 2 December, 2017; originally announced December 2017.

    Comments: CVPR 2018 (Selected for spotlight presentation)

  19. arXiv:1711.11179

    cs.LG stat.ML

    State Space LSTM Models with Particle MCMC Inference

    Authors: Xun Zheng, Manzil Zaheer, Amr Ahmed, Yuan Wang, Eric P Xing, Alexander J Smola

    Abstract: Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite its strong performance, however, it lacks the interpretability of state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models that generalize the earlier work of Zaheer et al. (2017) on combining topic models with LSTM. However, unlike…

    Submitted 29 November, 2017; originally announced November 2017.

  20. arXiv:1709.04071

    cs.LG cs.AI cs.CL

    Variational Reasoning for Question Answering with Knowledge Graph

    Authors: Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander J. Smola, Le Song

    Abstract: Knowledge graph (KG) is known to be helpful for the task of question answering (QA), since it provides well-structured relational information between entities, and allows one to further infer indirect facts. However, it is challenging to build QA systems which can learn to reason over knowledge graphs based on question-answer pairs alone. First, when people ask questions, their expressions are noi…

    Submitted 27 November, 2017; v1 submitted 12 September, 2017; originally announced September 2017.

  21. arXiv:1709.01434

    cs.LG cs.AI

    A Generic Approach for Escaping Saddle points

    Authors: Sashank J Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J Smola

    Abstract: A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them imp…

    Submitted 5 September, 2017; originally announced September 2017.

  22. arXiv:1706.07567

    cs.CV

    Sampling Matters in Deep Embedding Learning

    Authors: Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl

    Abstract: Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that…

    Submitted 16 January, 2018; v1 submitted 23 June, 2017; originally announced June 2017.

    Comments: Add supplementary material. Paper published in ICCV 2017

  23. arXiv:1704.00003

    cs.LG stat.ML

    Spectral Methods for Nonparametric Models

    Authors: Hsiao-Yu Fish Tung, Chao-Yuan Wu, Manzil Zaheer, Alexander J. Smola

    Abstract: Nonparametric models are versatile, albeit computationally expensive, tools for modeling mixture models. In this paper, we introduce spectral methods for the two most popular nonparametric models: the Indian Buffet Process (IBP) and the Hierarchical Dirichlet Process (HDP). We show that using spectral methods for the inference of nonparametric models is computationally and statistically efficient.…

    Submitted 30 March, 2017; originally announced April 2017.

    Comments: Keywords: Spectral Methods, Indian Buffet Process, Hierarchical Dirichlet Process

  24. arXiv:1611.03021

    cs.LG cs.CR stat.AP

    Attributing Hacks

    Authors: Ziqi Liu, Alexander J. Smola, Kyle Soska, Yu-Xiang Wang, Qinghua Zheng, Jun Zhou

    Abstract: In this paper we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as estimating the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard functi…

    Submitted 14 August, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: Appeared at AISTATS'17. Full version under review at the Electronic Journal of Statistics

  25. arXiv:1512.01845

    cs.LG stat.ML

    Explaining reviews and ratings with PACO: Poisson Additive Co-Clustering

    Authors: Chao-Yuan Wu, Alex Beutel, Amr Ahmed, Alexander J. Smola

    Abstract: Understanding a user's motivations provides valuable information beyond the ability to recommend items. Quite often this can be accomplished by perusing both ratings and review texts, since it is the latter where the reasoning for specific preferences is explicitly expressed. Unfortunately matrix factorization approaches to recommendation result in large, complex models that are difficult to int…

    Submitted 6 December, 2015; originally announced December 2015.

  26. arXiv:1508.05003

    stat.ML cs.LG math.OC

    AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

    Authors: Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

    Abstract: We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients. We discuss, analyze, and experiment with a setup motivated by the behavior of real-world distributed computation networks, where the machines are slow to different degrees at different times. Therefore, we allow the parame…

    Submitted 20 August, 2015; originally announced August 2015.

    Comments: 19 pages

  27. arXiv:1505.04636

    cs.DC cs.AI cs.LG

    Graph Partitioning via Parallel Submodular Approximation to Accelerate Distributed Machine Learning

    Authors: Mu Li, Dave G. Andersen, Alexander J. Smola

    Abstract: Distributed computing excels at processing large-scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance. Fortunately, the interactions between parameter and data in many problems are sparse, which admits efficient partitioning in order to reduce the communication overhead. In this paper, we formulate data placement as a graph partitionin…

    Submitted 18 May, 2015; originally announced May 2015.

    ACM Class: I.2.11; I.5.1; G.1.6

  28. arXiv:1505.01419

    cs.LG cs.AI

    Fast Differentially Private Matrix Factorization

    Authors: Ziqi Liu, Yu-Xiang Wang, Alexander J. Smola

    Abstract: Differentially private collaborative filtering is a challenging task, both in terms of accuracy and speed. We present a simple algorithm that is provably differentially private, while offering good performance, using a novel connection of differential privacy to Bayesian posterior sampling via Stochastic Gradient Langevin Dynamics. Due to its simplicity the algorithm lends itself to efficient impl…

    Submitted 7 May, 2015; v1 submitted 6 May, 2015; originally announced May 2015.

    ACM Class: G.2; I.2.6; G.3; G.1.6

  29. arXiv:1501.00199

    cs.LG stat.ML

    ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly

    Authors: Alex Beutel, Amr Ahmed, Alexander J. Smola

    Abstract: Matrix completion and approximation are popular tools to capture a user's preferences for recommendation and to approximate missing data. Instead of using low-rank factorization we take a drastically different approach, based on the simple insight that an additive model of co-clusterings allows one to approximate matrices efficiently. This allows us to build a concise model that, per bit of model…

    Submitted 31 December, 2014; originally announced January 2015.

    Comments: 22 pages, under review for conference publication

    ACM Class: H.2.8; H.3.3; I.2.6

  30. arXiv:1412.6493

    cs.LG stat.ML

    A la Carte - Learning Fast Kernels

    Authors: Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson

    Abstract: Kernel methods have great promise for learning rich statistical representations of large modern datasets. However, compared to neural networks, kernel methods have been perceived as lacking in scalability and flexibility. We introduce a family of fast, flexible, lightly parametrized and general purpose kernel learning methods, derived from Fastfood basis function expansions. We provide mechanisms…

    Submitted 19 December, 2014; originally announced December 2014.

  31. arXiv:1408.3060

    cs.LG stat.ML

    Fastfood: Approximate Kernel Expansions in Loglinear Time

    Authors: Quoc Viet Le, Tamas Sarlos, Alexander Johannes Smola

    Abstract: Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that storing and computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates such computation significantly. Key to Fastfood is the observation that Hadamard matric…

    Submitted 13 August, 2014; originally announced August 2014.

  32. arXiv:0809.3618

    cs.CV cs.LG

    Robust Near-Isometric Matching via Structured Learning of Graphical Models

    Authors: Julian J. McAuley, Tiberio S. Caetano, Alexander J. Smola

    Abstract: Models for near-rigid shape matching are typically based on distance-related features, in order to infer matches that are consistent with the isometric assumption. However, real shapes from image datasets, even when expected to be related by "almost isometric" transformations, are actually subject not only to noise but also, to some limited degree, to variations in appearance and scale. In this…

    Submitted 21 September, 2008; originally announced September 2008.

    Comments: 11 pages, 9 figures

  33. arXiv:0806.2890

    cs.CV cs.LG

    Learning Graph Matching

    Authors: Tiberio S. Caetano, Julian J. McAuley, Li Cheng, Quoc V. Le, Alex J. Smola

    Abstract: As a fundamental problem in pattern recognition, graph matching has applications in a variety of fields, from computer vision to computational biology. In graph matching, patterns are modeled as graphs and pattern recognition amounts to finding a correspondence between the nodes of different graphs. Many formulations of this problem can be cast in general as a quadratic assignment problem, where…

    Submitted 17 June, 2008; originally announced June 2008.

    Comments: 10 pages, 4 figures

  34. arXiv:0805.2368

    cs.LG cs.AI

    A Kernel Method for the Two-Sample Problem

    Authors: Arthur Gretton, Karsten Borgwardt, Malte J. Rasch, Bernhard Scholkopf, Alexander J. Smola

    Abstract: We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a…

    Submitted 15 May, 2008; originally announced May 2008.

    ACM Class: G.3; I.2.6
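Entry 2 (Faster Deep Reinforcement Learning with Slower Online Network) describes a target network that "tracks the online network with some delay". The truncated abstract does not spell out the paper's own modification, so the following is only a minimal sketch of the two usual ways a target network lags behind the online network: a periodic hard copy (the classic DQN choice) and an exponential moving average. Representing parameters as a dict of floats, the helper `run_tracking`, and the values of `tau` and `period` are illustrative assumptions.

```python
def hard_update(target, online):
    """Periodic hard copy: the target takes the online parameters verbatim."""
    for name, value in online.items():
        target[name] = value


def soft_update(target, online, tau):
    """Exponential moving average: target <- (1 - tau) * target + tau * online."""
    for name, value in online.items():
        target[name] = (1.0 - tau) * target[name] + tau * value


def run_tracking(online_trajectory, mode, tau=0.5, period=3):
    """Replay a list of online parameter snapshots (hypothetical toy dicts)
    and return a snapshot of the target parameters after every step.

    mode="hard": copy online -> target every `period` steps.
    mode="soft": apply the EMA update at every step.
    """
    target = dict(online_trajectory[0])
    history = []
    for step, online in enumerate(online_trajectory, start=1):
        if mode == "hard":
            if step % period == 0:
                hard_update(target, online)
        elif mode == "soft":
            soft_update(target, online, tau)
        else:
            raise ValueError(f"unknown mode: {mode}")
        history.append(dict(target))
    return history


if __name__ == "__main__":
    # The online weight jumps from 0.0 to 1.0 and then stays there.
    trajectory = [{"w": 0.0}] + [{"w": 1.0}] * 5
    print("hard:", [h["w"] for h in run_tracking(trajectory, "hard")])
    print("soft:", [h["w"] for h in run_tracking(trajectory, "soft")])
```

With the hard copy the target stays frozen between syncs and then jumps; with the moving average it approaches the online value geometrically, so a smaller `tau` means a slower, smoother target.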
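Entry 7 (Flexible Model Aggregation for Quantile Regression) starts from quantile regression as a way to quantify the range of possible values. Its aggregation method is cut off in the abstract and is not shown here; this minimal sketch only illustrates the standard quantile (pinball) loss that quantile regression minimizes, and checks that the best constant prediction under it is an empirical quantile. The function names and toy data are illustrative.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for quantile level tau in (0, 1).

    Under-predictions (y > q) cost tau per unit; over-predictions cost (1 - tau).
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        residual = y - q
        total += max(tau * residual, (tau - 1.0) * residual)
    return total / len(y_true)


def best_constant_quantile(ys, tau):
    """Return the observed value that, used as a constant prediction for every
    sample, has the lowest pinball loss (an empirical tau-quantile of ys)."""
    return min(ys, key=lambda c: pinball_loss(ys, [c] * len(ys), tau))


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6, 5]
    for tau in (0.25, 0.5, 0.75):
        print(tau, best_constant_quantile(data, tau))
```

The asymmetric weights are what make the minimizer a quantile rather than a mean: at tau = 0.5 the loss is half the absolute error and the minimizer is the median.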
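Entry 22 (Sampling Matters in Deep Embedding Learning) names the two classic embedding losses, contrastive loss and triplet loss. The paper's own contribution is truncated in the abstract and is not reproduced; this minimal sketch only writes out one common form of each loss on plain Python vectors. Euclidean distance and the default margins are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(x, y, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs at least `margin` apart."""
    d = euclidean(x, y)
    return d ** 2 if same else max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    anchor, positive = (0.0, 0.0), (0.1, 0.0)
    print("easy negative:", triplet_loss(anchor, positive, (1.0, 0.0)))
    print("hard negative:", triplet_loss(anchor, positive, (0.2, 0.0)))
```

Easy negatives already satisfy the margin and contribute no gradient, which is why the choice of which pairs or triplets to sample matters for training.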
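Entry 28 (Fast Differentially Private Matrix Factorization) relies on Bayesian posterior sampling via Stochastic Gradient Langevin Dynamics. As a minimal sketch of the generic SGLD update only (no privacy accounting and no matrix factorization, so this is not the paper's algorithm), the code samples the mean of a toy model x_i ~ N(theta, 1) with prior theta ~ N(0, prior_var). The model, step size, batch size, and iteration counts are all illustrative assumptions.

```python
import math
import random


def sgld_gaussian_mean(data, prior_var=100.0, step=1e-3, batch_size=10,
                       n_steps=20000, burn_in=2000, seed=0):
    """Approximate posterior samples for theta in the toy model
    x_i ~ N(theta, 1), theta ~ N(0, prior_var), using SGLD."""
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for t in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Gradient of the log prior plus the minibatch log-likelihood gradient,
        # rescaled by n / batch_size so it is unbiased for the full data set.
        grad = -theta / prior_var + (n / batch_size) * sum(x - theta for x in batch)
        # Langevin step: half a step along the gradient plus N(0, step) noise.
        theta += 0.5 * step * grad + rng.gauss(0.0, math.sqrt(step))
        if t >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(1.5, 1.0) for _ in range(100)]
    samples = sgld_gaussian_mean(data)
    # Exact posterior mean of this conjugate model, for comparison.
    exact = sum(data) / (len(data) + 1.0 / 100.0)
    print("SGLD mean:", sum(samples) / len(samples), "exact:", exact)
```

The injected Gaussian noise is what turns stochastic gradient ascent into an approximate sampler; with a fixed step size the samples carry some discretization error, so their spread only roughly matches the true posterior.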
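Entry 33 (Learning Graph Matching) notes that many graph matching formulations can be cast as a quadratic assignment problem. A minimal sketch, assuming only the quadratic edge-agreement term (no unary node terms and none of the paper's learning), scores a correspondence by how many edges it maps onto edges and finds the best one by brute force. The helper names and the toy graph are hypothetical, and exhaustive search is only feasible for a handful of nodes.

```python
from itertools import permutations


def qap_score(A, B, perm):
    """Quadratic assignment objective sum_{i,j} A[i][j] * B[perm[i]][perm[j]],
    i.e. the number of (directed) edges of A that land on edges of B."""
    n = len(A)
    return sum(A[i][j] * B[perm[i]][perm[j]] for i in range(n) for j in range(n))


def brute_force_match(A, B):
    """Try every node correspondence (n! of them) and keep the best one."""
    n = len(A)
    return max(permutations(range(n)), key=lambda p: qap_score(A, B, p))


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    M = [[0] * n for _ in range(n)]
    for i, j in edges:
        M[i][j] = M[j][i] = 1
    return M


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]  # triangle with a pendant node
    relabel = [2, 0, 3, 1]                    # hidden correspondence used to build B
    A = adjacency(4, edges)
    B = adjacency(4, [(relabel[i], relabel[j]) for i, j in edges])
    best = brute_force_match(A, B)
    print(best, qap_score(A, B, best))
```

Because this toy graph has a symmetry (nodes 0 and 1 are interchangeable), more than one correspondence reaches the maximum score; the factorial cost is why practical methods approximate the problem instead.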
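Entry 34 (A Kernel Method for the Two-Sample Problem) defines its test statistic as the largest difference in expectations over functions in the unit ball of an RKHS. In an RKHS this supremum has a closed form: its square equals E k(x, x') + E k(y, y') - 2 E k(x, y). The minimal sketch below computes an unbiased estimate of that squared statistic; the Gaussian kernel, its bandwidth, and the toy samples are assumptions, and the paper's actual tests (thresholds from large deviation bounds, plus the alternative cut off in the abstract) are not reproduced.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of the squared statistic between samples xs and ys.

    Within-sample averages exclude the diagonal (i == j) terms.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth)
               for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = random.Random(0)
    same_a = [(rng.gauss(0, 1),) for _ in range(200)]
    same_b = [(rng.gauss(0, 1),) for _ in range(200)]
    shifted = [(rng.gauss(2, 1),) for _ in range(200)]
    print("same distribution:   ", mmd2_unbiased(same_a, same_b))
    print("shifted distribution:", mmd2_unbiased(same_a, shifted))
```

When both samples come from the same distribution the estimate fluctuates around zero (and can be slightly negative, since it is unbiased); a shift in the mean pushes it clearly above zero, and a test then compares it against a threshold.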