Showing 1–50 of 83 results for author: Smola, A

Searching in archive cs.
  1. arXiv:2304.04746  [pdf, other]

    cs.CL cs.AI

    A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

    Authors: Jiaao Chen, Aston Zhang, Mu Li, Alex Smola, Diyi Yang

    Abstract: Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation. Whereas, as a way inherently built for continuous data, existing diffusion models still have some limitations in modeling discrete data, e.g., languages. For example, the generally used Gaussian noise can not handle the discrete corruption well, and th…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Code is available at https://github.com/amazon-science/masked-diffusion-lm

  2. arXiv:2304.04704  [pdf, other]

    cs.CV cs.AI cs.CL

    Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

    Authors: Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun

    Abstract: This work proposes POMP, a prompt pre-training method for vision-language models. Being memory and computation efficient, POMP enables the learned prompt to condense semantic information for a rich set of visual concepts with over twenty-thousand classes. Once pre-trained, the prompt with a strong transferable ability can be directly plugged into a variety of visual recognition tasks including ima…

    Submitted 6 October, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Code is available at https://github.com/amazon-science/prompt-pretraining

  3. arXiv:2302.03020  [pdf, other]

    cs.LG cs.CV stat.ML

    RLSbench: Domain Adaptation Under Relaxed Label Shift

    Authors: Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton

    Abstract: Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class conditional distributions is precariously under explored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with label proportions shifts. While several papers modify these heuristics in attempts to handle label proportions shifts, inconsistencies i…

    Submitted 5 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023. Paper website: https://sites.google.com/view/rlsbench/

  4. arXiv:2302.00923  [pdf, other]

    cs.CL cs.AI cs.CV

    Multimodal Chain-of-Thought Reasoning in Language Models

    Authors: Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola

    Abstract: Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have primarily focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage fr…

    Submitted 20 May, 2024; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Published in Transactions on Machine Learning Research

  5. arXiv:2301.01821  [pdf, other]

    cs.CL cs.AI

    Parameter-Efficient Fine-Tuning Design Spaces

    Authors: Jiaao Chen, Aston Zhang, Xingjian Shi, Mu Li, Alex Smola, Diyi Yang

    Abstract: Parameter-efficient fine-tuning aims to achieve performance comparable to fine-tuning, using fewer trainable parameters. Several strategies (e.g., Adapters, prefix tuning, BitFit, and LoRA) have been proposed. However, their designs are hand-crafted separately, and it remains unclear whether certain design patterns exist for parameter-efficient fine-tuning. Thus, we present a parameter-efficient f…

    Submitted 4 January, 2023; originally announced January 2023.

    Comments: Code is available at https://github.com/amazon-science/peft-design-spaces

  6. arXiv:2210.03493  [pdf, other]

    cs.CL cs.AI

    Automatic Chain of Thought Prompting in Large Language Models

    Authors: Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola

    Abstract: Large language models (LLMs) can perform complex reasoning by generating intermediate reasoning steps. Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has two major paradigms. One leverages a simple prompt like "Let's think step by step" to facilitate step-by-step thinking before answering a question. The other uses a few manual demonstr…

    Submitted 7 October, 2022; originally announced October 2022.

  7. arXiv:2210.01422  [pdf, other]

    cs.LG

    Time-Varying Propensity Score to Bridge the Gap between the Past and Present

    Authors: Rasool Fakoor, Jonas Mueller, Zachary C. Lipton, Pratik Chaudhari, Alexander J. Smola

    Abstract: Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the…

    Submitted 2 May, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Published at ICLR 2024

  8. arXiv:2207.01160  [pdf, other]

    cs.CV cs.LG

    Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition

    Authors: Haotao Wang, Aston Zhang, Yi Zhu, Shuai Zheng, Mu Li, Alex Smola, Zhangyang Wang

    Abstract: Existing out-of-distribution (OOD) detection methods are typically benchmarked on training sets with balanced class distributions. However, in real-world applications, it is common for the training sets to have long-tailed distributions. In this work, we first demonstrate that existing OOD detection methods commonly suffer from significant performance degradation when the training set is long-tail…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: ICML 2022

  9. arXiv:2112.05848  [pdf, other]

    cs.LG cs.AI

    Faster Deep Reinforcement Learning with Slower Online Network

    Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael L. Littman, Alexander J. Smola

    Abstract: Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with u…

    Submitted 17 April, 2023; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  10. arXiv:2111.02705  [pdf, other]

    cs.LG cs.CL stat.ML

    Benchmarking Multimodal AutoML for Tabular Data with Text Fields

    Authors: Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola

    Abstract: We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well. Here we assemble 18 multimodal data tables that each contain some text fields and stem from a real business application. Our publicly-available benchmark enables researchers to comprehensively evaluate their own methods for supervised…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks 2021

  11. arXiv:2111.00980  [pdf, other]

    cs.LG stat.ML

    Mixture Proportion Estimation and PU Learning: A Modern Approach

    Authors: Saurabh Garg, Yifan Wu, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton

    Abstract: Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, lea…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: Spotlight at NeurIPS 2021

  12. arXiv:2110.13878  [pdf, other]

    cs.LG

    Deep Explicit Duration Switching Models for Time Series

    Authors: Abdul Fatir Ansari, Konstantinos Benidis, Richard Kurle, Ali Caner Turkmen, Harold Soh, Alexander J. Smola, Yuyang Wang, Tim Januschowski

    Abstract: Many complex time series can be effectively subdivided into distinct regimes that exhibit persistent dynamics. Discovering the switching behavior and the statistical patterns in these regimes is important for understanding the underlying dynamical system. We propose the Recurrent Explicit Duration Switching Dynamical System (RED-SDS), a flexible model that is capable of identifying both state- and…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021

  13. arXiv:2106.11342  [pdf]

    cs.LG cs.AI cs.CL cs.CV

    Dive into Deep Learning

    Authors: Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola

    Abstract: This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition figures, math, and interactive examples with self-contained code. Our goal is to offer a resource that could (i) be freely available for everyone; (ii) offer sufficient technical…

    Submitted 22 August, 2023; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: (HTML) https://D2L.ai (GitHub) https://github.com/d2l-ai/d2l-en/

  14. arXiv:2104.03221  [pdf, other]

    cs.DS

    Graph Reordering for Cache-Efficient Near Neighbor Search

    Authors: Benjamin Coleman, Santiago Segarra, Anshumali Shrivastava, Alex Smola

    Abstract: Graph search is one of the most successful algorithmic trends in near neighbor search. Several of the most popular and empirically successful algorithms are, at their core, a simple walk along a pruned near neighbor graph. Such algorithms consistently perform at the top of industrial speed benchmarks for applications such as embedding search. However, graph traversal applications often suffer from…

    Submitted 7 April, 2021; originally announced April 2021.

  15. arXiv:2103.09944  [pdf, other]

    cs.IR cs.LG

    IRLI: Iterative Re-partitioning for Learning to Index

    Authors: Gaurav Gupta, Tharun Medini, Anshumali Shrivastava, Alexander J Smola

    Abstract: Neural models have transformed the fundamental information retrieval problem of mapping a query to a giant set of items. However, the need for efficient and low latency inference forces the community to reconsider efficient approximate near-neighbor search in the item space. To this end, learning to index is gaining much interest in recent times. Methods have to trade between obtaining high accura…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: 12 pages

  16. arXiv:2103.00083  [pdf, other]

    stat.ML cs.LG

    Flexible Model Aggregation for Quantile Regression

    Authors: Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani

    Abstract: Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for…

    Submitted 15 April, 2023; v1 submitted 26 February, 2021; originally announced March 2021.

    Comments: Accepted at JMLR 2023

  17. arXiv:2102.09225  [pdf, other]

    cs.LG stat.ML

    Continuous Doubly Constrained Batch Reinforcement Learning

    Authors: Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola

    Abstract: Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produc…

    Submitted 6 December, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 conference paper

  18. arXiv:2011.12683  [pdf, other]

    cs.IR

    GraphHINGE: Learning Interaction Models of Structured Neighborhood on Heterogeneous Information Network

    Authors: Jiarui Jin, Kounianhua Du, Weinan Zhang, Jiarui Qin, Yuchen Fang, Yong Yu, Zheng Zhang, Alexander J. Smola

    Abstract: Heterogeneous information network (HIN) has been widely used to characterize entities of various types and their complex relations. Recent attempts either rely on explicit path reachability to leverage path-based semantic relatedness or graph neighborhood to learn heterogeneous network representations before predictions. These weakly coupled manners overlook the rich interactions among neighbor no…

    Submitted 30 June, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

    Comments: TOIS (Special Issue on Graph Technologies for User Modeling and Recommendation). arXiv admin note: text overlap with arXiv:2007.00216

  19. arXiv:2008.02641  [pdf, other]

    cs.LG cs.IT stat.ME stat.ML

    Bloom Origami Assays: Practical Group Testing

    Authors: Louis Abraham, Gary Becigneul, Benjamin Coleman, Bernhard Scholkopf, Anshumali Shrivastava, Alexander Smola

    Abstract: We study the problem usually referred to as group testing in the context of COVID-19. Given n samples collected from patients, how should we select and test mixtures of samples to maximize information and minimize the number of tests? Group testing is a well-studied problem with several appealing solutions, but recent biological studies impose practical constraints for COVID-19 that are incompatib…

    Submitted 21 July, 2020; originally announced August 2020.

    Comments: arXiv admin note: text overlap with arXiv:2005.06413

  20. arXiv:2007.00216  [pdf, other]

    cs.IR

    An Efficient Neighborhood-based Interaction Model for Recommendation on Heterogeneous Graph

    Authors: Jiarui Jin, Jiarui Qin, Yuchen Fang, Kounianhua Du, Weinan Zhang, Yong Yu, Zheng Zhang, Alexander J. Smola

    Abstract: There is an influx of heterogeneous information network (HIN) based recommender systems in recent years since HIN is capable of characterizing complex graphs and contains rich semantics. Although the existing approaches have achieved performance improvement, while practical, they still face the following problems. On one hand, most existing HIN-based methods rely on explicit path reachability to l…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: KDD 2020

  21. arXiv:2006.15199  [pdf, other]

    cs.LG stat.ML

    DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

    Authors: Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

    Abstract: This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity. First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled. This is in contrast to existing literature which creates sophisticated off-policy techniques. Second, we pinpoint trai…

    Submitted 26 June, 2020; originally announced June 2020.

  22. arXiv:2006.14284  [pdf, other]

    cs.LG stat.ML

    Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

    Authors: Rasool Fakoor, Jonas Mueller, Nick Erickson, Pratik Chaudhari, Alexander J. Smola

    Abstract: Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators. While highly accurate, the resulting predictors are large, slow, and opaque as compared to their constituents. To improve the deployment of AutoML on tabular data, we propose FAST-DAD to distill arbitrarily com…

    Submitted 25 June, 2020; originally announced June 2020.

    Journal ref: NeurIPS 2020

  23. arXiv:2005.07893  [pdf, other]

    cs.IR cs.LG

    Tiering as a Stochastic Submodular Optimization Problem

    Authors: Hyokun Yun, Michael Froh, Roshan Makhijani, Brian Luc, Alex Smola, Trishul Chilimbi

    Abstract: Tiering is an essential technique for building large-scale information retrieval systems. While the selection of documents for high priority tiers critically impacts the efficiency of tiering, past work focuses on optimizing it with respect to a static set of queries in the history, and generalizes poorly to the future traffic. Instead, we formulate the optimal tiering as a stochastic optimization…

    Submitted 16 May, 2020; originally announced May 2020.

  24. arXiv:2004.14960  [pdf, other]

    cs.CV

    Improving Semantic Segmentation via Self-Training

    Authors: Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola

    Abstract: Deep learning usually achieves the best results with complete supervision. In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models. In this paper, we show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm. We first train a teacher model on labeled data, and t…

    Submitted 6 May, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

  25. arXiv:2004.08955  [pdf, other]

    cs.CV

    ResNeSt: Split-Attention Networks

    Authors: Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola

    Abstract: It is well known that featuremap attention and multi-path representation are important for visual recognition. In this paper, we present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our design results in a simple and unified computation block…

    Submitted 30 December, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

  26. arXiv:2004.02441  [pdf, other]

    cs.LG stat.ML

    TraDE: Transformers for Density Estimation

    Authors: Rasool Fakoor, Pratik Chaudhari, Jonas Mueller, Alexander J. Smola

    Abstract: We present TraDE, a self-attention-based architecture for auto-regressive density estimation with continuous and discrete valued data. Our model is trained using a penalized maximum likelihood objective, which ensures that samples from the density estimate resemble the training data distribution. The use of self-attention means that the model need not retain conditional sufficient statistics durin…

    Submitted 14 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  27. arXiv:2003.06505  [pdf, other]

    stat.ML cs.LG

    AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

    Authors: Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, Alexander Smola

    Abstract: We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Exper…

    Submitted 13 March, 2020; originally announced March 2020.

  28. arXiv:2002.06170  [pdf, other]

    cs.CL cs.LG

    Transformer on a Diet

    Authors: Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola

    Abstract: Transformer has been widely used thanks to its ability to capture sequence information in an efficient way. However, recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness. In this paper, we explore three carefully-designed light Transformer architectures to figure out whether the Transformer with fewer computations could produce competitive resu…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: 6 pages, 2 tables, 1 figure

  29. arXiv:1910.00125  [pdf, other]

    cs.LG stat.ML

    Meta-Q-Learning

    Authors: Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola

    Abstract: This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective to maximize the average reward across the tr…

    Submitted 4 April, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: ICLR 2020 conference paper

  30. arXiv:1909.04844  [pdf, other]

    cs.LG cs.DB stat.ML

    Recognizing Variables from their Data via Deep Embeddings of Distributions

    Authors: Jonas Mueller, Alex Smola

    Abstract: A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be more robustly addressed by leveraging the data values themselves rather than just relying on their arbitrarily selected variable names. Here, we present a comp…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: IEEE International Conference on Data Mining (ICDM), 2019

  31. arXiv:1905.12417  [pdf, other]

    stat.ML cs.LG

    Deep Factors for Forecasting

    Authors: Yuyang Wang, Alex Smola, Danielle C. Maddix, Jan Gasthaus, Dean Foster, Tim Januschowski

    Abstract: Producing probabilistic forecasts for large collections of similar and/or dependent time series is a practically relevant and challenging task. Classical time series models fail to capture complex patterns in the data, and multivariate techniques struggle to scale to large problem sizes. Their reliance on strong structural assumptions makes them data-efficient, and allows them to provide uncertain…

    Submitted 28 May, 2019; originally announced May 2019.

    Comments: http://proceedings.mlr.press/v97/wang19k/wang19k.pdf. arXiv admin note: substantial text overlap with arXiv:1812.00098

    Journal ref: Proceedings of Machine Learning Research, Volume 97: International Conference on Machine Learning, 2019

  32. arXiv:1905.01756  [pdf, other]

    cs.LG stat.ML

    P3O: Policy-on Policy-off Policy Optimization

    Authors: Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

    Abstract: On-policy reinforcement learning (RL) algorithms have high sample complexity while off-policy algorithms are difficult to tune. Merging the two holds the promise to develop efficient algorithms that generalize across diverse environments. It is however challenging in practice to find suitable hyper-parameters that govern this trade off. This paper develops a simple algorithm named P3O that interle…

    Submitted 15 July, 2019; v1 submitted 5 May, 2019; originally announced May 2019.

    Comments: UAI 2019 conference paper. Code: https://github.com/rasoolfa/P3O

  33. arXiv:1904.09408  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Models with Transformers

    Authors: Chenguang Wang, Mu Li, Alexander J. Smola

    Abstract: The Transformer architecture is superior to RNN-based models in computational efficiency. Recently, GPT and BERT demonstrate the efficacy of Transformer models on various NLP tasks using pre-trained language models on large-scale corpora. Surprisingly, these Transformer architectures are suboptimal for language model itself. Neither self-attention nor the positional encoding in the Transformer is…

    Submitted 17 October, 2019; v1 submitted 20 April, 2019; originally announced April 2019.

    Comments: 12 pages, 7 tables, 4 figures

  34. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  35. arXiv:1812.00098  [pdf, other]

    stat.ML cs.LG

    Deep Factors with Gaussian Processes for Forecasting

    Authors: Danielle C. Maddix, Yuyang Wang, Alex Smola

    Abstract: A large collection of time series poses significant challenges for classical and neural forecasting approaches. Classical time series models fail to fit data well and to scale to large problems, but succeed at providing uncertainty estimates. The converse is true for deep neural networks. In this paper, we propose a hybrid model that incorporates the benefits of both approaches. Our new method is…

    Submitted 30 November, 2018; originally announced December 2018.

    Comments: Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montreal, Canada

  36. arXiv:1806.01235  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Graphs

    Authors: Emmanouil Antonios Platanios, Alex Smola

    Abstract: We propose an algorithm for deep learning on networks and graphs. It relies on the notion that many graph algorithms, such as PageRank, Weisfeiler-Lehman, or Message Passing can be expressed as iterative vertex updates. Unlike previous methods which rely on the ingenuity of the designer, Deep Graphs are adaptive to the estimation problem. Training and deployment are both efficient, since the cost…

    Submitted 4 June, 2018; originally announced June 2018.

  37. arXiv:1802.03916  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Detecting and Correcting for Label Shift with Black Box Predictors

    Authors: Zachary C. Lipton, Yu-Xiang Wang, Alex Smola

    Abstract: Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal $p(y)$ changes but the conditional $p(x \mid y)$ does not. We propose Black Box Shift Estimation (BBSE) to…

    Submitted 26 July, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: Published at the International Conference on Machine Learning (ICML) 2018

  38. arXiv:1712.00636  [pdf, other]

    cs.CV

    Compressed Video Action Recognition

    Authors: Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl

    Abstract: Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and the high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that the superfluous information can be reduced by up to two orders of magnitude by video…

    Submitted 29 March, 2018; v1 submitted 2 December, 2017; originally announced December 2017.

    Comments: CVPR 2018 (Selected for spotlight presentation)

  39. arXiv:1711.11179  [pdf, other]

    cs.LG stat.ML

    State Space LSTM Models with Particle MCMC Inference

    Authors: Xun Zheng, Manzil Zaheer, Amr Ahmed, Yuan Wang, Eric P Xing, Alexander J Smola

    Abstract: Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite the strong performance, however, it lacks the nice interpretability as in state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models that generalize the earlier work \cite{zaheer2017latent} of combining topic models with LSTM. However, unlike…

    Submitted 29 November, 2017; originally announced November 2017.

  40. arXiv:1711.05851  [pdf, other]

    cs.CL cs.AI

    Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning

    Authors: Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum

    Abstract: Knowledge bases (KB), both automatically and manually constructed, are often incomplete --- many valid facts can be inferred from the KB by synthesizing existing information. A popular approach to KB completion is to infer new relations by combinatory reasoning over the information found along other paths connecting a pair of entities. Given the enormous size of KBs and the exponential number of p…

    Submitted 30 December, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

    Comments: ICLR 2018

  41. arXiv:1709.04071  [pdf, other]

    cs.LG cs.AI cs.CL

    Variational Reasoning for Question Answering with Knowledge Graph

    Authors: Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander J. Smola, Le Song

    Abstract: Knowledge graph (KG) is known to be helpful for the task of question answering (QA), since it provides well-structured relational information between entities, and allows one to further infer indirect facts. However, it is challenging to build QA systems which can learn to reason over knowledge graphs based on question-answer pairs alone. First, when people ask questions, their expressions are noi…

    Submitted 27 November, 2017; v1 submitted 12 September, 2017; originally announced September 2017.

  42. arXiv:1709.01434  [pdf, other]

    cs.LG cs.AI

    A Generic Approach for Escaping Saddle points

    Authors: Sashank J Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J Smola

    Abstract: A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them imp…

    Submitted 5 September, 2017; originally announced September 2017.

  43. arXiv:1706.07567  [pdf, other]

    cs.CV

    Sampling Matters in Deep Embedding Learning

    Authors: Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl

    Abstract: Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that…

    Submitted 16 January, 2018; v1 submitted 23 June, 2017; originally announced June 2017.

    Comments: Add supplementary material. Paper published in ICCV 2017

  44. arXiv:1704.00003  [pdf, other]

    cs.LG stat.ML

    Spectral Methods for Nonparametric Models

    Authors: Hsiao-Yu Fish Tung, Chao-Yuan Wu, Manzil Zaheer, Alexander J. Smola

    Abstract: Nonparametric models are versatile, albeit computationally expensive, tool for modeling mixture models. In this paper, we introduce spectral methods for the two most popular nonparametric models: the Indian Buffet Process (IBP) and the Hierarchical Dirichlet Process (HDP). We show that using spectral methods for the inference of nonparametric models are computationally and statistically efficient.…

    Submitted 30 March, 2017; originally announced April 2017.

    Comments: Keywords: Spectral Methods, Indian Buffet Process, Hierarchical Dirichlet Process

  45. arXiv:1703.06114  [pdf, other]

    cs.LG stat.ML

    Deep Sets

    Authors: Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola

    Abstract: We study the problem of designing models for machine learning tasks defined on \emph{sets}. In contrast to traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from estimation of population statistics \cite{poczos13aistats}, to anomaly detection in piezometer data of…

    Submitted 14 April, 2018; v1 submitted 10 March, 2017; originally announced March 2017.

    Comments: NIPS 2017

  46. arXiv:1702.08160  [pdf, other]

    cs.CV

    Segmentation of Objects by Hashing

    Authors: J. D. Curtó, I. C. Zarza, Alex Smola, Luc van Gool

    Abstract: We propose a novel approach to address the problem of Simultaneous Detection and Segmentation introduced in [Hariharan et al 2014]. Using the hierarchical structures first presented in [Arbeláez et al 2011] we use an efficient and accurate procedure that exploits the feature information of the hierarchy using Locality Sensitive Hashing. We build on recent work that utilizes convolutional neural ne…

    Submitted 17 April, 2020; v1 submitted 27 February, 2017; originally announced February 2017.

  47. arXiv:1702.08159  [pdf, other]

    cs.LG stat.ML

    McKernel: A Library for Approximate Kernel Expansions in Log-linear Time

    Authors: J. D. Curtó, I. C. Zarza, Feng Yang, Alex Smola, Fernando de la Torre, Chong Wah Ngo, Luc van Gool

    Abstract: McKernel introduces a framework to use kernel approximates in the mini-batch setting with Stochastic Gradient Descent (SGD) as an alternative to Deep Learning. Based on Random Kitchen Sinks [Rahimi and Recht 2007], we provide a C++ library for Large-scale Machine Learning. It contains a CPU optimized implementation of the algorithm in [Le et al. 2013], that allows the computation of approximated k…

    Submitted 17 April, 2020; v1 submitted 27 February, 2017; originally announced February 2017.

  48. arXiv:1702.04423  [pdf, other]

    cs.LG cs.AI

    Efficient Multitask Feature and Relationship Learning

    Authors: Han Zhao, Otilia Stretcu, Alex Smola, Geoff Gordon

    Abstract: We consider a multitask learning problem, in which several predictors are learned jointly. Prior research has shown that learning the relations between tasks, and between the input features, together with the predictor, can lead to better generalization and interpretability, which proved to be useful for applications in many domains. In this paper, we consider a formulation of multitask learning t…

    Submitted 10 July, 2019; v1 submitted 14 February, 2017; originally announced February 2017.

  49. arXiv:1611.04488  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE stat.ME

    Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

    Authors: Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton

    Abstract: We propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning by generative adversarial networks (GAN), in which a model attempts to generate realistic samples, and a dis…

    Submitted 14 January, 2021; v1 submitted 14 November, 2016; originally announced November 2016.

    Comments: Published at ICLR 2017 (public comments: http://openreview.net/forum?id=HJWHIKqgl )

  50. arXiv:1611.03021  [pdf, other]

    cs.LG cs.CR stat.AP

    Attributing Hacks

    Authors: Ziqi Liu, Alexander J. Smola, Kyle Soska, Yu-Xiang Wang, Qinghua Zheng, Jun Zhou

    Abstract: In this paper we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as estimating the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard functi…

    Submitted 14 August, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: Appeared at AISTATS'17. Full version under review at the Electronic Journal of Statistics