
Showing 1–42 of 42 results for author: Schuurmans, D

Searching in archive stat.
  1. arXiv:2405.19320  [pdf, other]

    cs.LG cs.AI stat.ML

    Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

    Authors: Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

    Abstract: Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas of investigation. A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF,…

    Submitted 4 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2311.12244  [pdf, other]

    cs.LG cs.AI stat.ML

    Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

    Authors: Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

    Abstract: In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounte…

    Submitted 10 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: The first two authors contribute equally

  3. arXiv:2212.08949  [pdf, other]

    cs.LG eess.SY stat.ML

    Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off

    Authors: Zichen Zhang, Johannes Kirschner, Junxi Zhang, Francesco Zanini, Alex Ayoub, Masood Dehghan, Dale Schuurmans

    Abstract: A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle. Yet, many applications involve continuous-time systems where the time discretization, in principle, can be managed. The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its…

    Submitted 16 January, 2024; v1 submitted 17 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2023

  4. arXiv:2212.08765  [pdf, other]

    cs.LG stat.ML

    Latent Variable Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

    Abstract: Deep latent variable models have achieved significant empirical successes in model-based reinforcement learning (RL) due to their expressiveness in modeling complex transition dynamics. On the other hand, it remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a…

    Submitted 7 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: ICLR 2023. The first two authors contribute equally. Project Website: https://rlrep.github.io/lvrep/

  5. arXiv:2211.07767  [pdf, other]

    stat.ML cs.LG math.OC

    Learning to Optimize with Stochastic Dominance Constraints

    Authors: Hanjun Dai, Yuan Xue, Niao He, Bethany Wang, Na Li, Dale Schuurmans, Bo Dai

    Abstract: In real-world decision-making, uncertainty is important yet difficult to handle. Stochastic dominance provides a theoretically sound approach for comparing uncertain quantities, but optimization with stochastic dominance constraints is often computationally expensive, which limits practical applicability. In this paper, we develop a simple yet efficient approach for the problem, the Light Stochast…

    Submitted 24 February, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted to the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

  6. arXiv:2208.09515  [pdf, other]

    cs.LG stat.ML

    Spectral Decomposition Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

    Abstract: Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods suffer from limited applicability because t…

    Submitted 7 March, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: ICLR 2023. The first two authors contribute equally

  7. arXiv:2207.07150  [pdf, other]

    cs.LG stat.ML

    Making Linear MDPs Practical via Contrastive Representation Learning

    Authors: Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

    Abstract: It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we…

    Submitted 7 December, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: ICML 2022. The first two authors contribute equally

  8. arXiv:2112.00874  [pdf, other]

    cs.LG stat.ML

    Neural Stochastic Dual Dynamic Programming

    Authors: Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai

    Abstract: Stochastic dual dynamic programming (SDDP) is a state-of-the-art method for solving multi-stage stochastic optimization, widely used for modeling real-world process optimization tasks. Unfortunately, SDDP has a worst-case complexity that scales exponentially in the number of decision variables, which severely limits applicability to only low dimensional problems. To overcome this limitation, we ex…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: 24 pages

  9. arXiv:2102.06234  [pdf, other]

    cs.LG stat.ML

    Optimization Issues in KL-Constrained Approximate Policy Iteration

    Authors: Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

    Abstract: Many reinforcement learning algorithms can be seen as versions of approximate policy iteration (API). While standard API often performs poorly, it has been shown that learning can be stabilized by regularizing each policy update by the KL-divergence to the previous policy. Popular practical algorithms such as TRPO, MPO, and VMPO replace regularization by a constraint on KL-divergence of consecutiv…

    Submitted 11 February, 2021; originally announced February 2021.

  10. arXiv:2010.11652  [pdf, other]

    cs.LG stat.ML

    CoinDICE: Off-Policy Confidence Interval Estimation

    Authors: Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

    Abstract: We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function space embedding of the linear program formulation of the $Q$-function, we obtain an optimization problem with gene…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: To appear at NeurIPS 2020 as spotlight

  11. arXiv:2009.14308  [pdf, other]

    cs.LG stat.ML

    Attention that does not Explain Away

    Authors: Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut

    Abstract: Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks. A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances. Following a probabilistic view of the attention via the Gaussian mixture model, we find empir…

    Submitted 29 September, 2020; originally announced September 2020.

  12. arXiv:2007.11091  [pdf, other]

    cs.LG stat.ML

    EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

    Authors: Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu

    Abstract: Off-policy reinforcement learning holds the promise of sample-efficient learning of decision-making policies by leveraging past experience. However, in the offline RL setting -- where a fixed collection of interactions are provided and no further interactions are allowed -- it has been shown that standard off-policy RL methods can significantly underperform. Recently proposed methods often aim to…

    Submitted 13 January, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

  13. arXiv:2007.03438  [pdf, other]

    cs.LG math.OC stat.ML

    Off-Policy Evaluation via the Regularized Lagrangian

    Authors: Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

    Abstract: The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data. While these estimators all perform some form of stationary distribution correction, they arise from different derivations and objective functions. In this paper, we unify these estimators as regularized Lagrangians of the same…

    Submitted 24 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

  14. arXiv:2007.00811  [pdf, other]

    cs.LG stat.ML

    Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

    Authors: Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans

    Abstract: For deploying a deep learning model into production, it needs to be both accurate and compact to meet the latency and memory constraints. This usually results in a network that is deep (to ensure performance) and yet thin (to improve computational efficiency). In this paper, we propose an efficient method to train a deep thin network with a theoretic guarantee. Our method is motivated by model com…

    Submitted 17 August, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: ICML 2020

  15. arXiv:2006.15502  [pdf, other]

    cs.LG stat.ML

    Scalable Deep Generative Modeling for Sparse Graphs

    Authors: Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans

    Abstract: Learning graph generative models is a challenging task for deep learning and has wide applicability to a range of domains like chemistry, biology and social science. However current deep neural methods suffer from limited scalability: for a graph with $n$ nodes and $m$ edges, existing deep neural methods require $\Omega(n^2)$ complexity by building up the adjacency matrix. On the other hand, many real…

    Submitted 28 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  16. arXiv:2005.06392  [pdf, other]

    cs.LG stat.ML

    On the Global Convergence Rates of Softmax Policy Gradient Methods

    Authors: Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

    Abstract: We make three contributions toward better understanding policy gradient methods in the tabular setting. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization. This result significantly expands the recent asymptotic convergence results. The analysis relies on two findings: t…

    Submitted 2 June, 2022; v1 submitted 13 May, 2020; originally announced May 2020.

    Comments: 64 pages, 5 figures. Published in ICML 2020

  17. arXiv:2003.07521  [pdf, other]

    cs.LG stat.ML

    Energy-Based Processes for Exchangeable Data

    Authors: Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans

    Abstract: Recently there has been growing interest in modeling sets with exchangeability such as point clouds. A shortcoming of current approaches is that they restrict the cardinality of the sets considered or can only express limited forms of distribution over unobserved data. To overcome these limitations, we introduce Energy-Based Processes (EBPs), which extend energy based models to exchangeable data w…

    Submitted 8 July, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Journal ref: PMLR 119:2302-2312, 2020

  18. arXiv:2003.04292  [pdf, other]

    cs.LG stat.ML

    Variational Inference for Deep Probabilistic Canonical Correlation Analysis

    Authors: Mahdi Karami, Dale Schuurmans

    Abstract: In this paper, we propose a deep probabilistic multi-view model that is composed of a linear multi-view layer based on probabilistic canonical correlation analysis (CCA) description in the latent space together with deep generative networks as observation models. The network is designed to decompose the variations of all views into a shared latent representation and a set of view-specific componen…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: 13 pages, 4 figures

  19. arXiv:2003.00722  [pdf, other]

    cs.LG cs.AI stat.ML

    Batch Stationary Distribution Estimation

    Authors: Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

    Abstract: We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions. Classical simulation-based approaches assume access to the underlying process so that trajectories of sufficient length can be gathered to approximate stationary sampling. Instead, we consider an alternative setting where a fixed set of transitions has been collected…

    Submitted 2 March, 2020; originally announced March 2020.

  20. arXiv:2002.12399  [pdf, other]

    cs.LG cs.AI stat.ML

    ConQUR: Mitigating Delusional Bias in Deep Q-learning

    Authors: Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

    Abstract: Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penal…

    Submitted 27 February, 2020; originally announced February 2020.

  21. arXiv:2002.09072  [pdf, other]

    stat.ML cs.LG

    GenDICE: Generalized Offline Estimation of Stationary Values

    Authors: Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans

    Abstract: An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain. In many real-world applications, access to the underlying transition operator is limited to a fixed set of data that has already been collected, without additional interaction with the environment being available. We show that consist…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: ICLR 2020

  22. arXiv:1912.11206  [pdf, other]

    cs.LG stat.ML

    Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

    Authors: Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller

    Abstract: Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate. An algorithm should ideally be able to trust an imperfect model over a reasonably long planning horizon, and only rely on model-free updates when the model errors get infeasibly large. In this paper, we investigate techniques for…

    Submitted 23 December, 2019; originally announced December 2019.

  23. arXiv:1909.05352  [pdf, other]

    cs.LG stat.ML

    Domain Aggregation Networks for Multi-Source Domain Adaptation

    Authors: Junfeng Wen, Russell Greiner, Dale Schuurmans

    Abstract: In many real-world applications, we want to exploit multiple source datasets of similar tasks to learn a model for a different but related target dataset -- e.g., recognizing characters of a new font using a set of different fonts. While most recent research has considered ad-hoc combination rules to address this problem, we extend previous work on domain discrepancy minimization to develop a fini…

    Submitted 25 September, 2019; v1 submitted 11 September, 2019; originally announced September 2019.

  24. arXiv:1907.04543  [pdf, other]

    cs.LG cs.AI stat.ML

    An Optimistic Perspective on Offline Reinforcement Learning

    Authors: Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

    Abstract: Off-policy reinforcement learning (RL) using a fixed offline dataset of logged interactions is an important consideration in real world applications. This paper studies offline RL using the DQN replay dataset comprising the entire replay experience of a DQN agent on 60 Atari 2600 games. We demonstrate that recent off-policy deep RL algorithms, even when trained solely on this fixed dataset, outper…

    Submitted 22 June, 2020; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: ICML 2020. An earlier version was titled "Striving for Simplicity in Off-Policy Deep Reinforcement Learning". Project Website: https://offline-rl.github.io

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119:104-114, 2020

  25. arXiv:1905.13559  [pdf, other]

    cs.LG cs.AI stat.ML

    Advantage Amplification in Slowly Evolving Latent-State Environments

    Authors: Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier

    Abstract: Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle of advantage amplification that can overcome these hurdles through the use…

    Submitted 29 May, 2019; originally announced May 2019.

  26. arXiv:1904.12083  [pdf, other]

    cs.LG stat.CO stat.ML

    Exponential Family Estimation via Adversarial Dynamics Embedding

    Authors: Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

    Abstract: We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks. We exploit the primal-dual view of the MLE with a kinetics augmented model to obtain an estimate associated with an adversarial dual sampler. To represent this sampler, we introduce a novel neural architecture,…

    Submitted 30 March, 2020; v1 submitted 26 April, 2019; originally announced April 2019.

    Comments: Appearing in NeurIPS 2019 Vancouver, Canada; a preliminary version published in NeurIPS 2018 Bayesian Deep Learning Workshop

  27. arXiv:1902.07198  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Learning to Generalize from Sparse and Underspecified Rewards

    Authors: Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi

    Abstract: We consider the problem of learning from sparse and underspecified rewards, where an agent receives a complex input, such as a natural language instruction, and needs to generate a complex response, such as an action sequence, while only receiving binary success-failure feedback. Such success-failure rewards are often underspecified: they do not distinguish between purposeful and accidental succes…

    Submitted 31 May, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

    Comments: ICML 2019

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:130-140, 2019

  28. arXiv:1901.11530  [pdf, other]

    cs.LG cs.AI stat.ML

    A Geometric Perspective on Optimal Representations for Reinforcement Learning

    Authors: Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

    Abstract: We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary po…

    Submitted 25 June, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

  29. arXiv:1901.11524  [pdf, other]

    cs.LG cs.AI stat.ML

    The Value Function Polytope in Reinforcement Learning

    Authors: Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare

    Abstract: We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem…

    Submitted 15 May, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

  30. arXiv:1811.11214  [pdf, other]

    cs.LG stat.ML

    Understanding the impact of entropy on policy optimization

    Authors: Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

    Abstract: Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient,…

    Submitted 7 June, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Accepted to ICML 2019

  31. arXiv:1811.02228  [pdf, other]

    cs.LG stat.ML

    Kernel Exponential Family Estimation via Doubly Dual Embedding

    Authors: Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He

    Abstract: We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space. Key to our approach is a novel technique, doubly dual embedding, that avoids computation of the partition function. This technique also allows the development of a flexible sampling strategy that amortizes the cost of Monte-Carlo sam…

    Submitted 24 April, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: 22 pages, 20 figures; AISTATS 2019

  32. arXiv:1804.01712  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Variational Rejection Sampling

    Authors: Aditya Grover, Ramki Gummadi, Miguel Lazaro-Gredilla, Dale Schuurmans, Stefano Ermon

    Abstract: Learning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates. We propose a novel rejection sampling step that discards samples from the variational posterior which are assigned low likelihoods by the model. Our approach provides an arbitrarily accurate approximation…

    Submitted 5 April, 2018; originally announced April 2018.

    Comments: AISTATS 2018

  33. arXiv:1702.08892  [pdf, other]

    cs.AI cs.LG stat.ML

    Bridging the Gap Between Value and Policy Based Reinforcement Learning

    Authors: Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

    Abstract: We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization. Specifically, we show that softmax consistent action values correspond to optimal entropy regularized policy probabilities along any action sequence, regardless of provenance. From this observ…

    Submitted 22 November, 2017; v1 submitted 28 February, 2017; originally announced February 2017.

    Comments: NIPS 2017

  34. arXiv:1601.00034  [pdf, other]

    stat.ML cs.LG cs.NE

    Stochastic Neural Networks with Monotonic Activation Functions

    Authors: Siamak Ravanbakhsh, Barnabas Poczos, Jeff Schneider, Dale Schuurmans, Russell Greiner

    Abstract: We propose a Laplace approximation that creates a stochastic unit from any smooth monotonic activation function, using only Gaussian noise. This paper investigates the application of this stochastic approximation in training a family of Restricted Boltzmann Machines (RBM) that are closely linked to Bregman divergences. This family, that we call exponential family RBM (Exp-RBM), is a subset of the…

    Submitted 22 July, 2016; v1 submitted 31 December, 2015; originally announced January 2016.

    Comments: AISTATS 2016

  35. arXiv:1410.4828  [pdf, other]

    math.OC cs.LG stat.ML

    Generalized Conditional Gradient for Sparse Estimation

    Authors: Yaoliang Yu, Xinhua Zhang, Dale Schuurmans

    Abstract: Structured sparsity is an important modeling tool that expands the applicability of convex formulations for data analysis; however, it also creates significant challenges for efficient algorithm design. In this paper we investigate the generalized conditional gradient (GCG) algorithm for solving structured sparse optimization problems, demonstrating that, with some enhancements, it can provide a m…

    Submitted 17 October, 2014; originally announced October 2014.

    Comments: 67 pages, 20 figures

  36. arXiv:1309.6823  [pdf]

    cs.LG stat.ML

    Convex Relaxations of Bregman Divergence Clustering

    Authors: Hao Cheng, Xinhua Zhang, Dale Schuurmans

    Abstract: Although many convex relaxations of clustering have been proposed in the past decade, current formulations remain restricted to spherical Gaussian or discriminative models and are susceptible to imbalanced clusters. To address these shortcomings, we propose a new class of convex relaxations that can be flexibly applied to more general forms of Bregman divergence clustering. By basing these new for…

    Submitted 26 September, 2013; originally announced September 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-162-171

  37. arXiv:1301.3890  [pdf]

    cs.LG stat.CO stat.ML

    Monte Carlo Inference via Greedy Importance Sampling

    Authors: Dale Schuurmans, Finnegan Southey

    Abstract: We present a new method for conducting Monte Carlo inference in graphical models which combines explicit search with generalized importance sampling. The idea is to reduce the variance of importance sampling by searching for significant points in the target distribution. We prove that it is possible to introduce search and still maintain unbiasedness. We then demonstrate our procedure on a few s…

    Submitted 16 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

    Report number: UAI-P-2000-PG-523-532

  38. arXiv:1212.2514  [pdf]

    cs.LG stat.ML

    Boltzmann Machine Learning with the Latent Maximum Entropy Principle

    Authors: Shaojun Wang, Dale Schuurmans, Fuchun Peng, Yunxin Zhao

    Abstract: We present a new statistical learning paradigm for Boltzmann machines based on a new inference principle we have proposed: the latent maximum entropy principle (LME). LME is different both from Jaynes' maximum entropy principle and from standard maximum likelihood estimation. We demonstrate the LME principle by deriving new algorithms for Boltzmann machine parameter estimation, and show how robust a…

    Submitted 19 October, 2012; originally announced December 2012.

    Comments: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003)

    Report number: UAI-P-2003-PG-567-574

  39. arXiv:1207.1382  [pdf]

    cs.LG stat.ML

    Maximum Margin Bayesian Networks

    Authors: Yuhong Guo, Dana Wilkinson, Dale Schuurmans

    Abstract: We consider the problem of learning Bayesian network classifiers that maximize the margin over a set of classification variables. We find that this problem is harder for Bayesian networks than for undirected graphical models like maximum margin Markov networks. The main difficulty is that the parameters in a Bayesian network must satisfy additional normalization constraints that an undirected graph…

    Submitted 4 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

    Report number: UAI-P-2005-PG-233-242

  40. arXiv:1206.6832  [pdf]

    cs.LG stat.ML

    Convex Structure Learning for Bayesian Networks: Polynomial Feature Selection and Approximate Ordering

    Authors: Yuhong Guo, Dale Schuurmans

    Abstract: We present a new approach to learning the structure and parameters of a Bayesian network based on regularized estimation in an exponential family representation. Here we show that, given a fixed variable order, the optimal structure and parameters can be learned efficiently, even without restricting the size of the parent sets. We then consider the problem of optimizing the variable order for a gi…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

    Report number: UAI-P-2006-PG-208-216

  41. arXiv:1206.6455  [pdf]

    cs.LG stat.ML

    Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations

    Authors: Yaoliang Yu, James Neufeld, Ryan Kiros, Xinhua Zhang, Dale Schuurmans

    Abstract: We demonstrate that almost all non-parametric dimensionality reduction methods can be expressed by a simple procedure: regularized loss minimization plus singular value truncation. By distinguishing the role of the loss and regularizer in such a process, we recover a factored perspective that reveals some gaps in the current literature. Beyond identifying a useful new loss for manifold unfolding,…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  42. arXiv:1202.3772  [pdf, ps, other]

    cs.LG math.NA stat.ML

    Rank/Norm Regularization with Closed-Form Solutions: Application to Subspace Clustering

    Authors: Yao-Liang Yu, Dale Schuurmans

    Abstract: When data is sampled from an unknown subspace, principal component analysis (PCA) provides an effective way to estimate the subspace and hence reduce the dimension of the data. At the heart of PCA is the Eckart-Young-Mirsky theorem, which characterizes the best rank k approximation of a matrix. In this paper, we prove a generalization of the Eckart-Young-Mirsky theorem under all unitarily invarian…

    Submitted 9 October, 2012; v1 submitted 14 February, 2012; originally announced February 2012.

    Comments: 11 pages, 1 figure, appeared in UAI 2011. One footnote corrected and appendix added

    Report number: UAI-P-2011-PG-778-785