Showing 1–40 of 40 results for author: Dudik, M

  1. arXiv:2401.14893  [pdf, other]

    cs.LG cs.CY stat.AP stat.ML

    A structured regression approach for evaluating model performance across intersectional subgroups

    Authors: Christine Herlihy, Kimberly Truong, Alexandra Chouldechova, Miroslav Dudik

    Abstract: Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups defined by combinations of demographic or other sensitive attributes. The standard approach is to stratify the evaluation data across subgroups and compute performance metrics separately for each group. However, even for moderately-sized evaluatio…

    Submitted 14 May, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

  2. arXiv:2311.03534  [pdf, other]

    cs.LG cs.AI cs.RO

    PcLast: Discovering Plannable Continuous Latent States

    Authors: Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb

    Abstract: Goal-conditioned planning benefits from learned low-dimensional representations of rich observations. While compact latent representations typically learned from variational autoencoders or inverse dynamics enable goal-conditioned decision making, they ignore state reachability, hampering their performance. In this paper, we learn a representation that associates reachable states together for effe…

    Submitted 10 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at ICML 2024

  3. arXiv:2306.06184  [pdf, other]

    cs.LG stat.ML

    A Unified Model and Dimension for Interactive Estimation

    Authors: Nataly Brukhim, Miroslav Dudik, Aldo Pacchiano, Robert Schapire

    Abstract: We study an abstract framework for interactive learning called interactive estimation in which the goal is to estimate a target from its "similarity" to points queried by the learner. We introduce a combinatorial measure called dissimilarity dimension which largely captures learnability in our model. We present a simple, general, and broadly-applicable algorithm, for which we obtain both regret a…

    Submitted 9 June, 2023; originally announced June 2023.

  4. arXiv:2303.16626  [pdf, ps, other]

    cs.LG cs.AI cs.CY

    Fairlearn: Assessing and Improving Fairness of AI Systems

    Authors: Hilde Weerts, Miroslav Dudík, Richard Edgar, Adrin Jalali, Roman Lutz, Michael Madaio

    Abstract: Fairlearn is an open source project to help practitioners assess and improve fairness of artificial intelligence (AI) systems. The associated Python library, also named fairlearn, supports evaluation of a model's output across affected populations and includes several algorithms for mitigating fairness issues. Grounded in the understanding that fairness is a sociotechnical challenge, the project i…

    Submitted 29 March, 2023; originally announced March 2023.

  5. arXiv:2205.14237  [pdf, other]

    cs.LG cs.AI stat.ML

    Provably Sample-Efficient RL with Side Information about Latent Dynamics

    Authors: Yao Liu, Dipendra Misra, Miro Dudík, Robert E. Schapire

    Abstract: We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a specific room in a building using observations from its own camera, while having access to the floor plan. We formalize this setting as transfer reinfor…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 35 pages, 4 figures

  6. arXiv:2205.03260  [pdf, other]

    math.OC cs.LG

    Convex Analysis at Infinity: An Introduction to Astral Space

    Authors: Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

    Abstract: Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understanding such minimizers at infinity. We study astral space, a compact extension of $\mathbb{R}^n$ to which such points at infinity have been added. Astral space is constructed to be as small as possible while still e…

    Submitted 11 January, 2023; v1 submitted 6 May, 2022; originally announced May 2022.

  7. arXiv:2202.05318  [pdf, other]

    stat.ML cs.CR cs.LG math.OC

    Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning

    Authors: Alberto Bietti, Chen-Yu Wei, Miroslav Dudík, John Langford, Zhiwei Steven Wu

    Abstract: Large-scale machine learning systems often involve data distributed across a collection of users. Federated learning algorithms leverage this structure by communicating model updates to a central server, rather than entire datasets. In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint…

    Submitted 15 July, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: ICML

  8. arXiv:2107.01509  [pdf, other]

    cs.LG math.ST stat.ML

    Bayesian decision-making under misspecified priors with applications to meta-learning

    Authors: Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

    Abstract: Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge but can also lead to poor performance when misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecifi…

    Submitted 3 July, 2021; originally announced July 2021.

  9. arXiv:2102.07308  [pdf, other]

    cs.GT cs.MA

    Log-time Prediction Markets for Interval Securities

    Authors: Miroslav Dudík, Xintong Wang, David M. Pennock, David M. Rothschild

    Abstract: We design a prediction market to recover a complete and fully general probability distribution over a random variable. Traders buy and sell interval securities that pay $1 if the outcome falls into an interval and $0 otherwise. Our market takes the form of a central automated market maker and allows traders to express interval endpoints of arbitrary precision. We present two designs in both of w…

    Submitted 15 February, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

    Comments: To appear in AAMAS 2021

  10. arXiv:2102.07024  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Interactive Learning from Activity Description

    Authors: Khanh Nguyen, Dipendra Misra, Robert Schapire, Miro Dudík, Patrick Shafto

    Abstract: We present a novel interactive learning protocol that enables training request-fulfilling agents by verbally describing their activities. Unlike imitation learning (IL), our protocol allows the teaching agent to provide feedback in a language that is most appropriate for them. Compared with reward in reinforcement learning (RL), the description feedback is richer and allows for improved sample com…

    Submitted 14 June, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  11. arXiv:2006.11226  [pdf, other]

    cs.LG math.OC stat.ML

    Gradient descent follows the regularization path for general losses

    Authors: Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

    Abstract: Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we s…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: To appear, COLT 2020

  12. arXiv:2006.05051  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Constrained episodic reinforcement learning in concave-convex and knapsack settings

    Authors: Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

    Abstract: We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either…

    Submitted 5 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: The NeurIPS 2020 version of this paper includes a small bug, leading to an incorrect dependence on H in Theorem 3.4. This version fixes it by adjusting Eq. (9), Theorem 3.4 and the relevant proofs. Changes in the main text are noted in red. Changes in the appendix are limited to Appendices B.1, B.5, and B.6 and the statement of Lemma F.3

  13. arXiv:1907.09623  [pdf, other]

    cs.LG stat.ML

    Doubly robust off-policy evaluation with shrinkage

    Authors: Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík

    Abstract: We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (…

    Submitted 18 September, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

    Journal ref: International Conference on Machine Learning (2020)

  14. arXiv:1906.09323  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Reinforcement Learning with Convex Constraints

    Authors: Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudik, Robert Schapire

    Abstract: In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we…

    Submitted 11 November, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems 32 (2019), 14093-14102

  15. arXiv:1905.12843  [pdf, other]

    cs.LG stat.ML

    Fair Regression: Quantitative Definitions and Reduction-based Algorithms

    Authors: Alekh Agarwal, Miroslav Dudík, Zhiwei Steven Wu

    Abstract: In this paper, we study the prediction of a real-valued target, such as a risk score or recidivism rate, while guaranteeing a quantitative notion of fairness with respect to a protected attribute such as gender or race. We call this class of problems fair regression. We propose general schemes for fair regression under two notions of fairness: (1) statistical parity, which asks that the pre…

    Submitted 29 May, 2019; originally announced May 2019.

  16. arXiv:1901.09018  [pdf, other]

    cs.LG stat.ML

    Provably efficient RL with Rich Observations via Latent State Decoding

    Authors: Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

    Abstract: We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps -- where previously decoded latent states provide labels for later regression problems --…

    Submitted 9 September, 2021; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: The ICML 2019 version omitted the second constraint on $ε$ in Theorem 4.1. We thank Yonathan Efroni for calling this to our attention

  17. arXiv:1812.05239  [pdf, other]

    cs.HC cs.CY cs.LG cs.SE

    Improving fairness in machine learning systems: What do industry practitioners need?

    Authors: Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudík, Hanna Wallach

    Abstract: The potential for machine learning (ML) systems to amplify social inequities and unfairness is receiving increasing popular and academic attention. A surge of recent work has focused on the development of algorithmic tools to assess and mitigate such unfairness. If these tools are to have a positive impact on industry practice, however, it is crucial that their design be informed by an understandi…

    Submitted 7 January, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: To appear in the 2019 ACM CHI Conference on Human Factors in Computing Systems (CHI 2019)

  18. arXiv:1803.02453  [pdf, other]

    cs.LG

    A Reductions Approach to Fair Classification

    Authors: Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, Hanna Wallach

    Abstract: We present a systematic approach for achieving fairness in a binary classification setting. While we focus on two well-known quantitative definitions of fairness, our approach encompasses many other previously studied definitions as special cases. The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier wit…

    Submitted 16 July, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

  19. arXiv:1803.01088  [pdf, other]

    cs.LG stat.ML

    Practical Contextual Bandits with Regression Oracles

    Authors: Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire

    Abstract: A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded. We present a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods. Our algorithms leverage the availability of a regression oracle for the value-function clas…

    Submitted 2 March, 2018; originally announced March 2018.

  20. arXiv:1803.00590  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Imitation and Reinforcement Learning

    Authors: Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III

    Abstract: We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes o…

    Submitted 9 June, 2018; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: Proceedings of the 35th International Conference on Machine Learning (ICML 2018)

  21. arXiv:1702.07810  [pdf, other]

    cs.GT

    A Decomposition of Forecast Error in Prediction Markets

    Authors: Miroslav Dudík, Sébastien Lahaie, Ryan Rogers, Jennifer Wortman Vaughan

    Abstract: We analyze sources of error in prediction market forecasts in order to bound the difference between a security's price and the ground truth it estimates. We consider cost-function-based prediction markets in which an automated market maker adjusts security prices according to the history of trade. We decompose the forecasting error into three components: sampling error, arising because traders onl…

    Submitted 20 February, 2018; v1 submitted 24 February, 2017; originally announced February 2017.

    Journal ref: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  22. arXiv:1612.01205  [pdf, other]

    stat.ML cs.LG

    Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

    Authors: Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik

    Abstract: We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model. We consider the general (agnostic) setting without access to a consistent model of rewards and establish a minimax lower bound on the mean squared error (MSE). The bound is matched up to constants by the inverse propensity scoring (IPS) an…

    Submitted 11 November, 2017; v1 submitted 4 December, 2016; originally announced December 2016.

    Journal ref: International Conference on Machine Learning (pp. 3589-3597) (2017)

  23. arXiv:1611.01688  [pdf, other]

    cs.LG cs.DS cs.GT

    Oracle-Efficient Online Learning and Auction Design

    Authors: Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan

    Abstract: We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle. We present an algorithm called Generalized Follow-the-Perturbed-Leader and provide conditions under which it is oracle-efficient while achieving vanishing regret. Our results make significant progress on an open problem raised b…

    Submitted 5 August, 2019; v1 submitted 5 November, 2016; originally announced November 2016.

    Comments: An earlier version of this paper appeared in FOCS 2017

  24. Arbitrage-Free Combinatorial Market Making via Integer Programming

    Authors: Christian Kroer, Miroslav Dudík, Sébastien Lahaie, Sivaraman Balakrishnan

    Abstract: We present a new combinatorial market maker that operates arbitrage-free combinatorial prediction markets specified by integer programs. Although the problem of arbitrage-free pricing, while maintaining a bound on the subsidy provided by the market maker, is #P-hard in the worst case, we posit that the typical case might be amenable to modern integer programming (IP) solvers. At the crux of our me…

    Submitted 10 June, 2016; v1 submitted 9 June, 2016; originally announced June 2016.

  25. arXiv:1605.04812  [pdf, other]

    cs.LG cs.AI stat.ML

    Off-policy evaluation for slate recommendation

    Authors: Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

    Abstract: This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy's performance. A thorough empirical evaluation on real-world data reveals that our…

    Submitted 6 November, 2017; v1 submitted 16 May, 2016; originally announced May 2016.

    Comments: 31 pages (9 main paper, 20 supplementary), 12 figures (2 main paper, 10 supplementary)

  26. arXiv:1510.02045  [pdf, other]

    cs.GT cs.AI

    Budget Constraints in Prediction Markets

    Authors: Nikhil Devanur, Miroslav Dudík, Zhiyi Huang, David M. Pennock

    Abstract: We give a detailed characterization of optimal trades under budget constraints in a prediction market with a cost-function-based automated market maker. We study how the budget constraints of individual traders affect their ability to impact the market price. As a concrete application of our characterization, we give sufficient conditions for a property we call budget additivity: two traders with…

    Submitted 7 October, 2015; originally announced October 2015.

    Journal ref: In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, pages 238-247, 2015

  27. arXiv:1506.04513  [pdf, other]

    cs.LG stat.ML

    Convex Risk Minimization and Conditional Probability Estimation

    Authors: Matus Telgarsky, Miroslav Dudík, Robert Schapire

    Abstract: This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results that are general enough to include cases in which no minimum exists, as occurs typically, for instance, with standard boosting algorithms. Concretely, we first show that any se…

    Submitted 15 June, 2015; originally announced June 2015.

    Comments: To appear, COLT 2015

  28. arXiv:1503.02834  [pdf, ps, other]

    stat.ME cs.AI

    Doubly Robust Policy Evaluation and Optimization

    Authors: Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

    Abstract: We study sequential decision making in environments where rewards are only partially observed, but can be modeled as a function of observed contexts and the chosen action by the decision maker. This setting, known as contextual bandits, encompasses a wide variety of applications such as health care, content recommendation and Internet advertising. A central task is evaluation of a new policy given…

    Submitted 10 March, 2015; originally announced March 2015.

    Comments: Published at http://dx.doi.org/10.1214/14-STS500 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS500

    Journal ref: Statistical Science 2014, Vol. 29, No. 4, 485-511

  29. arXiv:1502.06362  [pdf, other]

    cs.LG

    Contextual Dueling Bandits

    Authors: Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi

    Abstract: We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (2009), which we extend to incorporate context. Roughly, the learner's goal is to find the best policy, or way of behaving, in some space of policies, although "best"…

    Submitted 13 June, 2015; v1 submitted 23 February, 2015; originally announced February 2015.

    Comments: 25 pages, 4 figures, Published at COLT 2015

  30. arXiv:1502.05890  [pdf, other]

    cs.LG stat.ML

    Contextual Semibandits via Supervised Learning Oracles

    Authors: Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik

    Abstract: We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this feedback. These problems, known as contextual semibandits, arise in crowdsourcing, recommendation, and many other domains. This paper reduces contextual semibandits t…

    Submitted 4 November, 2016; v1 submitted 20 February, 2015; originally announced February 2015.

  31. arXiv:1407.8161  [pdf, ps, other]

    cs.GT cs.AI

    Market Making with Decreasing Utility for Information

    Authors: Miroslav Dudík, Rafael Frongillo, Jennifer Wortman Vaughan

    Abstract: We study information elicitation in cost-function-based combinatorial prediction markets when the market maker's utility for information decreases over time. In the sudden revelation setting, it is known that some piece of information will be revealed to traders, and the market maker wishes to prevent guaranteed profits for trading on the sure information. In the gradual decrease setting, the mark…

    Submitted 30 July, 2014; originally announced July 2014.

    Journal ref: M. Dudik, R. Frongillo, and J. Wortman Vaughan. Market Making with Decreasing Utility for Information. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages 152-161, 2014

  32. arXiv:1310.8243  [pdf, other]

    cs.LG stat.ML

    Para-active learning

    Authors: Alekh Agarwal, Leon Bottou, Miroslav Dudik, John Langford

    Abstract: Training examples are not all equally informative. Active learning strategies leverage this observation in order to massively reduce the number of examples that need to be labeled. We leverage the same observation to build a generic strategy for parallelizing learning algorithms. This strategy is effective because the search for informative examples is highly parallelizable and because we show tha…

    Submitted 30 October, 2013; originally announced October 2013.

  33. arXiv:1210.4862  [pdf]

    cs.LG stat.ML

    Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits

    Authors: Miroslav Dudik, Dumitru Erhan, John Langford, Lihong Li

    Abstract: We present and prove properties of a new offline policy evaluator for an exploration learning setting which is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance weighting, doubly robust evaluation, and nonstationary policy evaluation approaches. In addition, our approach allows generating longer histories by careful control of a…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-247-254

  34. arXiv:1205.2649  [pdf]

    cs.GT

    A Sampling-Based Approach to Computing Equilibria in Succinct Extensive-Form Games

    Authors: Miroslav Dudik, Geoffrey Gordon

    Abstract: A central task of artificial intelligence is the design of artificial agents that act towards specified goals in partially observed environments. Since such environments frequently include interaction over time with other agents with their own goals, reasoning about such interaction relies on sequential game-theoretic models such as extensive-form games or some of their succinct representations su…

    Submitted 9 May, 2012; originally announced May 2012.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

    Report number: UAI-P-2009-PG-151-160

  35. arXiv:1205.2644  [pdf]

    cs.LO cs.AI

    First-Order Mixed Integer Linear Programming

    Authors: Geoffrey Gordon, Sue Ann Hong, Miroslav Dudik

    Abstract: Mixed integer linear programming (MILP) is a powerful representation often used to formulate decision-making problems under uncertainty. However, it lacks a natural mechanism to reason about objects, classes of objects, and relations. First-order logic (FOL), on the other hand, excels at reasoning about classes of objects, but lacks a rich representation of uncertainty. While representing proposit…

    Submitted 9 May, 2012; originally announced May 2012.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

    Report number: UAI-P-2009-PG-213-222

  36. arXiv:1202.1334  [pdf, ps, other]

    cs.LG

    Contextual Bandit Learning with Predictable Rewards

    Authors: Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, Robert E. Schapire

    Abstract: Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a realizability assumption: there exists a function in a (known) function class, always capable of predicting the expected reward, given the action and context. Under t…

    Submitted 2 March, 2012; v1 submitted 6 February, 2012; originally announced February 2012.

  37. arXiv:1110.4198  [pdf, other]

    cs.LG stat.ML

    A Reliable Effective Terascale Linear Learning System

    Authors: Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, John Langford

    Abstract: We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix), billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are new, but the…

    Submitted 11 July, 2013; v1 submitted 19 October, 2011; originally announced October 2011.

  38. arXiv:1106.2369  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Efficient Optimal Learning for Contextual Bandits

    Authors: Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

    Abstract: We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses a cost sensitive classification learner as an oracle and has a running time $\mathrm{polylog}(N)$, where $N$ is the number of classificati…

    Submitted 12 June, 2011; originally announced June 2011.

  39. arXiv:1103.4601  [pdf, ps, other]

    cs.LG cs.AI cs.RO stat.AP stat.ML

    Doubly Robust Policy Evaluation and Learning

    Authors: Miroslav Dudik, John Langford, Lihong Li

    Abstract: We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses a wide variety of applications including health-care policy and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and r…

    Submitted 5 May, 2011; v1 submitted 23 March, 2011; originally announced March 2011.

    Comments: Published at ICML 2011, 8 pages, 6 figures

  40. arXiv:0712.2437  [pdf, other]

    q-bio.QM cond-mat.dis-nn q-bio.NC

    Faster solutions of the inverse pairwise Ising problem

    Authors: Tamara Broderick, Miroslav Dudik, Gasper Tkacik, Robert E. Schapire, William Bialek

    Abstract: Recent work has shown that probabilistic models based on pairwise interactions (in the simplest case, the Ising model) provide surprisingly accurate descriptions of experiments on real biological networks ranging from neurons to genes. Finding these models requires us to solve an inverse problem: given experimentally measured expectation values, what are the parameters of the underlying Hamiltonia…

    Submitted 15 December, 2007; v1 submitted 14 December, 2007; originally announced December 2007.