Showing 1–19 of 19 results for author: Mannor, S

Searching in archive math.
  1. arXiv:2301.13642  [pdf, other]

    cs.LG math.OC

    An Efficient Solution to s-Rectangular Robust Markov Decision Processes

    Authors: Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

    Abstract: We present an efficient robust value iteration for \texttt{s}-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to standard (non-robust) MDPs which is significantly faster than any existing method. We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water filling lemma. We unveil the exact form of the optimal policies, whic…

    Submitted 31 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2205.14327

  2. arXiv:2110.06267  [pdf, other]

    cs.LG math.OC

    Twice regularized MDPs and the equivalence between robustness and regularization

    Authors: Esther Derman, Matthieu Geist, Shie Mannor

    Abstract: Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. To solve them, one typically resorts to robust optimization methods. However, this significantly increases computational complexity and limits scalability in both learning and planning. On the other hand, regularized MDPs show more stability in policy learning without impairing time complexity. Yet,…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  3. arXiv:2102.03802  [pdf, other]

    cs.LG math.ST stat.ML

    Dimension Free Generalization Bounds for Non Linear Metric Learning

    Authors: Mark Kozdoba, Shie Mannor

    Abstract: In this work we study generalization guarantees for the metric learning problem, where the metric is induced by a neural network type embedding of the data. Specifically, we provide uniform generalization bounds for two regimes -- the sparse regime, and a non-sparse regime which we term \emph{bounded amplification}. The sparse regime bounds correspond to situations where $\ell_1$-type norms of the…

    Submitted 7 February, 2021; originally announced February 2021.

  4. arXiv:2007.13232  [pdf, other]

    math.PR cs.DM math.CO math.OC physics.data-an

    The Pendulum Arrangement: Maximizing the Escape Time of Heterogeneous Random Walks

    Authors: Asaf Cassel, Shie Mannor, Guy Tennenholtz

    Abstract: We identify a fundamental phenomenon of heterogeneous one dimensional random walks: the escape (traversal) time is maximized when the heterogeneity in transition probabilities forms a pyramid-like potential barrier. This barrier corresponds to a distinct arrangement of transition probabilities, sometimes referred to as the pendulum arrangement. We reduce this problem to a sum over products, combin…

    Submitted 28 July, 2020; v1 submitted 26 July, 2020; originally announced July 2020.

    Comments: Names ordered alphabetically

  5. arXiv:2003.02894  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Distributional Robustness and Regularization in Reinforcement Learning

    Authors: Esther Derman, Shie Mannor

    Abstract: Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO's extension to sequential decision-making overcomes $\textit{external uncertainty}$ through the robust Markov Decision Process (MDP) setti…

    Submitted 14 July, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: Accepted at the "Theoretical Foundations of Reinforcement Learning" Workshop - ICML 2020

  6. arXiv:1909.04236  [pdf, other]

    cs.LG math.OC stat.ML

    Online Planning with Lookahead Policies

    Authors: Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

    Abstract: Real Time Dynamic Programming (RTDP) is an online algorithm based on Dynamic Programming (DP) that acts by 1-step greedy planning. Unlike DP, RTDP does not require access to the entire state space, i.e., it explicitly handles the exploration. This fact makes RTDP particularly appealing when the state space is large and it is not possible to update all states simultaneously. In this work we devise a mul…

    Submitted 12 October, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2020

  7. arXiv:1909.02769  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

    Authors: Lior Shani, Yonathan Efroni, Shie Mannor

    Abstract: Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem, that restricts consecutive policies to be 'close' to one another, is iteratively solved. Nevertheless, TRPO has been considered a heuristic algorithm inspired by Conservative Policy Iteration (CPI). We show that the adaptive scaling me…

    Submitted 12 December, 2019; v1 submitted 6 September, 2019; originally announced September 2019.

    Comments: Published at AAAI-2020; 58 pages

  8. arXiv:1902.04376  [pdf, ps, other]

    stat.ML cs.LG math.OC

    An adaptive stochastic optimization algorithm for resource allocation

    Authors: Xavier Fontaine, Shie Mannor, Vianney Perchet

    Abstract: We consider the classical problem of sequential resource allocation where a decision maker must repeatedly divide a budget between several resources, each with diminishing returns. This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret. We construct an algorithm that is {\em adaptive} to the…

    Submitted 16 January, 2020; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: ALT2020, 45 pages, 9 figures

    Journal ref: Proceedings of Machine Learning Research (PMLR), volume 117, 2020

  9. arXiv:1809.05870  [pdf, other]

    math.ST cs.AI cs.LG math.OC

    On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters

    Authors: Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor

    Abstract: Kalman filter is a key tool for time-series forecasting and analysis. We show that the dependence of a prediction of Kalman filter on the past is decaying exponentially, whenever the process noise is non-degenerate. Therefore, Kalman filter may be approximated by regression on a few recent observations. Surprisingly, we also show that having some process noise is essential for the exponential deca…

    Submitted 16 September, 2018; originally announced September 2018.

    Journal ref: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019. Pages: 4098-4105

  10. arXiv:1506.02188  [pdf, other]

    cs.AI math.OC

    Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

    Authors: Yinlam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

    Abstract: In this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR) objective, as opposed to a standard risk-neutral expectation. We refer to such a problem as CVaR MDP. Our first contribution is to show that a CVaR objective, besid…

    Submitted 6 June, 2015; originally announced June 2015.

    Comments: Submitted to NIPS 15

  11. arXiv:1402.6361  [pdf, ps, other]

    math.OC cs.LG

    Oracle-Based Robust Optimization via Online Learning

    Authors: Aharon Ben-Tal, Elad Hazan, Tomer Koren, Shie Mannor

    Abstract: Robust optimization is a common framework in optimization under uncertainty when the problem parameters are not known, but it is rather known that the parameters belong to some given uncertainty set. In the robust optimization framework the problem solved is a min-max problem where a solution is judged according to its performance on the worst possible realization of the parameters. In many cases,…

    Submitted 25 February, 2014; originally announced February 2014.

  12. arXiv:1402.2043  [pdf, other]

    stat.ML cs.LG math.ST

    Approachability in unknown games: Online learning meets multi-objective optimization

    Authors: Shie Mannor, Vianney Perchet, Gilles Stoltz

    Abstract: In the standard setting of approachability there are two players and a target set. The players play repeatedly a known vector-valued game where the first player wants to have the average vector-valued payoff converge to the target set, while the other player tries to exclude it from this set. We revisit this setting in the spirit of online learning and do not assume that the first player knows the…

    Submitted 17 June, 2016; v1 submitted 10 February, 2014; originally announced February 2014.

  13. arXiv:1305.5399  [pdf, other]

    math.OC cs.GT cs.LG stat.ML

    A Primal Condition for Approachability with Partial Monitoring

    Authors: Shie Mannor, Vianney Perchet, Gilles Stoltz

    Abstract: In approachability with full monitoring there are two types of conditions that are known to be equivalent for convex sets: a primal and a dual condition. The primal one is of the form: a set C is approachable if and only if all containing half-spaces are approachable in the one-shot game; while the dual one is of the form: a convex set C is approachable if and only if it intersects all payoff sets of…

    Submitted 23 May, 2013; originally announced May 2013.

  14. arXiv:1301.2725  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Robust High Dimensional Sparse Regression and Matching Pursuit

    Authors: Yudong Chen, Constantine Caramanis, Shie Mannor

    Abstract: We consider high dimensional sparse regression, and develop strategies able to deal with arbitrary -- possibly, severe or coordinated -- errors in the covariance matrix $X$. These may come from corrupted data, persistent experimental errors, or malicious respondents in surveys/recommender systems, etc. Such non-stochastic error-in-variables problems are notoriously difficult to treat, and as we de…

    Submitted 12 January, 2013; originally announced January 2013.

  15. arXiv:1206.6404  [pdf]

    cs.LG cs.CY math.OC stat.ML

    Policy Gradients with Variance Related Risk Criteria

    Authors: Dotan Di Castro, Aviv Tamar, Shie Mannor

    Abstract: Managing risk in dynamic decision problems is of cardinal importance in many fields such as finance and process control. The most common approach to defining risk is through various variance related criteria such as the Sharpe Ratio or the standard deviation adjusted reward. It is known that optimizing many of the variance related risk criteria is NP-hard. In this paper we devise a framework for l…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

  16. arXiv:1203.1072  [pdf, other]

    math.OC math.PR

    Go Viral, or Not: Rate-Optimal Control for Resource-Constrained Branching Processes

    Authors: Shie Mannor, Kuang Xu

    Abstract: We propose and analyze a new class of controlled multi-type branching processes with a per-step linear resource constraint, motivated by potential applications in viral marketing and cancer treatment. We show that the optimal exponential growth rate of the population can be achieved by maintaining a fixed proportion among the species, for both deterministic and stochastic branching processes. In t…

    Submitted 8 January, 2013; v1 submitted 5 March, 2012; originally announced March 2012.

  17. arXiv:1109.3151  [pdf, other]

    eess.SY math.OC

    Regulation, Volatility and Efficiency in Continuous-Time Markets

    Authors: Arman C. Kizilkale, Shie Mannor

    Abstract: We analyze the efficiency of markets with friction, particularly power markets. We model the market as a dynamic system with $(d_t;\,t\geq 0)$ the demand process and $(s_t;\,t\geq 0)$ the supply process. Using stochastic differential equations to model the dynamics with friction, we investigate the efficiency of the market under an integrated expected undiscounted cost function solving the optimal…

    Submitted 14 September, 2011; originally announced September 2011.

  18. arXiv:1105.4995  [pdf, ps, other]

    math.ST cs.LG

    Robust approachability and regret minimization in games with partial monitoring

    Authors: Shie Mannor, Vianney Perchet, Gilles Stoltz

    Abstract: Approachability has become a standard tool in analyzing learning algorithms in the adversarial online learning setup. We develop a variant of approachability for games where there is ambiguity in the obtained reward that belongs to a set, rather than being a single vector. Using this variant we tackle the problem of approachability in games with partial monitoring and develop simple and efficient a…

    Submitted 15 February, 2012; v1 submitted 25 May, 2011; originally announced May 2011.

  19. arXiv:math/0701419  [pdf, ps, other]

    math.ST cs.LG

    Strategies for prediction under imperfect monitoring

    Authors: Gabor Lugosi, Shie Mannor, Gilles Stoltz

    Abstract: We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best possible average reward. It was Rustichini (1999) who first proved the existence of such consistent pred…

    Submitted 7 January, 2008; v1 submitted 15 January, 2007; originally announced January 2007.

    Comments: Journal version of a COLT conference paper

    MSC Class: 91A20; 62L12; 68Q32

    Journal ref: Mathematics of Operations Research (2008), to appear