Showing 1–14 of 14 results for author: Derman, E

  1. arXiv:2404.05440  [pdf, other]

    cs.AI cs.LG

    Tree Search-Based Policy Optimization under Stochastic Execution Delay

    Authors: David Valensi, Esther Derman, Shie Mannor, Gal Dalal

    Abstract: The standard formulation of Markov decision processes (MDPs) assumes that the agent's decisions are executed immediately. However, in numerous realistic applications such as robotics or healthcare, actions are performed with a delay whose value can even be stochastic. In this work, we introduce stochastic delayed execution MDPs, a new formalism addressing random delays without resorting to state a…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Published in ICLR 2024

  2. arXiv:2309.01107  [pdf, other]

    cs.LG

    Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

    Authors: Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor

    Abstract: In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set. By targeting maximal return under the most adversarial model from that set, RMDPs address performance sensitivity to misspecified environments. Yet, to preserve computational tractability, the uncertainty set is traditionally independently structured for each state…

    Submitted 12 February, 2024; v1 submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted at AAAI 2024

  3. arXiv:2303.06654  [pdf, other]

    cs.LG cs.AI

    Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization

    Authors: Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor

    Abstract: Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. To solve them, one typically resorts to robust optimization methods. However, this significantly increases computational complexity and limits scalability in both learning and planning. On the other hand, regularized MDPs show more stability in policy learning without impairing time complexity. Yet,…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: Extended version of NeurIPS paper: arXiv:2110.06267

  4. arXiv:2301.13589  [pdf, ps, other]

    cs.LG cs.AI

    Policy Gradient for Rectangular Robust Markov Decision Processes

    Authors: Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor

    Abstract: Policy gradient methods have become a standard for training reinforcement learning agents in a scalable and efficient manner. However, they do not account for transition uncertainty, whereas learning robust policies can be computationally expensive. In this paper, we introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (…

    Submitted 10 December, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted to NeurIPS 2023

  5. arXiv:2110.06267  [pdf, other]

    cs.LG math.OC

    Twice regularized MDPs and the equivalence between robustness and regularization

    Authors: Esther Derman, Matthieu Geist, Shie Mannor

    Abstract: Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. To solve them, one typically resorts to robust optimization methods. However, this significantly increases computational complexity and limits scalability in both learning and planning. On the other hand, regularized MDPs show more stability in policy learning without impairing time complexity. Yet,…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  6. arXiv:2106.14829  [pdf]

    cs.CV

    Dataset Bias Mitigation Through Analysis of CNN Training Scores

    Authors: Ekberjan Derman

    Abstract: Training datasets are crucial for convolutional neural network-based algorithms, which directly impact their overall performance. As such, using a well-structured dataset that has minimum level of bias is always desirable. In this paper, we proposed a novel, domain-independent approach, called score-based resampling (SBR), to locate the under-represented samples of the original training dataset ba…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: 12 pages, 11 figures

  7. arXiv:2101.11992  [pdf, other]

    cs.LG cs.AI

    Acting in Delayed Environments with Non-Stationary Markov Policies

    Authors: Esther Derman, Gal Dalal, Shie Mannor

    Abstract: The standard Markov Decision Process (MDP) formulation hinges on the assumption that an action is executed immediately after it was chosen. However, assuming it is often unrealistic and can lead to catastrophic failures in applications such as robotic manipulation, cloud computing, and finance. We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that…

    Submitted 12 December, 2023; v1 submitted 28 January, 2021; originally announced January 2021.

    Comments: Published in ICLR 2021

  8. arXiv:2003.02894  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Distributional Robustness and Regularization in Reinforcement Learning

    Authors: Esther Derman, Shie Mannor

    Abstract: Distributionally Robust Optimization (DRO) has enabled to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO's extension to sequential decision-making overcomes $\textit{external uncertainty}$ through the robust Markov Decision Process (MDP) setti…

    Submitted 14 July, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: Accepted at the "Theoretical Foundations of Reinforcement Learning" Workshop - ICML 2020

  9. arXiv:1905.08188  [pdf, other]

    cs.LG cs.AI stat.ML

    A Bayesian Approach to Robust Reinforcement Learning

    Authors: Esther Derman, Daniel Mankowitz, Timothy Mann, Shie Mannor

    Abstract: Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set and a robust optimal policy can be derived under the worst-case scenario. In this study, we address the issue of learning in RMDPs using a Bayesian approach.…

    Submitted 23 July, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: Accepted to UAI 2019

  10. arXiv:1803.04848  [pdf, other]

    cs.LG cs.AI stat.ML

    Soft-Robust Actor-Critic Policy-Gradient

    Authors: Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

    Abstract: Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly conservative. Our soft-robust framework is an attempt to overcome this issue. In this paper, we present a novel Soft-Robust Actor-Critic algorithm (SR-AC). It learns an…

    Submitted 24 October, 2018; v1 submitted 11 March, 2018; originally announced March 2018.

    Comments: UAI 2018

  11. arXiv:1709.02294  [pdf, ps, other]

    math.ST

    Clustering and Model Selection via Penalized Likelihood for Different-sized Categorical Data Vectors

    Authors: Esther Derman, Erwan Le Pennec

    Abstract: In this study, we consider unsupervised clustering of categorical vectors that can be of different size using mixture. We use likelihood maximization to estimate the parameters of the underlying mixture model and a penalization technique to select the number of mixture components. Regardless of the true distribution that generated the data, we show that an explicit penalty, known up to a multiplic…

    Submitted 7 September, 2017; originally announced September 2017.

  12. arXiv:1603.03268  [pdf, ps, other]

    astro-ph.EP

    New transit observations for HAT-P-30 b, HAT-P-37 b, TrES-5 b, WASP-28 b, WASP-36 b, and WASP-39 b

    Authors: G. Maciejewski, D. Dimitrov, L. Mancini, J. Southworth, S. Ciceri, G. D'Ago, I. Bruni, St. Raetz, G. Nowak, J. Ohlert, D. Puchalski, G. Saral, E. Derman, R. Petrucci, E. Jofre, M. Seeliger, T. Henning

    Abstract: We present new transit light curves for planets in six extrasolar planetary systems. They were acquired with 0.4-2.2 m telescopes located in west Asia, Europe, and South America. When combined with literature data, they allowed us to redetermine system parameters in a homogeneous way. Our results for individual systems are in agreement with values reported in previous studies. We refined transit e…

    Submitted 10 March, 2016; originally announced March 2016.

    Comments: Submitted to Acta Astronomica

  13. arXiv:1404.6354  [pdf, ps, other]

    astro-ph.SR astro-ph.EP

    Transit Timing Analysis in the HAT-P-32 system

    Authors: M. Seeliger, D. Dimitrov, D. Kjurkchieva, M. Mallonn, M. Fernandez, M. Kitze, V. Casanova, G. Maciejewski, J. M. Ohlert, J. G. Schmidt, A. Pannicke, D. Puchalski, E. Göğüş, T. Güver, S. Bilir, T. Ak, M. M. Hohle, T. O. B. Schmidt, R. Errmann, E. Jensen, D. Cohen, L. Marschall, G. Saral, I. Bernt, E. Derman, et al. (2 additional authors not shown)

    Abstract: We present the results of 45 transit observations obtained for the transiting exoplanet HAT-P-32b. The transits have been observed using several telescopes mainly throughout the YETI network. In 25 cases, complete transit light curves with a timing precision better than $1.4\:$min have been obtained. These light curves have been used to refine the system properties, namely inclination $i$, planet-…

    Submitted 25 April, 2014; originally announced April 2014.

    Comments: MNRAS accepted; 13 pages, 10 figures

  14. arXiv:cond-mat/0201345  [pdf]

    cond-mat.dis-nn q-fin.TR

    The Perception of Time, Risk and Return During Periods of Speculation

    Authors: Emanuel Derman

    Abstract: What return should you expect when you take on a given amount of risk? How should that return depend upon other people's behavior? What principles can you use to answer these questions? In this paper, we approach these topics by exploring the consequences of two simple hypotheses about risk. The first is a common-sense invariance principle: assets with the same perceived risk must have the sam…

    Submitted 18 January, 2002; originally announced January 2002.

    Comments: Front page plus 36