Showing 1–6 of 6 results for author: Soffair, N

  1. arXiv:2405.00877   

    cs.LG cs.AI

    Markov flow policy -- deep MC

    Authors: Nitsan Soffair, Gilad Katz

    Abstract: Discounted algorithms often encounter evaluation errors due to their reliance on short-term estimations, which can impede their efficacy in addressing simple, short-term tasks and impose undesired temporal discounts ($\gamma$). Interestingly, these algorithms are often tested without applying a discount, a phenomenon we refer to as the "train-test bias". In response to these challenges, we propos…

    Submitted 2 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: Paper do not ready

  2. arXiv:2403.05732   

    cs.AI cs.LG

    Conservative DDPG -- Pessimistic RL without Ensemble

    Authors: Nitsan Soffair, Shie Mannor

    Abstract: DDPG is hindered by the overestimation bias problem, wherein its $Q$-estimates tend to overstate the actual $Q$-values. Traditional solutions to this bias involve ensemble-based methods, which require significant computational resources, or complex log-policy-based approaches, which are difficult to understand and implement. In contrast, we propose a straightforward solution using a $Q$-target and…

    Submitted 2 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Paper do not ready

  3. arXiv:2402.05951   

    cs.LG cs.AI

    MinMaxMin $Q$-learning

    Authors: Nitsan Soffair, Shie Mannor

    Abstract: MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimations are overestimating the real $Q$-values) inherent in conservative RL algorithms. Its core formula relies on the disagreement among $Q$-networks in the form of the min-batch MaxMin $Q$-networks distance which is added to the $Q$-target and used as the priority experi…

    Submitted 2 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: Paper do not ready

  4. arXiv:2402.05950

    cs.LG cs.AI

    SQT -- std $Q$-target

    Authors: Nitsan Soffair, Dotan Di-Castro, Orly Avner, Shie Mannor

    Abstract: Std $Q$-target is a conservative, actor-critic, ensemble, $Q$-learning-based algorithm built on a single key $Q$-formula: the standard deviation of the $Q$-networks, which acts as an "uncertainty penalty" and serves as a minimalistic solution to the problem of overestimation bias. We implement SQT on top of TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms, DDPG, TD3…

    Submitted 2 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.
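
The "uncertainty penalty" idea in the SQT abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `std_penalized_target`, the penalty weight `alpha`, and the choice of the ensemble minimum as the base estimate are all assumptions made for illustration; the abstract only states that the standard deviation of the $Q$-networks is used as a penalty.

```python
from statistics import pstdev


def std_penalized_target(reward, next_q_values, done, gamma=0.99, alpha=0.5):
    """Return a TD target penalized by ensemble disagreement.

    Hypothetical sketch: `next_q_values` holds each Q-network's estimate
    for the next state-action pair. The base estimate (ensemble minimum)
    and the weight `alpha` are assumptions, not taken from the paper.
    """
    if not next_q_values:
        raise ValueError("next_q_values must contain at least one estimate")

    # Disagreement among the Q-networks, measured as population std.
    penalty = alpha * pstdev(next_q_values)

    # Conservative bootstrap value: base estimate minus the penalty.
    bootstrap = min(next_q_values) - penalty

    # No bootstrapping past a terminal transition.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * bootstrap


# When the networks agree, the penalty vanishes; when they disagree,
# the target is pushed down in proportion to their spread.
agree = std_penalized_target(1.0, [2.0, 2.0], done=False, gamma=0.9)
disagree = std_penalized_target(1.0, [1.0, 3.0], done=False, gamma=0.9)
```

Under these assumptions, a larger `alpha` makes the target more pessimistic whenever the ensemble members disagree, and leaves it unchanged when they agree.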

  5. arXiv:2301.01246   

    cs.AI cs.MA

    Optimizing Agent Collaboration through Heuristic Multi-Agent Planning

    Authors: Nitsan Soffair

    Abstract: The SOTA algorithms for addressing QDec-POMDP issues, QDec-FP and QDec-FPS, are unable to effectively tackle problems that involve different types of sensing agents. We propose a new algorithm that addresses this issue by requiring agents to adopt the same plan if one agent is unable to take a sensing action but the other can. Our algorithm performs significantly better than both QDec-FP and QDec-…

    Submitted 2 June, 2024; v1 submitted 3 January, 2023; originally announced January 2023.

    Comments: Paper do not ready

  6. arXiv:2211.15411   

    cs.LG cs.AI cs.MA

    Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics

    Authors: Nitsan Soffair

    Abstract: WQMIX, QMIX, QTRAN, and VDN are SOTA algorithms for Dec-POMDP. None of them can solve domains that require complex cooperation between agents. We give an algorithm to solve such problems. In the first stage, we solve a single-agent problem and get a policy. In the second stage, we solve the multi-agent problem with the single-agent policy. SA2MA has a clear advantage over all competitors in complex agents' cooperat…

    Submitted 2 June, 2024; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Paper do not ready