Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
Authors:
Fengdi Che,
Chenjun Xiao,
Jincheng Mei,
Bo Dai,
Ramki Gummadi,
Oscar A Ramirez,
Christopher K Harris,
A. Rupam Mahmood,
Dale Schuurmans
Abstract:
We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision processes. Notably, using only a target network or an over-parameterized model does not provide such a convergence guarantee. Additionally, we extend our results to learning with truncated trajectories, showing that convergence is achievable for all tasks with minor modifications, akin to value truncation for the final states in trajectories. Our primary result focuses on temporal difference estimation for prediction, providing high-probability value estimation error bounds and empirical analysis on Baird's counterexample and a Four-room task. Furthermore, we explore the control setting, demonstrating that similar convergence conditions apply to Q-learning.
Submitted 31 May, 2024;
originally announced May 2024.
Park4U Mate: Context-Aware Digital Assistant for Personalized Autonomous Parking
Authors:
Antonyo Musabini,
Evin Bozbayir,
Hervé Marcasuzaa,
Omar Adair Islas Ramírez
Abstract:
People park their vehicles depending on interior and exterior contexts. They do it naturally, even unconsciously. For instance, with a baby seat in the rear, the driver might leave more space on one side to be able to get the baby out easily; or, when grocery shopping, they may position the vehicle so that the trunk remains accessible. Autonomous vehicles are becoming technically effective at driving from A to B and parking in a proper spot in a default way. However, in order to satisfy users' expectations and to become trustworthy, they will also need to park or make a temporary stop in a manner appropriate to the given situation. In addition, users want to better understand the capabilities of their driving assistance features, such as automated parking systems. A voice-based interface can help with this and even ease the adoption of these features. Therefore, we developed a voice-based in-car assistant (Park4U Mate) that is aware of interior and exterior contexts (thanks to a variety of sensors) and that is able to park autonomously in a smart way (with a constraints minimization strategy). The solution was demonstrated to thirty-five users in test drives, and their feedback was collected on the system's decision-making capability as well as on the human-machine interaction. The results show that: (1) the proposed optimization algorithm is efficient at deciding the best parking strategy; hence, autonomous vehicles can adopt it; (2) a voice-based digital assistant for autonomous parking is perceived as a clear and effective interaction method. However, interaction speed remained the most important criterion for users. In addition, they clearly do not wish to be limited to voice interaction alone when using the automated parking function, and would rather have multi-modal interaction.
Submitted 16 July, 2021;
originally announced July 2021.