-
Mixture of Public and Private Distributions in Imperfect Information Games
Authors:
Jérôme Arjonilla,
Abdallah Saffidine,
Tristan Cazenave
Abstract:
In imperfect information games (e.g. Bridge, Skat, Poker), one of the fundamental considerations is to infer the missing information while at the same time avoiding the disclosure of private information. Disregarding the issue of protecting private information can lead to a highly exploitable performance. Yet, excessive attention to it leads to hesitations that are no longer consistent with our pr…
▽ More
In imperfect information games (e.g. Bridge, Skat, Poker), one of the fundamental considerations is to infer the missing information while at the same time avoiding the disclosure of private information. Disregarding the issue of protecting private information can lead to a highly exploitable performance. Yet, excessive attention to it leads to hesitations that are no longer consistent with our private information. In our work, we show that to improve performance, one must choose whether to use a player's private information. We extend our work by proposing a new belief distribution depending on the amount of private and public information desired. We empirically demonstrate an increase in performance and, with the aim of further improving performance, the new distribution should be used according to the position in the game. Our experiments have been done on multiple benchmarks and in multiple determinization-based algorithms (PIMC and IS-MCTS).
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Deep Reinforcement Learning for 5*5 Multiplayer Go
Authors:
Brahim Driss,
Jérôme Arjonilla,
Hui Wang,
Abdallah Saffidine,
Tristan Cazenave
Abstract:
In recent years, much progress has been made in computer Go and most of the results have been obtained thanks to search algorithms (Monte Carlo Tree Search) and Deep Reinforcement Learning (DRL). In this paper, we propose to use and analyze the latest algorithms that use search and DRL (AlphaZero and Descent algorithms) to automatically learn to play an extended version of the game of Go with more…
▽ More
In recent years, much progress has been made in computer Go and most of the results have been obtained thanks to search algorithms (Monte Carlo Tree Search) and Deep Reinforcement Learning (DRL). In this paper, we propose to use and analyze the latest algorithms that use search and DRL (AlphaZero and Descent algorithms) to automatically learn to play an extended version of the game of Go with more than two players. We show that using search and DRL we were able to improve the level of play, even though there are more than two players.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Monte Carlo Search Algorithms Discovering Monte Carlo Tree Search Exploration Terms
Authors:
Tristan Cazenave
Abstract:
Monte Carlo Tree Search and Monte Carlo Search have good results for many combinatorial problems. In this paper we propose to use Monte Carlo Search to design mathematical expressions that are used as exploration terms for Monte Carlo Tree Search algorithms. The optimized Monte Carlo Tree Search algorithms are PUCT and SHUSS. We automatically design the PUCT and the SHUSS root exploration terms. F…
▽ More
Monte Carlo Tree Search and Monte Carlo Search have good results for many combinatorial problems. In this paper we propose to use Monte Carlo Search to design mathematical expressions that are used as exploration terms for Monte Carlo Tree Search algorithms. The optimized Monte Carlo Tree Search algorithms are PUCT and SHUSS. We automatically design the PUCT and the SHUSS root exploration terms. For small search budgets of 32 evaluations the discovered root exploration terms make both algorithms competitive with usual PUCT.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
Learning a Prior for Monte Carlo Search by Replaying Solutions to Combinatorial Problems
Authors:
Tristan Cazenave
Abstract:
Monte Carlo Search gives excellent results in multiple difficult combinatorial problems. Using a prior to perform non uniform playouts during the search improves a lot the results compared to uniform playouts. Handmade heuristics tailored to the combinatorial problem are often used as priors. We propose a method to automatically compute a prior. It uses statistics on solved problems. It is a simpl…
▽ More
Monte Carlo Search gives excellent results in multiple difficult combinatorial problems. Using a prior to perform non uniform playouts during the search improves a lot the results compared to uniform playouts. Handmade heuristics tailored to the combinatorial problem are often used as priors. We propose a method to automatically compute a prior. It uses statistics on solved problems. It is a simple and general method that incurs no computational cost at playout time and that brings large performance gains. The method is applied to three difficult combinatorial problems: Latin Square Completion, Kakuro, and Inverse RNA Folding.
△ Less
Submitted 18 January, 2024;
originally announced January 2024.
-
Generalized Nested Rollout Policy Adaptation with Limited Repetitions
Authors:
Tristan Cazenave
Abstract:
Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for optimizing a sequence of choices. We propose to improve on GNRPA by avoiding too deterministic policies that find again and again the same sequence of choices. We do so by limiting the number of repetitions of the best sequence found at a given level. Experiments show that it improves the algorithm for three…
▽ More
Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for optimizing a sequence of choices. We propose to improve on GNRPA by avoiding too deterministic policies that find again and again the same sequence of choices. We do so by limiting the number of repetitions of the best sequence found at a given level. Experiments show that it improves the algorithm for three different combinatorial problems: Inverse RNA Folding, the Traveling Salesman Problem with Time Windows and the Weak Schur problem.
△ Less
Submitted 18 January, 2024;
originally announced January 2024.
-
The Mathematical Game
Authors:
Marc Pierre,
Quentin Cohen-Solal,
Tristan Cazenave
Abstract:
Monte Carlo Tree Search can be used for automated theorem proving. Holophrasm is a neural theorem prover using MCTS combined with neural networks for the policy and the evaluation. In this paper we propose to improve the performance of the Holophrasm theorem prover using other game tree search algorithms.
Monte Carlo Tree Search can be used for automated theorem proving. Holophrasm is a neural theorem prover using MCTS combined with neural networks for the policy and the evaluation. In this paper we propose to improve the performance of the Holophrasm theorem prover using other game tree search algorithms.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
Vision Transformers for Computer Go
Authors:
Amani Sagri,
Tristan Cazenave,
Jérôme Arjonilla,
Abdallah Saffidine
Abstract:
Motivated by the success of transformers in various fields, such as language understanding and image analysis, this investigation explores their application in the context of the game of Go. In particular, our study focuses on the analysis of the Transformer in Vision. Through a detailed analysis of numerous points such as prediction accuracy, win rates, memory, speed, size, or even learning rate,…
▽ More
Motivated by the success of transformers in various fields, such as language understanding and image analysis, this investigation explores their application in the context of the game of Go. In particular, our study focuses on the analysis of the Transformer in Vision. Through a detailed analysis of numerous points such as prediction accuracy, win rates, memory, speed, size, or even learning rate, we have been able to highlight the substantial role that transformers can play in the game of Go. This study was carried out by comparing them to the usual Residual Networks.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
On the Consistency of Average Embeddings for Item Recommendation
Authors:
Walid Bendada,
Guillaume Salha-Galvan,
Romain Hennequin,
Thomas Bouabça,
Tristan Cazenave
Abstract:
A prevalent practice in recommender systems consists in averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently anal…
▽ More
A prevalent practice in recommender systems consists in averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.
△ Less
Submitted 30 August, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
A Scalable Framework for Automatic Playlist Continuation on Music Streaming Services
Authors:
Walid Bendada,
Guillaume Salha-Galvan,
Thomas Bouabça,
Tristan Cazenave
Abstract:
Music streaming services often aim to recommend songs for users to extend the playlists they have created on these services. However, extending playlists while preserving their musical characteristics and matching user preferences remains a challenging task, commonly referred to as Automatic Playlist Continuation (APC). Besides, while these services often need to select the best songs to recommend…
▽ More
Music streaming services often aim to recommend songs for users to extend the playlists they have created on these services. However, extending playlists while preserving their musical characteristics and matching user preferences remains a challenging task, commonly referred to as Automatic Playlist Continuation (APC). Besides, while these services often need to select the best songs to recommend in real-time and among large catalogs with millions of candidates, recent research on APC mainly focused on models with few scalability guarantees and evaluated on relatively small datasets. In this paper, we introduce a general framework to build scalable yet effective APC models for large-scale applications. Based on a represent-then-aggregate strategy, it ensures scalability by design while remaining flexible enough to incorporate a wide range of representation learning and sequence modeling techniques, e.g., based on Transformers. We demonstrate the relevance of this framework through in-depth experimental validation on Spotify's Million Playlist Dataset (MPD), the largest public dataset for APC. We also describe how, in 2022, we successfully leveraged this framework to improve APC in production on Deezer. We report results from a large-scale online A/B test on this service, emphasizing the practical impact of our approach in such a real-world application.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
Towards Tackling MaxSAT by Combining Nested Monte Carlo with Local Search
Authors:
Hui Wang,
Abdallah Saffidine,
Tristan Cazenave
Abstract:
Recent work proposed the UCTMAXSAT algorithm to address Maximum Satisfiability Problems (MaxSAT) and shown improved performance over pure Stochastic Local Search algorithms (SLS). UCTMAXSAT is based on Monte Carlo Tree Search but it uses SLS instead of purely random playouts. In this work, we introduce two algorithmic variations over UCTMAXSAT. We carry an empirical analysis on MaxSAT benchmarks f…
▽ More
Recent work proposed the UCTMAXSAT algorithm to address Maximum Satisfiability Problems (MaxSAT) and shown improved performance over pure Stochastic Local Search algorithms (SLS). UCTMAXSAT is based on Monte Carlo Tree Search but it uses SLS instead of purely random playouts. In this work, we introduce two algorithmic variations over UCTMAXSAT. We carry an empirical analysis on MaxSAT benchmarks from recent competitions and establish that both ideas lead to performance improvements. First, a nesting of the tree search inspired by the Nested Monte Carlo Search algorithm is effective on most instance types in the benchmark. Second, we observe that using a static flip limit in SLS, the ideal budget depends heavily on the instance size and we propose to set it dynamically. We show that it is a robust way to achieve comparable performance on a variety of instances without requiring additional tuning.
△ Less
Submitted 25 February, 2023;
originally announced February 2023.
-
Learning to Play Stochastic Two-player Perfect-Information Games without Knowledge
Authors:
Quentin Cohen-Solal,
Tristan Cazenave
Abstract:
In this paper, we extend the Descent framework, which enables learning and planning in the context of two-player games with perfect information, to the framework of stochastic games.
We propose two ways of doing this, the first way generalizes the search algorithm, i.e. Descent, to stochastic games and the second way approximates stochastic games by deterministic games.
We then evaluate them o…
▽ More
In this paper, we extend the Descent framework, which enables learning and planning in the context of two-player games with perfect information, to the framework of stochastic games.
We propose two ways of doing this, the first way generalizes the search algorithm, i.e. Descent, to stochastic games and the second way approximates stochastic games by deterministic games.
We then evaluate them on the game EinStein wurfelt nicht! against state-of-the-art algorithms: Expectiminimax and Polygames (i.e. the Alpha Zero algorithm). It is our generalization of Descent which obtains the best results. The approximation by deterministic games nevertheless obtains good results, presaging that it could give better results in particular contexts.
△ Less
Submitted 8 February, 2023;
originally announced February 2023.
-
Solving the HP model with Nested Monte Carlo Search
Authors:
Milo Roucairol,
Tristan Cazenave
Abstract:
In this paper we present a new Monte Carlo Search (MCS) algorithm for finding the ground state energy of proteins in the HP-model. We also compare it briefly to other MCS algorithms not usually used on the HP-model and provide an overview of the algorithms used on HP-model. The algorithm presented in this paper does not beat state of the art algorithms, see PERM (Hsu and Grassberger 2011), REMC (T…
▽ More
In this paper we present a new Monte Carlo Search (MCS) algorithm for finding the ground state energy of proteins in the HP-model. We also compare it briefly to other MCS algorithms not usually used on the HP-model and provide an overview of the algorithms used on HP-model. The algorithm presented in this paper does not beat state of the art algorithms, see PERM (Hsu and Grassberger 2011), REMC (Thachuk, Shmygelska, and Hoos 2007) or WLRE (Wüst and Landau 2012) for better results.
Hsu, H.-P.; and Grassberger, P. 2011. A review of Monte Carlo simulations of polymers with PERM. Journal of Statistical Physics, 144 (3): 597 to 637.
Thachuk, C.; Shmygelska, A.; and Hoos, H. H. 2007. A replica exchange Monte Carlo algorithm for protein folding in the HP model. BMC Bioinformatics, 8(1): 342.
Wüst, T.; and Landau, D. P. 2012. Optimized Wang-Landau sampling of lattice polymers: Ground state search and folding thermodynamics of HP model proteins. The Journal of Chemical Physics, 137(6): 064903.
△ Less
Submitted 25 January, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Sequential Elimination Voting Games
Authors:
Ulysse Pavloff,
Tristan Cazenave,
Jérôme Lang
Abstract:
Voting by sequential elimination is a low-communication voting protocol: voters play in sequence and eliminate one or more of the remaining candidates, until only one remains. While the fairness and efficiency of such protocols have been explored, the impact of strategic behaviour has not been addressed. We model voting by sequential elimination as a game. Given a fixed elimination sequence, we sh…
▽ More
Voting by sequential elimination is a low-communication voting protocol: voters play in sequence and eliminate one or more of the remaining candidates, until only one remains. While the fairness and efficiency of such protocols have been explored, the impact of strategic behaviour has not been addressed. We model voting by sequential elimination as a game. Given a fixed elimination sequence, we show that the outcome is the same in all subgame-perfect Nash equilibria of the corresponding game, and is polynomial-time computable. We measure the loss of social welfare due to strategic behaviour, with respect to the outcome under sincere behaviour, and with respect to the outcome maximizing social welfare. We give tight bounds for worst-case ratios, and show using experiments that the average impact of manipulation can be much lower than in the worst case.
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
Nested Search versus Limited Discrepancy Search
Authors:
Tristan Cazenave
Abstract:
Limited Discrepancy Search (LDS) is a popular algorithm to search a state space with a heuristic to order the possible actions. Nested Search (NS) is another algorithm to search a state space with the same heuristic. NS spends more time on the move associated to the best heuristic playout while LDS spends more time on the best heuristic move. They both use similar times for the same level of searc…
▽ More
Limited Discrepancy Search (LDS) is a popular algorithm to search a state space with a heuristic to order the possible actions. Nested Search (NS) is another algorithm to search a state space with the same heuristic. NS spends more time on the move associated to the best heuristic playout while LDS spends more time on the best heuristic move. They both use similar times for the same level of search. We advocate in this paper that it is often better to follow the best heuristic playout as in NS than to follow the heuristic as in LDS.
△ Less
Submitted 1 October, 2022;
originally announced October 2022.
-
Planning and Learning: Path-Planning for Autonomous Vehicles, a Review of the Literature
Authors:
Kevin Osanlou,
Christophe Guettier,
Tristan Cazenave,
Eric Jacopin
Abstract:
This short review aims to make the reader familiar with state-of-the-art works relating to planning, scheduling and learning. First, we study state-of-the-art planning algorithms. We give a brief introduction of neural networks. Then we explore in more detail graph neural networks, a recent variant of neural networks suited for processing graph-structured inputs. We describe briefly the concept of…
▽ More
This short review aims to make the reader familiar with state-of-the-art works relating to planning, scheduling and learning. First, we study state-of-the-art planning algorithms. We give a brief introduction of neural networks. Then we explore in more detail graph neural networks, a recent variant of neural networks suited for processing graph-structured inputs. We describe briefly the concept of reinforcement learning algorithms and some approaches designed to date. Next, we study some successful approaches combining neural networks for path-planning. Lastly, we focus on temporal planning problems with uncertainty.
△ Less
Submitted 17 October, 2023; v1 submitted 26 July, 2022;
originally announced July 2022.
-
Refutation of Spectral Graph Theory Conjectures with Monte Carlo Search
Authors:
Milo Roucairol,
Tristan Cazenave
Abstract:
We demonstrate how Monte Carlo Search (MCS) algorithms, namely Nested Monte Carlo Search (NMCS) and Nested Rollout Policy Adaptation (NRPA), can be used to build graphs and find counter-examples to spectral graph theory conjectures in minutes.
We demonstrate how Monte Carlo Search (MCS) algorithms, namely Nested Monte Carlo Search (NMCS) and Nested Rollout Policy Adaptation (NRPA), can be used to build graphs and find counter-examples to spectral graph theory conjectures in minutes.
△ Less
Submitted 3 August, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Solving Disjunctive Temporal Networks with Uncertainty under Restricted Time-Based Controllability using Tree Search and Graph Neural Networks
Authors:
Kevin Osanlou,
Jeremy Frank,
Andrei Bursuc,
Tristan Cazenave,
Eric Jacopin,
Christophe Guettier,
J. Benton
Abstract:
Planning under uncertainty is an area of interest in artificial intelligence. We present a novel approach based on tree search and graph machine learning for the scheduling problem known as Disjunctive Temporal Networks with Uncertainty (DTNU). Dynamic Controllability (DC) of DTNUs seeks a reactive scheduling strategy to satisfy temporal constraints in response to uncontrollable action durations.…
▽ More
Planning under uncertainty is an area of interest in artificial intelligence. We present a novel approach based on tree search and graph machine learning for the scheduling problem known as Disjunctive Temporal Networks with Uncertainty (DTNU). Dynamic Controllability (DC) of DTNUs seeks a reactive scheduling strategy to satisfy temporal constraints in response to uncontrollable action durations. We introduce new semantics for reactive scheduling: Time-based Dynamic Controllability (TDC) and a restricted subset of TDC, R-TDC. We design a tree search algorithm to determine whether or not a DTNU is R-TDC. Moreover, we leverage a graph neural network as a heuristic for tree search guidance. Finally, we conduct experiments on a known benchmark on which we show R-TDC to retain significant completeness with regard to DC, while being faster to prove. This results in the tree search processing fifty percent more DTNU problems in R-TDC than the state-of-the-art DC solver does in DC with the same time budget. We also observe that graph neural network search guidance leads to substantial performance gains on benchmarks of more complex DTNUs, with up to eleven times more problems solved than the baseline tree search.
△ Less
Submitted 30 March, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
Generalized Nested Rollout Policy Adaptation with Dynamic Bias for Vehicle Routing
Authors:
Julien Sentuc,
Tristan Cazenave,
Jean-Yves Lucas
Abstract:
In this paper we present an extension of the Nested Rollout Policy Adaptation algorithm (NRPA), namely the Generalized Nested Rollout Policy Adaptation (GNRPA), as well as its use for solving some instances of the Vehicle Routing Problem. We detail some results obtained on the Solomon instances set which is a conventional benchmark for the Vehicle Routing Problem (VRP). We show that on all instanc…
▽ More
In this paper we present an extension of the Nested Rollout Policy Adaptation algorithm (NRPA), namely the Generalized Nested Rollout Policy Adaptation (GNRPA), as well as its use for solving some instances of the Vehicle Routing Problem. We detail some results obtained on the Solomon instances set which is a conventional benchmark for the Vehicle Routing Problem (VRP). We show that on all instances, GNRPA performs better than NRPA. On some instances, it performs better than the Google OR Tool module dedicated to VRP.
△ Less
Submitted 29 December, 2021; v1 submitted 12 November, 2021;
originally announced November 2021.
-
Learning-based Preference Prediction for Constrained Multi-Criteria Path-Planning
Authors:
Kevin Osanlou,
Christophe Guettier,
Andrei Bursuc,
Tristan Cazenave,
Eric Jacopin
Abstract:
Learning-based methods are increasingly popular for search algorithms in single-criterion optimization problems. In contrast, for multiple-criteria optimization there are significantly fewer approaches despite the existence of numerous applications. Constrained path-planning for Autonomous Ground Vehicles (AGV) is one such application, where an AGV is typically deployed in disaster relief or searc…
▽ More
Learning-based methods are increasingly popular for search algorithms in single-criterion optimization problems. In contrast, for multiple-criteria optimization there are significantly fewer approaches despite the existence of numerous applications. Constrained path-planning for Autonomous Ground Vehicles (AGV) is one such application, where an AGV is typically deployed in disaster relief or search and rescue applications in off-road environments. The agent can be faced with the following dilemma : optimize a source-destination path according to a known criterion and an uncertain criterion under operational constraints. The known criterion is associated to the cost of the path, representing the distance. The uncertain criterion represents the feasibility of driving through the path without requiring human intervention. It depends on various external parameters such as the physics of the vehicle, the state of the explored terrains or weather conditions. In this work, we leverage knowledge acquired through offline simulations by training a neural network model to predict the uncertain criterion. We integrate this model inside a path-planner which can solve problems online. Finally, we conduct experiments on realistic AGV scenarios which illustrate that the proposed framework requires human intervention less frequently, trading for a limited increase in the path distance.
△ Less
Submitted 2 August, 2021;
originally announced August 2021.
-
Time-based Dynamic Controllability of Disjunctive Temporal Networks with Uncertainty: A Tree Search Approach with Graph Neural Network Guidance
Authors:
Kevin Osanlou,
Jeremy Frank,
J. Benton,
Andrei Bursuc,
Christophe Guettier,
Eric Jacopin,
Tristan Cazenave
Abstract:
Scheduling in the presence of uncertainty is an area of interest in artificial intelligence due to the large number of applications. We study the problem of dynamic controllability (DC) of disjunctive temporal networks with uncertainty (DTNU), which seeks a strategy to satisfy all constraints in response to uncontrollable action durations. We introduce a more restricted, stronger form of controlla…
▽ More
Scheduling in the presence of uncertainty is an area of interest in artificial intelligence due to the large number of applications. We study the problem of dynamic controllability (DC) of disjunctive temporal networks with uncertainty (DTNU), which seeks a strategy to satisfy all constraints in response to uncontrollable action durations. We introduce a more restricted, stronger form of controllability than DC for DTNUs, time-based dynamic controllability (TDC), and present a tree search approach to determine whether or not a DTNU is TDC. Moreover, we leverage the learning capability of a message passing neural network (MPNN) as a heuristic for tree search guidance. Finally, we conduct experiments for which the tree search shows superior results to state-of-the-art timed-game automata (TGA) based approaches. We observe that using an MPNN for tree search guidance leads to a significant increase in solving performance and scalability to harder DTNU problems.
△ Less
Submitted 2 August, 2021;
originally announced August 2021.
-
Optimal Solving of Constrained Path-Planning Problems with Graph Convolutional Networks and Optimized Tree Search
Authors:
Kevin Osanlou,
Andrei Bursuc,
Christophe Guettier,
Tristan Cazenave,
Eric Jacopin
Abstract:
Deep learning-based methods are growing prominence for planning purposes. In this paper, we present a hybrid planner that combines a graph machine learning model and an optimal solver based on branch and bound tree search for path-planning tasks. More specifically, a graph neural network is used to assist the branch and bound algorithm in handling constraints associated with a desired solution pat…
▽ More
Deep learning-based methods are growing prominence for planning purposes. In this paper, we present a hybrid planner that combines a graph machine learning model and an optimal solver based on branch and bound tree search for path-planning tasks. More specifically, a graph neural network is used to assist the branch and bound algorithm in handling constraints associated with a desired solution path. There are multiple downstream practical applications, such as Autonomous Unmanned Ground Vehicles (AUGV), typically deployed in disaster relief or search and rescue operations. In off-road environments, AUGVs must dynamically optimize a source-destination path under various operational constraints, out of which several are difficult to predict in advance and need to be addressed online. We conduct experiments on realistic scenarios and show that graph neural network support enables substantial speedup and smoother scaling to harder path-planning problems. Additionally, information provided by the graph neural network enables the approach to outperform problem-specific handcrafted heuristics, highlighting the potential graph neural networks hold for path-planning tasks.
△ Less
Submitted 3 April, 2022; v1 submitted 2 August, 2021;
originally announced August 2021.
-
Constrained Shortest Path Search with Graph Convolutional Neural Networks
Authors:
Kevin Osanlou,
Christophe Guettier,
Andrei Bursuc,
Tristan Cazenave,
Eric Jacopin
Abstract:
Planning for Autonomous Unmanned Ground Vehicles (AUGV) is still a challenge, especially in difficult, off-road, critical situations. Automatic planning can be used to reach mission objectives, to perform navigation or maneuvers. Most of the time, the problem consists in finding a path from a source to a destination, while satisfying some operational constraints. In a graph without negative cycles…
▽ More
Planning for Autonomous Unmanned Ground Vehicles (AUGV) is still a challenge, especially in difficult, off-road, critical situations. Automatic planning can be used to reach mission objectives, to perform navigation or maneuvers. Most of the time, the problem consists in finding a path from a source to a destination, while satisfying some operational constraints. In a graph without negative cycles, the computation of the single-pair shortest path from a start node to an end node is solved in polynomial time. Additional constraints on the solution path can however make the problem harder to solve. This becomes the case when we need the path to pass through a few mandatory nodes without requiring a specific order of visit. The complexity grows exponentially with the number of mandatory nodes to visit. In this paper, we focus on shortest path search with mandatory nodes on a given connected graph. We propose a hybrid model that combines a constraint-based solver and a graph convolutional neural network to improve search performance. Promising results are obtained on realistic scenarios.
△ Less
Submitted 2 August, 2021;
originally announced August 2021.
-
Batch Monte Carlo Tree Search
Authors:
Tristan Cazenave
Abstract:
Making inferences with a deep neural network on a batch of states is much faster with a GPU than making inferences on one state after another. We build on this property to propose Monte Carlo Tree Search algorithms using batched inferences. Instead of using either a search tree or a transposition table we propose to use both in the same algorithm. The transposition table contains the results of th…
▽ More
Making inferences with a deep neural network on a batch of states is much faster with a GPU than making inferences on one state after another. We build on this property to propose Monte Carlo Tree Search algorithms using batched inferences. Instead of using either a search tree or a transposition table we propose to use both in the same algorithm. The transposition table contains the results of the inferences while the search tree contains the statistics of Monte Carlo Tree Search. We also propose to analyze multiple heuristics that improve the search: the $μ$ FPU, the Virtual Mean, the Last Iteration and the Second Move heuristics. They are evaluated for the game of Go using a MobileNet neural network.
△ Less
Submitted 9 April, 2021;
originally announced April 2021.
-
Improving Model and Search for Computer Go
Authors:
Tristan Cazenave
Abstract:
The standard for Deep Reinforcement Learning in games, following Alpha Zero, is to use residual networks and to increase the depth of the network to get better results. We propose to improve mobile networks as an alternative to residual networks and experimentally show the playing strength of the networks according to both their width and their depth. We also propose a generalization of the PUCT s…
▽ More
The standard for Deep Reinforcement Learning in games, following Alpha Zero, is to use residual networks and to increase the depth of the network to get better results. We propose to improve mobile networks as an alternative to residual networks and experimentally show the playing strength of the networks according to both their width and their depth. We also propose a generalization of the PUCT search algorithm that improves on PUCT.
△ Less
Submitted 9 April, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Optimizing $αμ$
Authors:
Tristan Cazenave,
Swann Legras,
Véronique Ventos
Abstract:
$αμ$ is a search algorithm which repairs two defaults of Perfect Information Monte Carlo search: strategy fusion and non locality. In this paper we optimize $αμ…
▽ More
$αμ$ is a search algorithm which repairs two defaults of Perfect Information Monte Carlo search: strategy fusion and non locality. In this paper we optimize $αμ$ for the game of Bridge, avoiding useless computations. The proposed optimizations are general and apply to other imperfect information turn-based games. We define multiple optimizations involving Pareto fronts, and show that these optimizations speed up the search. Some of these optimizations are cuts that stop the search at a node, while others keep track of which possible worlds have become redundant, avoiding unnecessary, costly evaluations. We also measure the benefits of parallelizing the double dummy searches at the leaves of the $αμ$ search tree.
△ Less
Submitted 29 January, 2021;
originally announced January 2021.
-
Stabilized Nested Rollout Policy Adaptation
Authors:
Tristan Cazenave,
Jean-Baptiste Sevestre,
Matthieu Toulemont
Abstract:
Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm for single player games. In this paper we propose to modify NRPA in order to improve the stability of the algorithm. Experiments show it improves the algorithm for different application domains: SameGame, Traveling Salesman with Time Windows and Expression Discovery.
Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm for single player games. In this paper we propose to modify NRPA in order to improve the stability of the algorithm. Experiments show it improves the algorithm for different application domains: SameGame, Traveling Salesman with Time Windows and Expression Discovery.
△ Less
Submitted 10 January, 2021;
originally announced January 2021.
-
Minimax Strikes Back
Authors:
Quentin Cohen-Solal,
Tristan Cazenave
Abstract:
Deep Reinforcement Learning (DRL) reaches a superhuman level of play in many complete information games. The state of the art search algorithm used in combination with DRL is Monte Carlo Tree Search (MCTS). We take another approach to DRL using a Minimax algorithm instead of MCTS and learning only the evaluation of states, not the policy. We show that for multiple games it is competitive with the…
▽ More
Deep Reinforcement Learning (DRL) reaches a superhuman level of play in many complete information games. The state of the art search algorithm used in combination with DRL is Monte Carlo Tree Search (MCTS). We take another approach to DRL using a Minimax algorithm instead of MCTS and learning only the evaluation of states, not the policy. We show that for multiple games it is competitive with the state of the art DRL for the learning performances and for the confrontations.
△ Less
Submitted 19 December, 2020;
originally announced December 2020.
-
Mobile Networks for Computer Go
Authors:
Tristan Cazenave
Abstract:
The architecture of the neural networks used in Deep Reinforcement Learning programs such as Alpha Zero or Polygames has been shown to have a great impact on the performances of the resulting playing engines. For example the use of residual networks gave a 600 ELO increase in the strength of Alpha Go. This paper proposes to evaluate the interest of Mobile Network for the game of Go using supervise…
▽ More
The architecture of the neural networks used in Deep Reinforcement Learning programs such as Alpha Zero or Polygames has been shown to have a great impact on the performances of the resulting playing engines. For example the use of residual networks gave a 600 ELO increase in the strength of Alpha Go. This paper proposes to evaluate the interest of Mobile Network for the game of Go using supervised learning as well as the use of a policy head and a value head different from the Alpha Zero heads. The accuracy of the policy, the mean squared error of the value, the efficiency of the networks with the number of parameters, the playing speed and strength of the trained networks are evaluated.
△ Less
Submitted 23 August, 2020;
originally announced August 2020.
-
Monte Carlo Inverse Folding
Authors:
Tristan Cazenave,
Thomas Fournier
Abstract:
The RNA Inverse Folding problem comes from computational biology. The goal is to find a molecule that has a given folding. It is important for scientific fields such as bioengineering, pharmaceutical research, biochemistry, synthetic biology and RNA nanostructures. Nested Monte Carlo Search has given excellent results for this problem. We propose to adapt and evaluate different Monte Carlo Search…
▽ More
The RNA Inverse Folding problem comes from computational biology. The goal is to find a molecule that has a given folding. It is important for scientific fields such as bioengineering, pharmaceutical research, biochemistry, synthetic biology and RNA nanostructures. Nested Monte Carlo Search has given excellent results for this problem. We propose to adapt and evaluate different Monte Carlo Search algorithms for the RNA Inverse Folding problem.
△ Less
Submitted 20 May, 2020;
originally announced May 2020.
-
Generalized Nested Rollout Policy Adaptation
Authors:
Tristan Cazenave
Abstract:
Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm for single player games. In this paper we propose to generalize NRPA with a temperature and a bias and to analyze theoretically the algorithms. The generalized algorithm is named GNRPA. Experiments show it improves on NRPA for different application domains: SameGame and the Traveling Salesman Problem with Time Windows.
Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm for single player games. In this paper we propose to generalize NRPA with a temperature and a bias and to analyze theoretically the algorithms. The generalized algorithm is named GNRPA. Experiments show it improves on NRPA for different application domains: SameGame and the Traveling Salesman Problem with Time Windows.
△ Less
Submitted 22 March, 2020;
originally announced March 2020.
-
Polygames: Improved Zero Learning
Authors:
Tristan Cazenave,
Yen-Chi Chen,
Guan-Wei Chen,
Shi-Yu Chen,
Xian-Dong Chiu,
Julien Dehos,
Maria Elsa,
Qucheng Gong,
Hengyuan Hu,
Vasil Khalidov,
Cheng-Ling Li,
Hsin-I Lin,
Yu-** Lin,
Xavier Martinet,
Vegard Mella,
Jeremy Rapin,
Baptiste Roziere,
Gabriel Synnaeve,
Fabien Teytaud,
Olivier Teytaud,
Shi-Cheng Ye,
Yi-Jun Ye,
Shi-Jim Yen,
Sergey Zagoruyko
Abstract:
Since DeepMind's AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by kee** track of the best checkpoints during the training and by train…
▽ More
Since DeepMind's AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by kee** track of the best checkpoints during the training and by training against them. Using these features, we release Polygames, our framework for Zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex in 19x19, which was often said to be untractable for zero learning; and in Havannah. We also won several first places at the TAAI competitions.
△ Less
Submitted 27 January, 2020;
originally announced January 2020.
-
Monte Carlo Game Solver
Authors:
Tristan Cazenave
Abstract:
We present a general algorithm to order moves so as to speedup exact game solvers. It uses online learning of playout policies and Monte Carlo Tree Search. The learned policy and the information in the Monte Carlo tree are used to order moves in game solvers. They improve greatly the solving time for multiple games.
We present a general algorithm to order moves so as to speedup exact game solvers. It uses online learning of playout policies and Monte Carlo Tree Search. The learned policy and the information in the Monte Carlo tree are used to order moves in game solvers. They improve greatly the solving time for multiple games.
△ Less
Submitted 14 January, 2020;
originally announced January 2020.
-
The αμ Search Algorithm for the Game of Bridge
Authors:
Tristan Cazenave,
Véronique Ventos
Abstract:
αμ is an anytime heuristic search algorithm for incomplete information games that assumes perfect information for the opponents. αμ addresses the strategy fusion and non-locality problems encountered by Perfect Information Monte Carlo sampling. In this paper αμ is applied to the game of Bridge.
αμ is an anytime heuristic search algorithm for incomplete information games that assumes perfect information for the opponents. αμ addresses the strategy fusion and non-locality problems encountered by Perfect Information Monte Carlo sampling. In this paper αμ is applied to the game of Bridge.
△ Less
Submitted 18 November, 2019;
originally announced November 2019.
-
Learning opening books in partially observable games: using random seeds in Phantom Go
Authors:
Tristan Cazenave,
Jialin Liu,
Fabien Teytaud,
Olivier Teytaud
Abstract:
Many artificial intelligences (AIs) are randomized. One can be lucky or unlucky with the random seed; we quantify this effect and show that, maybe contrarily to intuition, this is far from being negligible. Then, we apply two different existing algorithms for selecting good seeds and good probability distributions over seeds. This mainly leads to learning an opening book. We apply this to Phantom…
▽ More
Many artificial intelligences (AIs) are randomized. One can be lucky or unlucky with the random seed; we quantify this effect and show that, maybe contrarily to intuition, this is far from being negligible. Then, we apply two different existing algorithms for selecting good seeds and good probability distributions over seeds. This mainly leads to learning an opening book. We apply this to Phantom Go, which, as all phantom games, is hard for opening book learning. We improve the winning rate from 50% to 70% in 5x5 against the same AI, and from approximately 0% to 40% in 5x5, 7x7 and 9x9 against a stronger (learning) opponent.
△ Less
Submitted 8 July, 2016;
originally announced July 2016.
-
Depth, balancing, and limits of the Elo model
Authors:
Marie-Liesse Cauwet,
Olivier Teytaud,
Hua-Min Liang,
Shi-Jim Yen,
Hung-Hsuan Lin,
I-Chen Wu,
Tristan Cazenave,
Abdallah Saffidine
Abstract:
-Much work has been devoted to the computational complexity of games. However, they are not necessarily relevant for estimating the complexity in human terms. Therefore, human-centered measures have been proposed, e.g. the depth. This paper discusses the depth of various games, extends it to a continuous measure. We provide new depth results and present tool (given-first-move, pie rule, size exten…
▽ More
-Much work has been devoted to the computational complexity of games. However, they are not necessarily relevant for estimating the complexity in human terms. Therefore, human-centered measures have been proposed, e.g. the depth. This paper discusses the depth of various games, extends it to a continuous measure. We provide new depth results and present tool (given-first-move, pie rule, size extension) for increasing it. We also use these measures for analyzing games and opening moves in Y, NoGo, Killall Go, and the effect of pie rules.
△ Less
Submitted 6 November, 2015;
originally announced November 2015.