doi 10.1109/CoG57401.2023.10333169

Mixture of Public and Private Distributions in Imperfect Information Games

Authors: Jérôme Arjonilla, Abdallah Saffidine, Tristan Cazenave

Abstract: In imperfect information games (e.g. Bridge, Skat, Poker), one of the fundamental considerations is to infer the missing information while at the same time avoiding the disclosure of private information. Disregarding the issue of protecting private information can lead to a highly exploitable performance. Yet, excessive attention to it leads to hesitations that are no longer consistent with our private information. In our work, we show that to improve performance, one must choose whether to use a player's private information. We extend our work by proposing a new belief distribution depending on the amount of private and public information desired. We empirically demonstrate an increase in performance and show that, to improve performance further, the new distribution should be used according to the position in the game. Our experiments have been done on multiple benchmarks and in multiple determinization-based algorithms (PIMC and IS-MCTS).

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted in CoG 2023

Journal ref: 2023 IEEE Conference on Games (CoG)

arXiv:2405.14265 [pdf, other]

doi 10.1007/978-3-031-30229-9_48

Deep Reinforcement Learning for 5*5 Multiplayer Go

Authors: Brahim Driss, Jérôme Arjonilla, Hui Wang, Abdallah Saffidine, Tristan Cazenave

Abstract: In recent years, much progress has been made in computer Go and most of the results have been obtained thanks to search algorithms (Monte Carlo Tree Search) and Deep Reinforcement Learning (DRL). In this paper, we propose to use and analyze the latest algorithms that use search and DRL (AlphaZero and Descent algorithms) to automatically learn to play an extended version of the game of Go with more than two players. We show that using search and DRL we were able to improve the level of play, even though there are more than two players.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted in EvoApps at Evostar2023

Journal ref: International Conference on the Applications of Evolutionary Computation (Part of EvoStar), 2023, 753--764

arXiv:2404.09304 [pdf, other]

Monte Carlo Search Algorithms Discovering Monte Carlo Tree Search Exploration Terms

Authors: Tristan Cazenave

Abstract: Monte Carlo Tree Search and Monte Carlo Search have good results for many combinatorial problems. In this paper we propose to use Monte Carlo Search to design mathematical expressions that are used as exploration terms for Monte Carlo Tree Search algorithms. The optimized Monte Carlo Tree Search algorithms are PUCT and SHUSS. We automatically design the PUCT and the SHUSS root exploration terms. For small search budgets of 32 evaluations the discovered root exploration terms make both algorithms competitive with usual PUCT.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2401.10431 [pdf, other]

Learning a Prior for Monte Carlo Search by Replaying Solutions to Combinatorial Problems

Authors: Tristan Cazenave

Abstract: Monte Carlo Search gives excellent results in multiple difficult combinatorial problems. Using a prior to perform non-uniform playouts during the search greatly improves the results compared to uniform playouts. Handmade heuristics tailored to the combinatorial problem are often used as priors. We propose a method to automatically compute a prior. It uses statistics on solved problems. It is a simple and general method that incurs no computational cost at playout time and that brings large performance gains. The method is applied to three difficult combinatorial problems: Latin Square Completion, Kakuro, and Inverse RNA Folding.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.10420 [pdf, other]

Generalized Nested Rollout Policy Adaptation with Limited Repetitions

Authors: Tristan Cazenave

Abstract: Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for optimizing a sequence of choices. We propose to improve on GNRPA by avoiding overly deterministic policies that find the same sequence of choices again and again. We do so by limiting the number of repetitions of the best sequence found at a given level. Experiments show that it improves the algorithm for three different combinatorial problems: Inverse RNA Folding, the Traveling Salesman Problem with Time Windows and the Weak Schur problem.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2309.12711 [pdf, other]

The Mathematical Game

Authors: Marc Pierre, Quentin Cohen-Solal, Tristan Cazenave

Abstract: Monte Carlo Tree Search can be used for automated theorem proving. Holophrasm is a neural theorem prover using MCTS combined with neural networks for the policy and the evaluation. In this paper we propose to improve the performance of the Holophrasm theorem prover using other game tree search algorithms.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.12675 [pdf, other]

Vision Transformers for Computer Go

Authors: Amani Sagri, Tristan Cazenave, Jérôme Arjonilla, Abdallah Saffidine

Abstract: Motivated by the success of transformers in various fields, such as language understanding and image analysis, this investigation explores their application in the context of the game of Go. In particular, our study focuses on the analysis of the Transformer in Vision. Through a detailed analysis of numerous points such as prediction accuracy, win rates, memory, speed, size, or even learning rate, we have been able to highlight the substantial role that transformers can play in the game of Go. This study was carried out by comparing them to the usual Residual Networks.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2308.12767 [pdf, other]

On the Consistency of Average Embeddings for Item Recommendation

Authors: Walid Bendada, Guillaume Salha-Galvan, Romain Hennequin, Thomas Bouabça, Tristan Cazenave

Abstract: A prevalent practice in recommender systems consists in averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.

Submitted 30 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 17th ACM Conference on Recommender Systems (RecSys 2023)

arXiv:2304.09061 [pdf, other]

doi 10.1145/3539618.3591628

A Scalable Framework for Automatic Playlist Continuation on Music Streaming Services

Authors: Walid Bendada, Guillaume Salha-Galvan, Thomas Bouabça, Tristan Cazenave

Abstract: Music streaming services often aim to recommend songs for users to extend the playlists they have created on these services. However, extending playlists while preserving their musical characteristics and matching user preferences remains a challenging task, commonly referred to as Automatic Playlist Continuation (APC). Besides, while these services often need to select the best songs to recommend in real-time and among large catalogs with millions of candidates, recent research on APC mainly focused on models with few scalability guarantees and evaluated on relatively small datasets. In this paper, we introduce a general framework to build scalable yet effective APC models for large-scale applications. Based on a represent-then-aggregate strategy, it ensures scalability by design while remaining flexible enough to incorporate a wide range of representation learning and sequence modeling techniques, e.g., based on Transformers. We demonstrate the relevance of this framework through in-depth experimental validation on Spotify's Million Playlist Dataset (MPD), the largest public dataset for APC. We also describe how, in 2022, we successfully leveraged this framework to improve APC in production on Deezer. We report results from a large-scale online A/B test on this service, emphasizing the practical impact of our approach in such a real-world application.

Submitted 12 April, 2023; originally announced April 2023.

Comments: Accepted as a Full Paper at the SIGIR 2023 conference

arXiv:2302.13225 [pdf, ps, other]

Towards Tackling MaxSAT by Combining Nested Monte Carlo with Local Search

Authors: Hui Wang, Abdallah Saffidine, Tristan Cazenave

Abstract: Recent work proposed the UCTMAXSAT algorithm to address Maximum Satisfiability Problems (MaxSAT) and showed improved performance over pure Stochastic Local Search algorithms (SLS). UCTMAXSAT is based on Monte Carlo Tree Search but it uses SLS instead of purely random playouts. In this work, we introduce two algorithmic variations over UCTMAXSAT. We carry out an empirical analysis on MaxSAT benchmarks from recent competitions and establish that both ideas lead to performance improvements. First, a nesting of the tree search inspired by the Nested Monte Carlo Search algorithm is effective on most instance types in the benchmark. Second, we observe that using a static flip limit in SLS, the ideal budget depends heavily on the instance size and we propose to set it dynamically. We show that it is a robust way to achieve comparable performance on a variety of instances without requiring additional tuning.

Submitted 25 February, 2023; originally announced February 2023.

arXiv:2302.04318 [pdf, other]

Learning to Play Stochastic Two-player Perfect-Information Games without Knowledge

Authors: Quentin Cohen-Solal, Tristan Cazenave

Abstract: In this paper, we extend the Descent framework, which enables learning and planning in the context of two-player games with perfect information, to the framework of stochastic games. We propose two ways of doing this: the first generalizes the search algorithm, i.e. Descent, to stochastic games, and the second approximates stochastic games by deterministic games. We then evaluate them on the game EinStein würfelt nicht! against state-of-the-art algorithms: Expectiminimax and Polygames (i.e. the Alpha Zero algorithm). It is our generalization of Descent which obtains the best results. The approximation by deterministic games nevertheless obtains good results, presaging that it could give better results in particular contexts.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2301.09533 [pdf, ps, other]

Solving the HP model with Nested Monte Carlo Search

Authors: Milo Roucairol, Tristan Cazenave

Abstract: In this paper we present a new Monte Carlo Search (MCS) algorithm for finding the ground state energy of proteins in the HP-model. We also compare it briefly to other MCS algorithms not usually used on the HP-model and provide an overview of the algorithms used on HP-model. The algorithm presented in this paper does not beat state of the art algorithms, see PERM (Hsu and Grassberger 2011), REMC (Thachuk, Shmygelska, and Hoos 2007) or WLRE (Wüst and Landau 2012) for better results. Hsu, H.-P.; and Grassberger, P. 2011. A review of Monte Carlo simulations of polymers with PERM. Journal of Statistical Physics, 144 (3): 597 to 637. Thachuk, C.; Shmygelska, A.; and Hoos, H. H. 2007. A replica exchange Monte Carlo algorithm for protein folding in the HP model. BMC Bioinformatics, 8(1): 342. Wüst, T.; and Landau, D. P. 2012. Optimized Wang-Landau sampling of lattice polymers: Ground state search and folding thermodynamics of HP model proteins. The Journal of Chemical Physics, 137(6): 064903.

Submitted 25 January, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: Accepted to AAAI's workshop AI2ASE 2023: 2nd Annual AAAI Workshop on AI to Accelerate Science and Engineering. 6 pages, 1 for references

arXiv:2210.08844 [pdf, other]

Sequential Elimination Voting Games

Authors: Ulysse Pavloff, Tristan Cazenave, Jérôme Lang

Abstract: Voting by sequential elimination is a low-communication voting protocol: voters play in sequence and eliminate one or more of the remaining candidates, until only one remains. While the fairness and efficiency of such protocols have been explored, the impact of strategic behaviour has not been addressed. We model voting by sequential elimination as a game. Given a fixed elimination sequence, we show that the outcome is the same in all subgame-perfect Nash equilibria of the corresponding game, and is polynomial-time computable. We measure the loss of social welfare due to strategic behaviour, with respect to the outcome under sincere behaviour, and with respect to the outcome maximizing social welfare. We give tight bounds for worst-case ratios, and show using experiments that the average impact of manipulation can be much lower than in the worst case.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.00216 [pdf, other]

Nested Search versus Limited Discrepancy Search

Authors: Tristan Cazenave

Abstract: Limited Discrepancy Search (LDS) is a popular algorithm to search a state space with a heuristic to order the possible actions. Nested Search (NS) is another algorithm to search a state space with the same heuristic. NS spends more time on the move associated to the best heuristic playout while LDS spends more time on the best heuristic move. They both use similar times for the same level of search. We advocate in this paper that it is often better to follow the best heuristic playout as in NS than to follow the heuristic as in LDS.

Submitted 1 October, 2022; originally announced October 2022.

arXiv:2207.13181 [pdf, other]

Planning and Learning: Path-Planning for Autonomous Vehicles, a Review of the Literature

Authors: Kevin Osanlou, Christophe Guettier, Tristan Cazenave, Eric Jacopin

Abstract: This short review aims to make the reader familiar with state-of-the-art works relating to planning, scheduling and learning. First, we study state-of-the-art planning algorithms. We give a brief introduction of neural networks. Then we explore in more detail graph neural networks, a recent variant of neural networks suited for processing graph-structured inputs. We describe briefly the concept of reinforcement learning algorithms and some approaches designed to date. Next, we study some successful approaches combining neural networks for path-planning. Lastly, we focus on temporal planning problems with uncertainty.

Submitted 17 October, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: AAAI-format & updated

arXiv:2207.03343 [pdf, other]

Refutation of Spectral Graph Theory Conjectures with Monte Carlo Search

Authors: Milo Roucairol, Tristan Cazenave

Abstract: We demonstrate how Monte Carlo Search (MCS) algorithms, namely Nested Monte Carlo Search (NMCS) and Nested Rollout Policy Adaptation (NRPA), can be used to build graphs and find counter-examples to spectral graph theory conjectures in minutes.

Submitted 3 August, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: 11 pages, figures and pseudocode in appendix

arXiv:2203.15030 [pdf, other]

Solving Disjunctive Temporal Networks with Uncertainty under Restricted Time-Based Controllability using Tree Search and Graph Neural Networks

Authors: Kevin Osanlou, Jeremy Frank, Andrei Bursuc, Tristan Cazenave, Eric Jacopin, Christophe Guettier, J. Benton

Abstract: Planning under uncertainty is an area of interest in artificial intelligence. We present a novel approach based on tree search and graph machine learning for the scheduling problem known as Disjunctive Temporal Networks with Uncertainty (DTNU). Dynamic Controllability (DC) of DTNUs seeks a reactive scheduling strategy to satisfy temporal constraints in response to uncontrollable action durations. We introduce new semantics for reactive scheduling: Time-based Dynamic Controllability (TDC) and a restricted subset of TDC, R-TDC. We design a tree search algorithm to determine whether or not a DTNU is R-TDC. Moreover, we leverage a graph neural network as a heuristic for tree search guidance. Finally, we conduct experiments on a known benchmark on which we show R-TDC to retain significant completeness with regard to DC, while being faster to prove. This results in the tree search processing fifty percent more DTNU problems in R-TDC than the state-of-the-art DC solver does in DC with the same time budget. We also observe that graph neural network search guidance leads to substantial performance gains on benchmarks of more complex DTNUs, with up to eleven times more problems solved than the baseline tree search.

Submitted 30 March, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

Comments: Thirty-Sixth AAAI Conference on Artificial Intelligence. This version includes the technical appendix. arXiv admin note: substantial text overlap with arXiv:2108.01068

Journal ref: Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

arXiv:2111.06928 [pdf, other]

Generalized Nested Rollout Policy Adaptation with Dynamic Bias for Vehicle Routing

Authors: Julien Sentuc, Tristan Cazenave, Jean-Yves Lucas

Abstract: In this paper we present an extension of the Nested Rollout Policy Adaptation algorithm (NRPA), namely the Generalized Nested Rollout Policy Adaptation (GNRPA), as well as its use for solving some instances of the Vehicle Routing Problem. We detail some results obtained on the Solomon instances set which is a conventional benchmark for the Vehicle Routing Problem (VRP). We show that on all instances, GNRPA performs better than NRPA. On some instances, it performs better than the Google OR-Tools module dedicated to VRP.

Submitted 29 December, 2021; v1 submitted 12 November, 2021; originally announced November 2021.

arXiv:2108.01080 [pdf, other]

Learning-based Preference Prediction for Constrained Multi-Criteria Path-Planning

Authors: Kevin Osanlou, Christophe Guettier, Andrei Bursuc, Tristan Cazenave, Eric Jacopin

Abstract: Learning-based methods are increasingly popular for search algorithms in single-criterion optimization problems. In contrast, for multiple-criteria optimization there are significantly fewer approaches despite the existence of numerous applications. Constrained path-planning for Autonomous Ground Vehicles (AGV) is one such application, where an AGV is typically deployed in disaster relief or search and rescue applications in off-road environments. The agent can be faced with the following dilemma: optimize a source-destination path according to a known criterion and an uncertain criterion under operational constraints. The known criterion is associated with the cost of the path, representing the distance. The uncertain criterion represents the feasibility of driving through the path without requiring human intervention. It depends on various external parameters such as the physics of the vehicle, the state of the explored terrains or weather conditions. In this work, we leverage knowledge acquired through offline simulations by training a neural network model to predict the uncertain criterion. We integrate this model inside a path-planner which can solve problems online. Finally, we conduct experiments on realistic AGV scenarios which illustrate that the proposed framework requires human intervention less frequently, trading for a limited increase in the path distance.

Submitted 2 August, 2021; originally announced August 2021.

Comments: arXiv admin note: text overlap with arXiv:2108.00978

Journal ref: International Conference on Automated Planning and Scheduling 2019, Workshop SPARK

arXiv:2108.01068 [pdf, other]

Time-based Dynamic Controllability of Disjunctive Temporal Networks with Uncertainty: A Tree Search Approach with Graph Neural Network Guidance

Authors: Kevin Osanlou, Jeremy Frank, J. Benton, Andrei Bursuc, Christophe Guettier, Eric Jacopin, Tristan Cazenave

Abstract: Scheduling in the presence of uncertainty is an area of interest in artificial intelligence due to the large number of applications. We study the problem of dynamic controllability (DC) of disjunctive temporal networks with uncertainty (DTNU), which seeks a strategy to satisfy all constraints in response to uncontrollable action durations. We introduce a more restricted, stronger form of controllability than DC for DTNUs, time-based dynamic controllability (TDC), and present a tree search approach to determine whether or not a DTNU is TDC. Moreover, we leverage the learning capability of a message passing neural network (MPNN) as a heuristic for tree search guidance. Finally, we conduct experiments for which the tree search shows superior results to state-of-the-art timed-game automata (TGA) based approaches. We observe that using an MPNN for tree search guidance leads to a significant increase in solving performance and scalability to harder DTNU problems.

Submitted 2 August, 2021; originally announced August 2021.

Journal ref: International Conference on Automated Planning and Scheduling 2020. Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL)

arXiv:2108.01036 [pdf, other]

doi 10.1109/IROS40897.2019.8968113

Optimal Solving of Constrained Path-Planning Problems with Graph Convolutional Networks and Optimized Tree Search

Authors: Kevin Osanlou, Andrei Bursuc, Christophe Guettier, Tristan Cazenave, Eric Jacopin

Abstract: Deep learning-based methods are growing in prominence for planning purposes. In this paper, we present a hybrid planner that combines a graph machine learning model and an optimal solver based on branch and bound tree search for path-planning tasks. More specifically, a graph neural network is used to assist the branch and bound algorithm in handling constraints associated with a desired solution path. There are multiple downstream practical applications, such as Autonomous Unmanned Ground Vehicles (AUGV), typically deployed in disaster relief or search and rescue operations. In off-road environments, AUGVs must dynamically optimize a source-destination path under various operational constraints, out of which several are difficult to predict in advance and need to be addressed online. We conduct experiments on realistic scenarios and show that graph neural network support enables substantial speedup and smoother scaling to harder path-planning problems. Additionally, information provided by the graph neural network enables the approach to outperform problem-specific handcrafted heuristics, highlighting the potential graph neural networks hold for path-planning tasks.

Submitted 3 April, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: Published as a conference paper at IROS 2019

Journal ref: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 3519-3525

arXiv:2108.00978 [pdf, other]

Constrained Shortest Path Search with Graph Convolutional Neural Networks

Authors: Kevin Osanlou, Christophe Guettier, Andrei Bursuc, Tristan Cazenave, Eric Jacopin

Abstract: Planning for Autonomous Unmanned Ground Vehicles (AUGV) is still a challenge, especially in difficult, off-road, critical situations. Automatic planning can be used to reach mission objectives, to perform navigation or maneuvers. Most of the time, the problem consists in finding a path from a source to a destination, while satisfying some operational constraints. In a graph without negative cycles, the computation of the single-pair shortest path from a start node to an end node is solved in polynomial time. Additional constraints on the solution path can however make the problem harder to solve. This becomes the case when we need the path to pass through a few mandatory nodes without requiring a specific order of visit. The complexity grows exponentially with the number of mandatory nodes to visit. In this paper, we focus on shortest path search with mandatory nodes on a given connected graph. We propose a hybrid model that combines a constraint-based solver and a graph convolutional neural network to improve search performance. Promising results are obtained on realistic scenarios.

Submitted 2 August, 2021; originally announced August 2021.

Journal ref: AAAI - ICML / IJCAI / AAMAS 2018 Workshop on Planning and Learning (PAL-18). Stockholm, Sweden 2018

arXiv:2104.04278 [pdf, ps, other]

Batch Monte Carlo Tree Search

Authors: Tristan Cazenave

Abstract: Making inferences with a deep neural network on a batch of states is much faster with a GPU than making inferences on one state after another. We build on this property to propose Monte Carlo Tree Search algorithms using batched inferences. Instead of using either a search tree or a transposition table we propose to use both in the same algorithm. The transposition table contains the results of the inferences while the search tree contains the statistics of Monte Carlo Tree Search. We also propose to analyze multiple heuristics that improve the search: the $μ$ FPU, the Virtual Mean, the Last Iteration and the Second Move heuristics. They are evaluated for the game of Go using a MobileNet neural network.

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2102.03467 [pdf, other]

Improving Model and Search for Computer Go

Authors: Tristan Cazenave

Abstract: The standard for Deep Reinforcement Learning in games, following Alpha Zero, is to use residual networks and to increase the depth of the network to get better results. We propose to improve mobile networks as an alternative to residual networks and experimentally show the playing strength of the networks according to both their width and their depth. We also propose a generalization of the PUCT search algorithm that improves on PUCT.

Submitted 9 April, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

arXiv:2101.12639 [pdf, other]

Optimizing $αμ$

Authors: Tristan Cazenave, Swann Legras, Véronique Ventos

Abstract: $αμ$ is a search algorithm which repairs two defaults of Perfect Information Monte Carlo search: strategy fusion and non locality. In this paper we optimize $αμ$ for the game of Bridge, avoiding useless computations. The proposed optimizations are general and apply to other imperfect information turn-based games. We define multiple optimizations involving Pareto fronts, and show that these optimizations speed up the search. Some of these optimizations are cuts that stop the search at a node, while others keep track of which possible worlds have become redundant, avoiding unnecessary, costly evaluations. We also measure the benefits of parallelizing the double dummy searches at the leaves of the $αμ$ search tree.

Submitted 29 January, 2021; originally announced January 2021.

arXiv:2101.03563 [pdf, other]

Stabilized Nested Rollout Policy Adaptation

Authors: Tristan Cazenave, Jean-Baptiste Sevestre, Matthieu Toulemont

Abstract: Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm for single player games. In this paper we propose to modify NRPA in order to improve the stability of the algorithm. Experiments show it improves the algorithm for different application domains: SameGame, Traveling Salesman with Time Windows and Expression Discovery.

Submitted 10 January, 2021; originally announced January 2021.

Comments: arXiv admin note: text overlap with arXiv:2003.10024

arXiv:2012.10700 [pdf, other]

Minimax Strikes Back

Authors: Quentin Cohen-Solal, Tristan Cazenave

Abstract: Deep Reinforcement Learning (DRL) reaches a superhuman level of play in many complete information games. The state of the art search algorithm used in combination with DRL is Monte Carlo Tree Search (MCTS). We take another approach to DRL using a Minimax algorithm instead of MCTS and learning only the evaluation of states, not the policy. We show that for multiple games it is competitive with the state of the art DRL for the learning performances and for the confrontations.

Submitted 19 December, 2020; originally announced December 2020.

arXiv:2008.10080 [pdf, other]

Mobile Networks for Computer Go

Authors: Tristan Cazenave

Abstract: The architecture of the neural networks used in Deep Reinforcement Learning programs such as Alpha Zero or Polygames has been shown to have a great impact on the performances of the resulting playing engines. For example the use of residual networks gave a 600 ELO increase in the strength of Alpha Go. This paper proposes to evaluate the interest of Mobile Network for the game of Go using supervised learning as well as the use of a policy head and a value head different from the Alpha Zero heads. The accuracy of the policy, the mean squared error of the value, the efficiency of the networks with the number of parameters, the playing speed and strength of the trained networks are evaluated.

Submitted 23 August, 2020; originally announced August 2020.

arXiv:2005.09961 [pdf, other]

Monte Carlo Inverse Folding

Authors: Tristan Cazenave, Thomas Fournier

Abstract: The RNA Inverse Folding problem comes from computational biology. The goal is to find a molecule that has a given folding. It is important for scientific fields such as bioengineering, pharmaceutical research, biochemistry, synthetic biology and RNA nanostructures. Nested Monte Carlo Search has given excellent results for this problem. We propose to adapt and evaluate different Monte Carlo Search algorithms for the RNA Inverse Folding problem.

Submitted 20 May, 2020; originally announced May 2020.

arXiv:2003.10024 [pdf, other]

Generalized Nested Rollout Policy Adaptation

Authors: Tristan Cazenave

Abstract: Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search algorithm for single player games. In this paper we propose to generalize NRPA with a temperature and a bias and to analyze theoretically the algorithms. The generalized algorithm is named GNRPA. Experiments show it improves on NRPA for different application domains: SameGame and the Traveling Salesman Problem with Time Windows.

Submitted 22 March, 2020; originally announced March 2020.

arXiv:2001.09832 [pdf, other]

Polygames: Improved Zero Learning

Authors: Tristan Cazenave, Yen-Chi Chen, Guan-Wei Chen, Shi-Yu Chen, Xian-Dong Chiu, Julien Dehos, Maria Elsa, Qucheng Gong, Hengyuan Hu, Vasil Khalidov, Cheng-Ling Li, Hsin-I Lin, Yu-** Lin, Xavier Martinet, Vegard Mella, Jeremy Rapin, Baptiste Roziere, Gabriel Synnaeve, Fabien Teytaud, Olivier Teytaud, Shi-Cheng Ye, Yi-Jun Ye, Shi-Jim Yen, Sergey Zagoruyko

Abstract: Since DeepMind's AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by keeping track of the best checkpoints during the training and by training against them. Using these features, we release Polygames, our framework for Zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex in 19x19, which was often said to be untractable for zero learning; and in Havannah. We also won several first places at the TAAI competitions.

Submitted 27 January, 2020; originally announced January 2020.

arXiv:2001.05087 [pdf, ps, other]

Monte Carlo Game Solver

Authors: Tristan Cazenave

Abstract: We present a general algorithm to order moves so as to speedup exact game solvers. It uses online learning of playout policies and Monte Carlo Tree Search. The learned policy and the information in the Monte Carlo tree are used to order moves in game solvers. They improve greatly the solving time for multiple games.

Submitted 14 January, 2020; originally announced January 2020.

arXiv:1911.07960 [pdf, ps, other]

The αμ Search Algorithm for the Game of Bridge

Authors: Tristan Cazenave, Véronique Ventos

Abstract: αμ is an anytime heuristic search algorithm for incomplete information games that assumes perfect information for the opponents. αμ addresses the strategy fusion and non-locality problems encountered by Perfect Information Monte Carlo sampling. In this paper αμ is applied to the game of Bridge.

Submitted 18 November, 2019; originally announced November 2019.

arXiv:1607.02431 [pdf, other]

Learning opening books in partially observable games: using random seeds in Phantom Go

Authors: Tristan Cazenave, Jialin Liu, Fabien Teytaud, Olivier Teytaud

Abstract: Many artificial intelligences (AIs) are randomized. One can be lucky or unlucky with the random seed; we quantify this effect and show that, maybe contrarily to intuition, this is far from being negligible. Then, we apply two different existing algorithms for selecting good seeds and good probability distributions over seeds. This mainly leads to learning an opening book. We apply this to Phantom Go, which, as all phantom games, is hard for opening book learning. We improve the winning rate from 50% to 70% in 5x5 against the same AI, and from approximately 0% to 40% in 5x5, 7x7 and 9x9 against a stronger (learning) opponent.

Submitted 8 July, 2016; originally announced July 2016.

Comments: 7 pages, 15 figures. Accepted by CIG2016

MSC Class: 91A05; 91A10

arXiv:1511.02006 [pdf, other]

Depth, balancing, and limits of the Elo model

Authors: Marie-Liesse Cauwet, Olivier Teytaud, Hua-Min Liang, Shi-Jim Yen, Hung-Hsuan Lin, I-Chen Wu, Tristan Cazenave, Abdallah Saffidine

Abstract: Much work has been devoted to the computational complexity of games. However, they are not necessarily relevant for estimating the complexity in human terms. Therefore, human-centered measures have been proposed, e.g. the depth. This paper discusses the depth of various games and extends it to a continuous measure. We provide new depth results and present tools (given-first-move, pie rule, size extension) for increasing it. We also use these measures for analyzing games and opening moves in Y, NoGo, Killall Go, and the effect of pie rules.

Submitted 6 November, 2015; originally announced November 2015.

Journal ref: IEEE Conference on Computational Intelligence and Games 2015, Aug 2015, Tainan, Taiwan. 2015

Showing 1–35 of 35 results for author: Cazenave, T