-
A Simple, Solid, and Reproducible Baseline for Bridge Bidding AI
Authors:
Haruka Kita,
Sotetsu Koyamada,
Yotaro Yamaguchi,
Shin Ishii
Abstract:
Contract bridge, a cooperative game characterized by imperfect information and multi-agent dynamics, poses significant challenges and serves as a critical benchmark in artificial intelligence (AI) research. Success in this domain requires agents to effectively cooperate with their partners. This study demonstrates that an appropriate combination of existing methods can perform surprisingly well in…
▽ More
Contract bridge, a cooperative game characterized by imperfect information and multi-agent dynamics, poses significant challenges and serves as a critical benchmark in artificial intelligence (AI) research. Success in this domain requires agents to effectively cooperate with their partners. This study demonstrates that an appropriate combination of existing methods can perform surprisingly well in bridge bidding against WBridge5, a leading benchmark in the bridge bidding system and a multiple-time World Computer-Bridge Championship winner. Our approach is notably simple, yet it outperforms the current state-of-the-art methodologies in this field. Furthermore, we have made our code and models publicly available as open-source software. This initiative provides a strong starting foundation for future bridge AI research, facilitating the development and verification of new strategies and advancements in the field.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
A Batch Sequential Halving Algorithm without Performance Degradation
Authors:
Sotetsu Koyamada,
Soichiro Nishimori,
Shin Ishii
Abstract:
In this paper, we investigate the problem of pure exploration in the context of multi-armed bandits, with a specific focus on scenarios where arms are pulled in fixed-size batches. Batching has been shown to enhance computational efficiency, but it can potentially lead to a degradation compared to the original sequential algorithm's performance due to delayed feedback and reduced adaptability. We…
▽ More
In this paper, we investigate the problem of pure exploration in the context of multi-armed bandits, with a specific focus on scenarios where arms are pulled in fixed-size batches. Batching has been shown to enhance computational efficiency, but it can potentially lead to a degradation compared to the original sequential algorithm's performance due to delayed feedback and reduced adaptability. We introduce a simple batch version of the Sequential Halving (SH) algorithm (Karnin et al., 2013) and provide theoretical evidence that batching does not degrade the performance of the original algorithm under practical conditions. Furthermore, we empirically validate our claim through experiments, demonstrating the robust nature of the SH algorithm in fixed-size batch settings.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
End-to-End Policy Gradient Method for POMDPs and Explainable Agents
Authors:
Soichiro Nishimori,
Sotetsu Koyamada,
Shin Ishii
Abstract:
Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to real-world tasks s…
▽ More
Real-world decision-making problems are often partially observable, and many can be formulated as a Partially Observable Markov Decision Process (POMDP). When we apply reinforcement learning (RL) algorithms to the POMDP, reasonable estimation of the hidden states can help solve the problems. Furthermore, explainable decision-making is preferable, considering their application to real-world tasks such as autonomous driving cars. We proposed an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph. Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement Learning
Authors:
Sotetsu Koyamada,
Shinri Okano,
Soichiro Nishimori,
Yu Murata,
Keigo Habara,
Haruka Kita,
Shin Ishii
Abstract:
We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and parallelization over accelerators, Pgx can efficiently scale to thousands of simultaneous simulations over accelerators. In our experiments on a DGX-A100 workstation, we discovered that Pgx can simulate RL environments 10-1…
▽ More
We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and parallelization over accelerators, Pgx can efficiently scale to thousands of simultaneous simulations over accelerators. In our experiments on a DGX-A100 workstation, we discovered that Pgx can simulate RL environments 10-100x faster than existing implementations available in Python. Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators for researchers to accelerate their RL experiments. Pgx is available at http://github.com/sotetsuk/pgx.
△ Less
Submitted 15 January, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Suphx: Mastering Mahjong with Deep Reinforcement Learning
Authors:
Junjie Li,
Sotetsu Koyamada,
Qiwei Ye,
Guoqing Liu,
Chao Wang,
Ruihan Yang,
Li Zhao,
Tao Qin,
Tie-Yan Liu,
Hsiao-Wuen Hon
Abstract:
Artificial Intelligence (AI) has achieved great success in many domains, and game AI is widely regarded as its beachhead since the dawn of AI. In recent years, studies on game AI have gradually evolved from relatively simple environments (e.g., perfect-information games such as Go, chess, shogi or two-player imperfect-information games such as heads-up Texas hold'em) to more complex ones (e.g., mu…
▽ More
Artificial Intelligence (AI) has achieved great success in many domains, and game AI is widely regarded as its beachhead since the dawn of AI. In recent years, studies on game AI have gradually evolved from relatively simple environments (e.g., perfect-information games such as Go, chess, shogi or two-player imperfect-information games such as heads-up Texas hold'em) to more complex ones (e.g., multi-player imperfect-information games such as multi-player Texas hold'em and StartCraft II). Mahjong is a popular multi-player imperfect-information game worldwide but very challenging for AI research due to its complex playing/scoring rules and rich hidden information. We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques including global reward prediction, oracle guiding, and run-time policy adaptation. Suphx has demonstrated stronger performance than most top human players in terms of stable rank and is rated above 99.99% of all the officially ranked human players in the Tenhou platform. This is the first time that a computer program outperforms most top human players in Mahjong.
△ Less
Submitted 31 March, 2020; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Neural Sequence Model Training via $α$-divergence Minimization
Authors:
Sotetsu Koyamada,
Yuta Kikuchi,
Atsunori Kanemura,
Shin-ichi Maeda,
Shin Ishii
Abstract:
We propose a new neural sequence model training method in which the objective function is defined by $α$-divergence. We demonstrate that the objective function generalizes the maximum-likelihood (ML)-based and reinforcement learning (RL)-based objective functions as special cases (i.e., ML corresponds to $α\to 0$ and RL to $α\to1$). We also show that the gradient of the objective function can be c…
▽ More
We propose a new neural sequence model training method in which the objective function is defined by $α$-divergence. We demonstrate that the objective function generalizes the maximum-likelihood (ML)-based and reinforcement learning (RL)-based objective functions as special cases (i.e., ML corresponds to $α\to 0$ and RL to $α\to1$). We also show that the gradient of the objective function can be considered a mixture of ML- and RL-based objective gradients. The experimental results of a machine translation task show that minimizing the objective function with $α> 0$ outperforms $α\to 0$, which corresponds to ML-based methods.
△ Less
Submitted 30 June, 2017;
originally announced June 2017.
-
Deep learning of fMRI big data: a novel approach to subject-transfer decoding
Authors:
Sotetsu Koyamada,
Yumi Shikauchi,
Ken Nakae,
Masanori Koyama,
Shin Ishii
Abstract:
As a technology to read brain states from measurable brain activities, brain decoding are widely applied in industries and medical sciences. In spite of high demands in these applications for a universal decoder that can be applied to all individuals simultaneously, large variation in brain activities across individuals has limited the scope of many studies to the development of individual-specifi…
▽ More
As a technology to read brain states from measurable brain activities, brain decoding are widely applied in industries and medical sciences. In spite of high demands in these applications for a universal decoder that can be applied to all individuals simultaneously, large variation in brain activities across individuals has limited the scope of many studies to the development of individual-specific decoders. In this study, we used deep neural network (DNN), a nonlinear hierarchical model, to construct a subject-transfer decoder. Our decoder is the first successful DNN-based subject-transfer decoder. When applied to a large-scale functional magnetic resonance imaging (fMRI) database, our DNN-based decoder achieved higher decoding accuracy than other baseline methods, including support vector machine (SVM). In order to analyze the knowledge acquired by this decoder, we applied principal sensitivity analysis (PSA) to the decoder and visualized the discriminative features that are common to all subjects in the dataset. Our PSA successfully visualized the subject-independent features contributing to the subject-transferability of the trained decoder.
△ Less
Submitted 31 January, 2015;
originally announced February 2015.
-
Principal Sensitivity Analysis
Authors:
Sotetsu Koyamada,
Masanori Koyama,
Ken Nakae,
Shin Ishii
Abstract:
We present a novel algorithm (Principal Sensitivity Analysis; PSA) to analyze the knowledge of the classifier obtained from supervised machine learning techniques. In particular, we define principal sensitivity map (PSM) as the direction on the input space to which the trained classifier is most sensitive, and use analogously defined k-th PSM to define a basis for the input space. We train neural…
▽ More
We present a novel algorithm (Principal Sensitivity Analysis; PSA) to analyze the knowledge of the classifier obtained from supervised machine learning techniques. In particular, we define principal sensitivity map (PSM) as the direction on the input space to which the trained classifier is most sensitive, and use analogously defined k-th PSM to define a basis for the input space. We train neural networks with artificial data and real data, and apply the algorithm to the obtained supervised classifiers. We then visualize the PSMs to demonstrate the PSA's ability to decompose the knowledge acquired by the trained classifiers.
△ Less
Submitted 11 March, 2015; v1 submitted 21 December, 2014;
originally announced December 2014.