Search | arXiv e-print repository

arXiv:2406.19655 [pdf, other]

Basketball-SORT: An Association Method for Complex Multi-object Occlusion Problems in Basketball Multi-object Tracking

Authors: Qingrui Hu, Atom Scott, Calvin Yeung, Keisuke Fujii

Abstract: Recent deep learning-based object detection approaches have led to significant progress in multi-object tracking (MOT) algorithms. The current MOT methods mainly focus on pedestrian or vehicle scenes, but basketball sports scenes are usually accompanied by three or more object occlusion problems with similar appearances and high-intensity complex motions, which we call complex multi-object occlusi… ▽ More Recent deep learning-based object detection approaches have led to significant progress in multi-object tracking (MOT) algorithms. The current MOT methods mainly focus on pedestrian or vehicle scenes, but basketball sports scenes are usually accompanied by three or more object occlusion problems with similar appearances and high-intensity complex motions, which we call complex multi-object occlusion (CMOO). Here, we propose an online and robust MOT approach, named Basketball-SORT, which focuses on the CMOO problems in basketball videos. To overcome the CMOO problem, instead of using the intersection-over-union-based (IoU-based) approach, we use the trajectories of neighboring frames based on the projected positions of the players. Our method designs the basketball game restriction (BGR) and reacquiring Long-Lost IDs (RLLI) based on the characteristics of basketball scenes, and we also solve the occlusion problem based on the player trajectories and appearance features. Experimental results show that our method achieves a Higher Order Tracking Accuracy (HOTA) score of 63.48$\%$ on the basketball fixed video dataset and outperforms other recent popular approaches. Overall, our approach solved the CMOO problem more effectively than recent MOT algorithms. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.08749 [pdf, other]

Mathematical models for off-ball scoring prediction in basketball

Authors: Rikako Kono, Keisuke Fujii

Abstract: In professional basketball, the accurate prediction of scoring opportunities based on strategic decision-making is crucial for space and player evaluations. However, traditional models often face challenges in accounting for the complexities of off-ball movements, which are essential for accurate predictive performance. In this study, we propose two mathematical models to predict off-ball scoring… ▽ More In professional basketball, the accurate prediction of scoring opportunities based on strategic decision-making is crucial for space and player evaluations. However, traditional models often face challenges in accounting for the complexities of off-ball movements, which are essential for accurate predictive performance. In this study, we propose two mathematical models to predict off-ball scoring opportunities in basketball, considering both pass-to-score and dribble-to-score movements: the Ball Movement for Off-ball Scoring (BMOS) and the Ball Intercept and Movement for Off-ball Scoring (BIMOS) models. The BMOS adapts principles from the Off-Ball Scoring Opportunities (OBSO) model, originally designed for soccer, to basketball, whereas the BIMOS also incorporates the likelihood of interception during ball movements. We evaluated these models using player tracking data from 630 NBA games in the 2015-2016 regular season, demonstrating that the BIMOS outperforms the BMOS in terms of scoring prediction accuracy. Thus, our models provide valuable insights for tactical analysis and player evaluation in basketball. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 12 pages

arXiv:2405.12070 [pdf, other]

AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements

Authors: Calvin Yeung, Kenjiro Ide, Keisuke Fujii

Abstract: Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably in the form of posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), limiting their capacity to… ▽ More Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably in the form of posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), limiting their capacity to capture non-linear spatiotemporal relationships in complex and diverse scenarios. To address these gaps, we introduce the 3D Shot Posture (3DSP) dataset in soccer broadcast videos, which represents the most extensive sports image dataset with 2D pose annotations to our knowledge. Additionally, we present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, a non-linear approach for embedding pose sequences. Furthermore, we propose AutoSoccerPose, a pipeline aimed at semi-automating 2D and 3D pose estimation and posture analysis. While achieving full automation proved challenging, we provide a foundational baseline, extending its utility beyond the scope of annotated data. We validate AutoSoccerPose on SoccerNet and 3DSP datasets, and present posture analysis results based on 3DSP. The dataset, code, and models are available at: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset. △ Less

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2404.17790 [pdf, other]

Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities

Authors: Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki

Abstract: Cross-lingual continual pre-training of large language models (LLMs) initially trained on English corpus allows us to leverage the vast amount of English language resources and reduce the pre-training cost. In this study, we constructed Swallow, an LLM with enhanced Japanese capability, by extending the vocabulary of Llama 2 to include Japanese characters and conducting continual pre-training on a… ▽ More Cross-lingual continual pre-training of large language models (LLMs) initially trained on English corpus allows us to leverage the vast amount of English language resources and reduce the pre-training cost. In this study, we constructed Swallow, an LLM with enhanced Japanese capability, by extending the vocabulary of Llama 2 to include Japanese characters and conducting continual pre-training on a large Japanese web corpus. Experimental results confirmed that the performance on Japanese tasks drastically improved through continual pre-training, and the performance monotonically increased with the amount of training data up to 100B tokens. Consequently, Swallow achieved superior performance compared to other LLMs that were trained from scratch in English and Japanese. An analysis of the effects of continual pre-training revealed that it was particularly effective for Japanese question answering tasks. Furthermore, to elucidate effective methodologies for cross-lingual continual pre-training from English to Japanese, we investigated the impact of vocabulary expansion and the effectiveness of incorporating parallel corpora. The results showed that the efficiency gained through vocabulary expansion had no negative impact on performance, except for the summarization task, and that the combined use of parallel corpora enhanced translation ability. △ Less

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17733 [pdf, other]

Building a Large Japanese Web Corpus for Large Language Models

Authors: Naoaki Okazaki, Kakeru Hattori, Hirai Shota, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, Sakae Mizuki

Abstract: Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This c… ▽ More Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This corpus consists of approximately 312.1 billion characters (approximately 173 million pages), which is the largest of all available training corpora for Japanese LLMs, surpassing CC-100 (approximately 25.8 billion characters), mC4 (approximately 239.7 billion characters) and OSCAR 23.10 (approximately 74 billion characters). To confirm the quality of the corpus, we performed continual pre-training on Llama 2 7B, 13B, 70B, Mistral 7B v0.1, and Mixtral 8x7B Instruct as base LLMs and gained consistent (6.6-8.1 points) improvements on Japanese benchmark datasets. We also demonstrate that the improvement on Llama 2 13B brought from the presented corpus was the largest among those from other existing corpora. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: 17 pages

arXiv:2404.13868 [pdf, other]

TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos

Authors: Atom Scott, Ikuma Uchida, Ning Ding, Rikuhei Umemoto, Rory Bunker, Ren Kobayashi, Takeshi Koyama, Masaki Onishi, Yoshinari Kameda, Keisuke Fujii

Abstract: Multi-object tracking (MOT) is a critical and challenging task in computer vision, particularly in situations involving objects with similar appearances but diverse movements, as seen in team sports. Current methods, largely reliant on object detection and appearance, often fail to track targets in such complex scenarios accurately. This limitation is further exacerbated by the lack of comprehensi… ▽ More Multi-object tracking (MOT) is a critical and challenging task in computer vision, particularly in situations involving objects with similar appearances but diverse movements, as seen in team sports. Current methods, largely reliant on object detection and appearance, often fail to track targets in such complex scenarios accurately. This limitation is further exacerbated by the lack of comprehensive and diverse datasets covering the full view of sports pitches. Addressing these issues, we introduce TeamTrack, a pioneering benchmark dataset specifically designed for MOT in sports. TeamTrack is an extensive collection of full-pitch video data from various sports, including soccer, basketball, and handball. Furthermore, we perform a comprehensive analysis and benchmarking effort to underscore TeamTrack's utility and potential impact. Our work signifies a crucial step forward, promising to elevate the precision and effectiveness of MOT in complex, dynamic settings such as team sports. The dataset, project code and competition is released at: https://atomscott.github.io/TeamTrack/. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.07824 [pdf, other]

Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese

Authors: Yuichi Inoue, Kento Sasaki, Yuma Ochi, Kazuki Fujii, Kotaro Tanahashi, Yu Yamaguchi

Abstract: Vision Language Models (VLMs) have undergone a rapid evolution, giving rise to significant advancements in the realm of multimodal understanding tasks. However, the majority of these models are trained and evaluated on English-centric datasets, leaving a gap in the development and evaluation of VLMs for other languages, such as Japanese. This gap can be attributed to the lack of methodologies for… ▽ More Vision Language Models (VLMs) have undergone a rapid evolution, giving rise to significant advancements in the realm of multimodal understanding tasks. However, the majority of these models are trained and evaluated on English-centric datasets, leaving a gap in the development and evaluation of VLMs for other languages, such as Japanese. This gap can be attributed to the lack of methodologies for constructing VLMs and the absence of benchmarks to accurately measure their performance. To address this issue, we introduce a novel benchmark, Japanese Heron-Bench, for evaluating Japanese capabilities of VLMs. The Japanese Heron-Bench consists of a variety of imagequestion answer pairs tailored to the Japanese context. Additionally, we present a baseline Japanese VLM that has been trained with Japanese visual instruction tuning datasets. Our Heron-Bench reveals the strengths and limitations of the proposed VLM across various ability dimensions. Furthermore, we clarify the capability gap between strong closed models like GPT-4V and the baseline model, providing valuable insights for future research in this domain. We release the benchmark dataset and training code to facilitate further developments in Japanese VLM research. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.13821 [pdf, other]

Offensive Lineup Analysis in Basketball with Clustering Players Based on Shooting Style and Offensive Role

Authors: Kazuhiro Yamada, Keisuke Fujii

Abstract: In a basketball game, scoring efficiency holds significant importance due to the numerous offensive possessions per game. Enhancing scoring efficiency necessitates effective collaboration among players with diverse playing styles. In previous studies, basketball lineups have been analyzed, but their playing style compatibility has not been quantitatively examined. The purpose of this study is to a… ▽ More In a basketball game, scoring efficiency holds significant importance due to the numerous offensive possessions per game. Enhancing scoring efficiency necessitates effective collaboration among players with diverse playing styles. In previous studies, basketball lineups have been analyzed, but their playing style compatibility has not been quantitatively examined. The purpose of this study is to analyze more specifically the impact of playing style compatibility on scoring efficiency, focusing only on offense. This study employs two methods to capture the playing styles of players on offense: shooting style clustering using tracking data, and offensive role clustering based on annotated playtypes and advanced statistics. For the former, interpretable hand-crafted shot features and Wasserstein distances between shooting style distributions were utilized. For the latter, soft clustering was applied to playtype data for the first time. Subsequently, based on the lineup information derived from these two clusterings, machine learning models Bayesian models that predict statistics representing scoring efficiency were trained and interpreted. These approaches provide insights into which combinations of five players tend to be effective and which combinations of two players tend to produce good effects. △ Less

Submitted 3 March, 2024; originally announced March 2024.

Comments: 28 pages

arXiv:2403.07669 [pdf]

Machine Learning for Soccer Match Result Prediction

Authors: Rory Bunker, Calvin Yeung, Keisuke Fujii

Abstract: Machine learning has become a common approach to predicting the outcomes of soccer matches, and the body of literature in this domain has grown substantially in the past decade and a half. This chapter discusses available datasets, the types of models and features, and ways of evaluating model performance in this application domain. The aim of this chapter is to give a broad overview of the curren… ▽ More Machine learning has become a common approach to predicting the outcomes of soccer matches, and the body of literature in this domain has grown substantially in the past decade and a half. This chapter discusses available datasets, the types of models and features, and ways of evaluating model performance in this application domain. The aim of this chapter is to give a broad overview of the current state and potential future developments in machine learning for soccer match results prediction, as a resource for those interested in conducting future studies in the area. Our main findings are that while gradient-boosted tree models such as CatBoost, applied to soccer-specific ratings such as pi-ratings, are currently the best-performing models on datasets containing only goals as the match features, there needs to be a more thorough comparison of the performance of deep learning models and Random Forest on a range of datasets with different types of features. Furthermore, new rating systems using both player- and team-level information and incorporating additional information from, e.g., spatiotemporal tracking and event data, could be investigated further. Finally, the interpretability of match result prediction models needs to be enhanced for them to be more useful for team management. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.09650 [pdf, other]

Foul prediction with estimated poses from soccer broadcast video

Authors: Jiale Fang, Calvin Yeung, Keisuke Fujii

Abstract: Recent advances in computer vision have made significant progress in tracking and pose estimation of sports players. However, there have been fewer studies on behavior prediction with pose estimation in sports, in particular, the prediction of soccer fouls is challenging because of the smaller image size of each player and of difficulty in the usage of e.g., the ball and pose information. In our r… ▽ More Recent advances in computer vision have made significant progress in tracking and pose estimation of sports players. However, there have been fewer studies on behavior prediction with pose estimation in sports, in particular, the prediction of soccer fouls is challenging because of the smaller image size of each player and of difficulty in the usage of e.g., the ball and pose information. In our research, we introduce an innovative deep learning approach for anticipating soccer fouls. This method integrates video data, bounding box positions, image details, and pose information by curating a novel soccer foul dataset. Our model utilizes a combination of convolutional and recurrent neural networks (CNNs and RNNs) to effectively merge information from these four modalities. The experimental results show that our full model outperformed the ablated models, and all of the RNN modules, bounding box position and image, and estimated pose were useful for the foul prediction. Our findings have important implications for a deeper understanding of foul play in soccer and provide a valuable reference for future research and practice in this area. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2312.00352 [pdf, other]

Quantum Kernel t-Distributed Stochastic Neighbor Embedding

Authors: Yoshiaki Kawase, Kosuke Mitarai, Keisuke Fujii

Abstract: Data visualization is important in understanding the characteristics of data that are difficult to see directly. It is used to visualize loss landscapes and optimization trajectories to analyze optimization performance. Popular optimization analysis is performed by visualizing a loss landscape around the reached local or global minimum using principal component analysis. However, this visualizatio… ▽ More Data visualization is important in understanding the characteristics of data that are difficult to see directly. It is used to visualize loss landscapes and optimization trajectories to analyze optimization performance. Popular optimization analysis is performed by visualizing a loss landscape around the reached local or global minimum using principal component analysis. However, this visualization depends on the variational parameters of a quantum circuit rather than quantum states, which makes it difficult to understand the mechanism of optimization process through the property of quantum states. Here, we propose a quantum data visualization method using quantum kernels, which enables us to offer fast and highly accurate visualization of quantum states. In our numerical experiments, we visualize hand-written digits dataset and apply $k$-nearest neighbor algorithm to the low-dimensional data to quantitatively evaluate our proposed method compared with a classical kernel method. As a result, our proposed method achieves comparable accuracy to the state-of-the-art classical kernel method, meaning that the proposed visualization method based on quantum machine learning does not degrade the separability of the input higher dimensional data. Furthermore, we visualize the optimization trajectories of finding the ground states of transverse field Ising model and successfully find the trajectory characteristics. Since quantum states are higher dimensional objects that can only be seen via observables, our visualization method, which inherits the similarity of quantum data, would be useful in understanding the behavior of quantum circuits and algorithms. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: 10pages, 8 figures, 2 tables

arXiv:2311.17353 [pdf, other]

doi 10.1103/PhysRevResearch.6.023191

Continuous optimization by quantum adaptive distribution search

Authors: Kohei Morimoto, Yusuke Takase, Kosuke Mitarai, Keisuke Fujii

Abstract: In this paper, we introduce the quantum adaptive distribution search (QuADS), a quantum continuous optimization algorithm that integrates Grover adaptive search (GAS) with the covariance matrix adaptation - evolution strategy (CMA-ES), a classical technique for continuous optimization. QuADS utilizes the quantum-based search capabilities of GAS and enhances them with the principles of CMA-ES for m… ▽ More In this paper, we introduce the quantum adaptive distribution search (QuADS), a quantum continuous optimization algorithm that integrates Grover adaptive search (GAS) with the covariance matrix adaptation - evolution strategy (CMA-ES), a classical technique for continuous optimization. QuADS utilizes the quantum-based search capabilities of GAS and enhances them with the principles of CMA-ES for more efficient optimization. It employs a multivariate normal distribution for the initial state of the quantum search and repeatedly updates it throughout the optimization process. Our numerical experiments show that QuADS outperforms both GAS and CMA-ES. This is achieved through adaptive refinement of the initial state distribution rather than consistently using a uniform state, resulting in fewer oracle calls. This study presents an important step toward exploiting the potential of quantum computing for continuous optimization. △ Less

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.16564 [pdf, other]

Multi-agent statistical discriminative sub-trajectory mining and an application to NBA basketball

Authors: Rory Bunker, Vo Nguyen Le Duy, Yasuo Tabei, Ichiro Takeuchi, Keisuke Fujii

Abstract: Improvements in tracking technology through optical and computer vision systems have enabled a greater understanding of the movement-based behaviour of multiple agents, including in team sports. In this study, a Multi-Agent Statistically Discriminative Sub-Trajectory Mining (MA-Stat-DSM) method is proposed that takes a set of binary-labelled agent trajectory matrices as input and incorporates Haus… ▽ More Improvements in tracking technology through optical and computer vision systems have enabled a greater understanding of the movement-based behaviour of multiple agents, including in team sports. In this study, a Multi-Agent Statistically Discriminative Sub-Trajectory Mining (MA-Stat-DSM) method is proposed that takes a set of binary-labelled agent trajectory matrices as input and incorporates Hausdorff distance to identify sub-matrices that statistically significantly discriminate between the two groups of labelled trajectory matrices. Utilizing 2015/16 SportVU NBA tracking data, agent trajectory matrices representing attacks consisting of the trajectories of five agents (the ball, shooter, last passer, shooter defender, and last passer defender), were truncated to correspond to the time interval following the receipt of the ball by the last passer, and labelled as effective or ineffective based on a definition of attack effectiveness that we devise in the current study. After identifying appropriate parameters for MA-Stat-DSM by iteratively applying it to all matches involving the two top- and two bottom-placed teams from the 2015/16 NBA season, the method was then applied to selected matches and could identify and visualize the portions of plays, e.g., involving passing, on-, and/or off-the-ball movements, which were most relevant in rendering attacks effective or ineffective. △ Less

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.06497 [pdf, other]

DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding

Authors: Yingjie Niu, Ming Ding, Keisuke Fujii, Kento Ohtani, Alexander Carballo, Kazuya Takeda

Abstract: Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths until 2023. To mitigate driving hazards and ensure personal safety, it is crucial to assist vehicles in anticipating important objects during travel. Previous research on important object detection primarily assessed the importance of individual participants, treating them as independent entities and freque… ▽ More Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths until 2023. To mitigate driving hazards and ensure personal safety, it is crucial to assist vehicles in anticipating important objects during travel. Previous research on important object detection primarily assessed the importance of individual participants, treating them as independent entities and frequently overlooking the connections between these participants. Unfortunately, this approach has proven less effective in detecting important objects in complex scenarios. In response, we introduce Driving scene Relationship self-Understanding transformer (DRUformer), designed to enhance the important object detection task. The DRUformer is a transformer-based multi-modal important object detection model that takes into account the relationships between all the participants in the driving scenario. Recognizing that driving intention also significantly affects the detection of important objects during driving, we have incorporated a module for embedding driving intention. To assess the performance of our approach, we conducted a comparative experiment on the DRAMA dataset, pitting our model against other state-of-the-art (SOTA) models. The results demonstrated a noteworthy 16.2\% improvement in mIoU and a substantial 12.3\% boost in ACC compared to SOTA methods. Furthermore, we conducted a qualitative analysis of our model's ability to detect important objects across different road scenarios and classes, highlighting its effectiveness in diverse contexts. Finally, we conducted various ablation studies to assess the efficiency of the proposed modules in our DRUformer model. △ Less

Submitted 13 December, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

arXiv:2310.17193 [pdf, other]

Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs

Authors: Ryota Tanaka, Tomohiro Suzuki, Kazuya Takeda, Keisuke Fujii

Abstract: Automatic evaluating systems are fundamental issues in sports technologies. In many sports, such as figure skating, automated evaluating methods based on pose estimation have been proposed. However, previous studies have evaluated skaters' skills in 2D analysis. In this paper, we propose an automatic edge error judgment system with a monocular smartphone camera and inertial sensors, which enable u… ▽ More Automatic evaluating systems are fundamental issues in sports technologies. In many sports, such as figure skating, automated evaluating methods based on pose estimation have been proposed. However, previous studies have evaluated skaters' skills in 2D analysis. In this paper, we propose an automatic edge error judgment system with a monocular smartphone camera and inertial sensors, which enable us to analyze 3D motions. Edge error is one of the most significant scoring items and is challenging to automatically judge due to its 3D motion. The results show that the model using 3D joint position coordinates estimated from the monocular camera as the input feature had the highest accuracy at 83% for unknown skaters' data. We also analyzed the detailed motion analysis for edge error judgment. These results indicate that the monocular camera can be used to judge edge errors automatically. We will provide the figure skating single Lutz jump dataset, including pre-processed videos and labels, at https://github.com/ryota-takedalab/JudgeAI-LutzEdge. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.11700 [pdf, other]

doi 10.1007/s11042-024-18881-x

Runner re-identification from single-view running video in the open-world setting

Authors: Tomohiro Suzuki, Kazushi Tsutsui, Kazuya Takeda, Keisuke Fujii

Abstract: In many sports, player re-identification is crucial for automatic video processing and analysis. However, most of the current studies on player re-identification in multi- or single-view sports videos focus on re-identification in the closed-world setting using labeled image dataset, and player re-identification in the open-world setting for automatic video analysis is not well developed. In this… ▽ More In many sports, player re-identification is crucial for automatic video processing and analysis. However, most of the current studies on player re-identification in multi- or single-view sports videos focus on re-identification in the closed-world setting using labeled image dataset, and player re-identification in the open-world setting for automatic video analysis is not well developed. In this paper, we propose a runner re-identification system that directly processes single-view video to address the open-world setting. In the open-world setting, we cannot use labeled dataset and have to process video directly. The proposed system automatically processes raw video as input to identify runners, and it can identify runners even when they are framed out multiple times. For the automatic processing, we first detect the runners in the video using the pre-trained YOLOv8 and the fine-tuned EfficientNet. We then track the runners using ByteTrack and detect their shoes with the fine-tuned YOLOv8. Finally, we extract the image features of the runners using an unsupervised method with the gated recurrent unit autoencoder and global and local features mixing. To improve the accuracy of runner re-identification, we use shoe images as local image features and dynamic features of running sequence images. We evaluated the system on a running practice video dataset and showed that the proposed method identified runners with higher accuracy than some state-of-the-art models in unsupervised re-identification. We also showed that our proposed local image feature and running dynamic feature were effective for runner re-identification. Our runner re-identification system can be useful for the automatic analysis of running videos. △ Less

Submitted 16 April, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: 20 pages, 7 figures

Journal ref: Multimedia Tools and Applications, 2024, 1-17

arXiv:2309.14807 [pdf, other]

Evaluating Soccer Match Prediction Models: A Deep Learning Approach and Feature Optimization for Gradient-Boosted Trees

Authors: Calvin Yeung, Rory Bunker, Rikuhei Umemoto, Keisuke Fujii

Abstract: Machine learning models have become increasingly popular for predicting the results of soccer matches, however, the lack of publicly-available benchmark datasets has made model evaluation challenging. The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second, in terms of the probabilities for a win, draw, and los… ▽ More Machine learning models have become increasingly popular for predicting the results of soccer matches, however, the lack of publicly-available benchmark datasets has made model evaluation challenging. The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second, in terms of the probabilities for a win, draw, and loss. The original training set of matches and features, which was provided for the competition, was augmented with additional matches that were played between 4 April and 13 April 2023, representing the period after which the training set ended, but prior to the first matches that were to be predicted (upon which the performance was evaluated). A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities. Notably, deep learning models have frequently been disregarded in this particular task. Therefore, in this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model. The model was trained using the most recent five years of data, and three training and validation sets were used in a hyperparameter grid search. The results from the validation sets show that our model had strong performance and stability compared to previously published models from the 2017 Soccer Prediction Challenge for win/draw/loss prediction. △ Less

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2307.14732 [pdf, other]

A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory

Authors: Calvin C. K. Yeung, Keisuke Fujii

Abstract: Complex interactions between two opposing agents frequently occur in domains of machine learning, game theory, and other application domains. Quantitatively analyzing the strategies involved can provide an objective basis for decision-making. One such critical scenario is shot-taking in football, where decisions, such as whether the attacker should shoot or pass the ball and whether the defender s… ▽ More Complex interactions between two opposing agents frequently occur in domains of machine learning, game theory, and other application domains. Quantitatively analyzing the strategies involved can provide an objective basis for decision-making. One such critical scenario is shot-taking in football, where decisions, such as whether the attacker should shoot or pass the ball and whether the defender should attempt to block the shot, play a crucial role in the outcome of the game. However, there are currently no effective data-driven and/or theory-based approaches to analyzing such situations. To address this issue, we proposed a novel framework to analyze such scenarios based on game theory, where we estimate the expected payoff with machine learning (ML) models, and additional features for ML models were extracted with a theory-based shot block model. Conventionally, successes or failures (1 or 0) are used as payoffs, while a success shot (goal) is extremely rare in football. Therefore, we proposed the Expected Probability of Shot On Target (xSOT) metric to evaluate players' actions even if the shot results in no goal; this allows for effective differentiation and comparison between different shots and even enables counterfactual shot situation analysis. In our experiments, we have validated the framework by comparing it with baseline and ablated models. Furthermore, we have observed a high correlation between the xSOT and existing metrics. This alignment of information suggests that xSOT provides valuable insights. Lastly, as an illustration, we studied optimal strategies in the World Cup 2022 and analyzed a shot situation in EURO 2020. △ Less

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.00263 [pdf, ps, other]

Solving break minimization problems in mirrored double round-robin tournament with QUBO solver

Authors: Koichi Fujii, Tomomi Matsui

Abstract: The break minimization problem is a fundamental problem in sports scheduling. Recently, its quadratic unconstrained binary optimization (QUBO) formulation has been proposed, which has gained much interest with the rapidly growing field of quantum computing. In this paper, we demonstrate that the state-of-the-art QUBO solver outperforms the general mixed integer quadratic programming (MIQP) solver… ▽ More The break minimization problem is a fundamental problem in sports scheduling. Recently, its quadratic unconstrained binary optimization (QUBO) formulation has been proposed, which has gained much interest with the rapidly growing field of quantum computing. In this paper, we demonstrate that the state-of-the-art QUBO solver outperforms the general mixed integer quadratic programming (MIQP) solver on break minimization problems in a mirrored double round-robin tournament. Moreover, we demonstrate that it still outperforms or is competitive even if we add practical constraints, such as consecutive constraints, to the break minimization problem. △ Less

Submitted 1 July, 2023; originally announced July 2023.

arXiv:2306.16627 [pdf, other]

MNISQ: A Large-Scale Quantum Circuit Dataset for Machine Learning on/for Quantum Computers in the NISQ era

Authors: Leonardo Placidi, Ryuichiro Hataya, Toshio Mori, Koki Aoyama, Hayata Morisaki, Kosuke Mitarai, Keisuke Fujii

Abstract: We introduce the first large-scale dataset, MNISQ, for both the Quantum and the Classical Machine Learning community during the Noisy Intermediate-Scale Quantum era. MNISQ consists of 4,950,000 data points organized in 9 subdatasets. Building our dataset from the quantum encoding of classical information (e.g., MNIST dataset), we deliver a dataset in a dual form: in quantum form, as circuits, and… ▽ More We introduce the first large-scale dataset, MNISQ, for both the Quantum and the Classical Machine Learning community during the Noisy Intermediate-Scale Quantum era. MNISQ consists of 4,950,000 data points organized in 9 subdatasets. Building our dataset from the quantum encoding of classical information (e.g., MNIST dataset), we deliver a dataset in a dual form: in quantum form, as circuits, and in classical form, as quantum circuit descriptions (quantum programming language, QASM). In fact, also the Machine Learning research related to quantum computers undertakes a dual challenge: enhancing machine learning exploiting the power of quantum computers, while also leveraging state-of-the-art classical machine learning methodologies to help the advancement of quantum computing. Therefore, we perform circuit classification on our dataset, tackling the task with both quantum and classical models. In the quantum endeavor, we test our circuit dataset with Quantum Kernel methods, and we show excellent results up to $97\%$ accuracy. In the classical world, the underlying quantum mechanical structures within the quantum circuit data are not trivial. Nevertheless, we test our dataset on three classical models: Structured State Space sequence model (S4), Transformer and LSTM. In particular, the S4 model applied on the tokenized QASM sequences reaches an impressive $77\%$ accuracy. These findings illustrate that quantum circuit-related datasets are likely to be quantum advantageous, but also that state-of-the-art machine learning methodologies can competently classify and recognize quantum circuits. We finally entrust the quantum and classical machine learning community the fundamental challenge to build more quantum-classical datasets like ours and to build future benchmarks from our experiments. The dataset is accessible on GitHub and its circuits are easily run in qulacs or qiskit. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: Preprint. Under review

arXiv:2306.08340 [pdf, other]

doi 10.1287/moor.2022.0031

The Secretary Problem with Predictions

Authors: Kaito Fujii, Yuichi Yoshida

Abstract: The value maximization version of the secretary problem is the problem of hiring a candidate with the largest value from a randomly ordered sequence of candidates. In this work, we consider a setting where predictions of candidate values are provided in advance. We propose an algorithm that achieves a nearly optimal value if the predictions are accurate and results in a constant-factor competitive… ▽ More The value maximization version of the secretary problem is the problem of hiring a candidate with the largest value from a randomly ordered sequence of candidates. In this work, we consider a setting where predictions of candidate values are provided in advance. We propose an algorithm that achieves a nearly optimal value if the predictions are accurate and results in a constant-factor competitive ratio otherwise. We also show that the worst-case competitive ratio of an algorithm cannot be higher than some constant $< 1/\mathrm{e}$, which is the best possible competitive ratio when we ignore predictions, if the algorithm performs nearly optimally when the predictions are accurate. Additionally, for the multiple-choice secretary problem, we propose an algorithm with a similar theoretical guarantee. We empirically illustrate that if the predictions are accurate, the proposed algorithms perform well; meanwhile, if the predictions are inaccurate, performance is comparable to existing algorithms that do not use predictions. △ Less

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2305.17886 [pdf, other]

doi 10.1109/ACCESS.2023.3336425

Action valuation of on- and off-ball soccer players based on multi-agent deep reinforcement learning

Authors: Hiroshi Nakahara, Kazushi Tsutsui, Kazuya Takeda, Keisuke Fujii

Abstract: Analysis of invasive sports such as soccer is challenging because the game situation changes continuously in time and space, and multiple agents individually recognize the game situation and make decisions. Previous studies using deep reinforcement learning have often considered teams as a single agent and valued the teams and players who hold the ball in each discrete event. Then it was challengi… ▽ More Analysis of invasive sports such as soccer is challenging because the game situation changes continuously in time and space, and multiple agents individually recognize the game situation and make decisions. Previous studies using deep reinforcement learning have often considered teams as a single agent and valued the teams and players who hold the ball in each discrete event. Then it was challenging to value the actions of multiple players, including players far from the ball, in a spatiotemporally continuous state space. In this paper, we propose a method of valuing possible actions for on- and off-ball soccer players in a single holistic framework based on multi-agent deep reinforcement learning. We consider a discrete action space in a continuous state space that mimics that of Google research football and leverages supervised learning for actions in reinforcement learning. In the experiment, we analyzed the relationships with conventional indicators, season goals, and game ratings by experts, and showed the effectiveness of the proposed method. Our approach can assess how multiple players move continuously throughout the game, which is difficult to be discretized or labeled but vital for teamwork, scouting, and fan engagement. △ Less

Submitted 1 December, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 12 pages, 4 figures, published in IEEE Access. The latest version is at https://ieeexplore.ieee.org/document/10328596

arXiv:2305.13030 [pdf, other]

Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations

Authors: Keisuke Fujii, Kazushi Tsutsui, Atom Scott, Hiroshi Nakahara, Naoya Takeishi, Yoshinobu Kawahara

Abstract: Modeling of real-world biological multi-agents is a fundamental problem in various scientific and engineering fields. Reinforcement learning (RL) is a powerful framework to generate flexible and diverse behaviors in cyberspace; however, when modeling real-world biological multi-agents, there is a domain gap between behaviors in the source (i.e., real-world data) and the target (i.e., cyberspace fo… ▽ More Modeling of real-world biological multi-agents is a fundamental problem in various scientific and engineering fields. Reinforcement learning (RL) is a powerful framework to generate flexible and diverse behaviors in cyberspace; however, when modeling real-world biological multi-agents, there is a domain gap between behaviors in the source (i.e., real-world data) and the target (i.e., cyberspace for RL), and the source environment parameters are usually unknown. In this paper, we propose a method for adaptive action supervision in RL from real-world demonstrations in multi-agent scenarios. We adopt an approach that combines RL and supervised learning by selecting actions of demonstrations in RL based on the minimum distance of dynamic time war** for utilizing the information of the unknown source dynamics. This approach can be easily applied to many existing neural network architectures and provide us with an RL model balanced between reproducibility as imitation and generalization ability to obtain rewards in cyberspace. In the experiments, using chase-and-escape and football tasks with the different dynamics between the unknown source and target environments, we show that our approach achieved a balance between the reproducibility and the generalization ability compared with the baselines. In particular, we used the tracking data of professional football players as expert demonstrations in football and show successful performances despite the larger gap between behaviors in the source and target environments than the chase-and-escape task. △ Less

Submitted 19 December, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 14 pages, 5 figures, accepted in ICAART 2024 Oral

arXiv:2305.04247 [pdf, other]

doi 10.1007/s11042-023-16362-1

Estimation of control area in badminton doubles with pose information from top and back view drone videos

Authors: Ning Ding, Kazuya Takeda, Wenhui **, Yingjiu Bei, Keisuke Fujii

Abstract: The application of visual tracking to the performance analysis of sports players in dynamic competitions is vital for effective coaching. In doubles matches, coordinated positioning is crucial for maintaining control of the court and minimizing opponents' scoring opportunities. The analysis of such teamwork plays a vital role in understanding the dynamics of the game. However, previous studies hav… ▽ More The application of visual tracking to the performance analysis of sports players in dynamic competitions is vital for effective coaching. In doubles matches, coordinated positioning is crucial for maintaining control of the court and minimizing opponents' scoring opportunities. The analysis of such teamwork plays a vital role in understanding the dynamics of the game. However, previous studies have primarily focused on analyzing and assessing singles players without considering occlusion in broadcast videos. These studies have relied on discrete representations, which involve the analysis and representation of specific actions (e.g., strokes) or events that occur during the game while overlooking the meaningful spatial distribution. In this work, we present the first annotated drone dataset from top and back views in badminton doubles and propose a framework to estimate the control area probability map, which can be used to evaluate teamwork performance. We present an efficient framework of deep neural networks that enables the calculation of full probability surfaces. This framework utilizes the embedding of a Gaussian mixture map of players' positions and employs graph convolution on their poses. In the experiment, we verify our approach by comparing various baselines and discovering the correlations between the score and control area. Additionally, we propose a practical application for assessing optimal positioning to provide instructions during a game. Our approach offers both visual and quantitative evaluations of players' movements, thereby providing valuable insights into doubles teamwork. The dataset and related project code is available at https://github.com/Ning-D/Drone_BD_ControlArea △ Less

Submitted 26 October, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

Comments: 15 pages, 10 figures, to appear in Multimedia Tools and Applications

Journal ref: Multimedia Tools and Applications (2023)

arXiv:2304.13242 [pdf, other]

doi 10.1109/LRA.2023.3291924

Learning to Predict Navigational Patterns from Partial Observations

Authors: Robin Karlsson, Alexander Carballo, Francisco Lepe-Salazar, Keisuke Fujii, Kento Ohtani, Kazuya Takeda

Abstract: Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is… ▽ More Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This paper presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enables our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception. Code is available at https://github.com/robin-karlsson0/dslp. △ Less

Submitted 5 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted to IEEE Robotics and Automation Letters (RA-L) 2023

ACM Class: I.2.10; I.2.9

arXiv:2304.05005 [pdf, ps, other]

Bayes correlated equilibria and no-regret dynamics

Authors: Kaito Fujii

Abstract: This paper explores equilibrium concepts for Bayesian games, which are fundamental models of games with incomplete information. We aim at three desirable properties of equilibria. First, equilibria can be naturally realized by introducing a mediator into games. Second, an equilibrium can be computed efficiently in a distributed fashion. Third, any equilibrium in that class approximately maximizes… ▽ More This paper explores equilibrium concepts for Bayesian games, which are fundamental models of games with incomplete information. We aim at three desirable properties of equilibria. First, equilibria can be naturally realized by introducing a mediator into games. Second, an equilibrium can be computed efficiently in a distributed fashion. Third, any equilibrium in that class approximately maximizes social welfare, as measured by the price of anarchy, for a broad class of games. These three properties allow players to compute an equilibrium and realize it via a mediator, thereby settling into a stable state with approximately optimal social welfare. Our main result is the existence of an equilibrium concept that satisfies these three properties. Toward this goal, we characterize various (non-equivalent) extensions of correlated equilibria, collectively known as Bayes correlated equilibria. In particular, we focus on communication equilibria (also known as coordination mechanisms), which can be realized by a mediator who gathers each player's private information and then sends correlated recommendations to the players. We show that if each player minimizes a variant of regret called untruthful swap regret in repeated play of Bayesian games, the empirical distribution of these dynamics converges to a communication equilibrium. We present an efficient algorithm for minimizing untruthful swap regret with a sublinear upper bound, which we prove to be tight up to a multiplicative constant. As a result, by simulating the dynamics with our algorithm, we can efficiently compute an approximate communication equilibrium. Furthermore, we extend existing lower bounds on the price of anarchy based on the smoothness arguments from Bayes Nash equilibria to equilibria obtained by the proposed dynamics. △ Less

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2302.09276 [pdf, other]

Transformer-Based Neural Marked Spatio Temporal Point Process Model for Football Match Events Analysis

Authors: Calvin C. K. Yeung, Tony Sit, Keisuke Fujii

Abstract: With recently available football match event data that record the details of football matches, analysts and researchers have a great opportunity to develop new performance metrics, gain insight, and evaluate key performance. However, most sports sequential events modeling methods and performance metrics approaches could be incomprehensive in dealing with such large-scale spatiotemporal data (in pa… ▽ More With recently available football match event data that record the details of football matches, analysts and researchers have a great opportunity to develop new performance metrics, gain insight, and evaluate key performance. However, most sports sequential events modeling methods and performance metrics approaches could be incomprehensive in dealing with such large-scale spatiotemporal data (in particular, temporal process), thereby necessitating a more comprehensive spatiotemporal model and a holistic performance metric. To this end, we proposed the Transformer-Based Neural Marked Spatio Temporal Point Process (NMSTPP) model for football event data based on the neural temporal point processes (NTPP) framework. In the experiments, our model outperformed the prediction performance of the baseline models. Furthermore, we proposed the holistic possession utilization score (HPUS) metric for a more comprehensive football possession analysis. For verification, we examined the relationship with football teams' final ranking, average goal score, and average xG over a season. It was observed that the average HPUS showed significant correlations regardless of not using goal and details of shot information. Furthermore, we show HPUS examples in analyzing possessions, matches, and between matches. △ Less

Submitted 18 February, 2023; originally announced February 2023.

Comments: 36 pages, 25 figures

arXiv:2301.04783 [pdf, other]

Predictive World Models from Real-World Partial Observations

Authors: Robin Karlsson, Alexander Carballo, Keisuke Fujii, Kento Ohtani, Kazuya Takeda

Abstract: Cognitive scientists believe adaptable intelligent agents like humans perform reasoning through learned causal mental simulations of agents and environments. The problem of learning such simulations is called predictive world modeling. Recently, reinforcement learning (RL) agents leveraging world models have achieved SOTA performance in game environments. However, understanding how to apply the wo… ▽ More Cognitive scientists believe adaptable intelligent agents like humans perform reasoning through learned causal mental simulations of agents and environments. The problem of learning such simulations is called predictive world modeling. Recently, reinforcement learning (RL) agents leveraging world models have achieved SOTA performance in game environments. However, understanding how to apply the world modeling approach in complex real-world environments relevant to mobile robots remains an open question. In this paper, we present a framework for learning a probabilistic predictive world model for real-world road environments. We implement the model using a hierarchical VAE (HVAE) capable of predicting a diverse set of fully observed plausible worlds from accumulated sensor observations. While prior HVAE methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only. We experimentally demonstrate accurate spatial structure prediction of deterministic regions achieving 96.21 IoU, and close the gap to perfect prediction by 62% for stochastic regions using the best prediction. By extending HVAEs to cases where complete ground truth states do not exist, we facilitate continual learning of spatial prediction as a step towards realizing explainable and comprehensive predictive world models for real-world mobile robotics applications. Code is available at https://github.com/robin-karlsson0/predictive-world-models. △ Less

Submitted 23 May, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: Best Paper Award at IEEE MOST 2023

ACM Class: I.2.10; I.2.9

arXiv:2212.00021 [pdf, other]

Location analysis of players in UEFA EURO 2020 and 2022 using generalized valuation of defense by estimating probabilities

Authors: Rikuhei Umemoto, Kazushi Tsutsui, Keisuke Fujii

Abstract: Analyzing defenses in team sports is generally challenging because of the limited event data. Researchers have previously proposed methods to evaluate football team defense by predicting the events of ball gain and being attacked using locations of all players and the ball. However, they did not consider the importance of the events, assumed the perfect observation of all 22 players, and did not f… ▽ More Analyzing defenses in team sports is generally challenging because of the limited event data. Researchers have previously proposed methods to evaluate football team defense by predicting the events of ball gain and being attacked using locations of all players and the ball. However, they did not consider the importance of the events, assumed the perfect observation of all 22 players, and did not fully investigated the influence of the diversity (e.g., nationality and sex). Here, we propose a generalized valuation method of defensive teams by score-scaling the predicted probabilities of the events. Using the open-source location data of all players in broadcast video frames in football games of men's Euro 2020 and women's Euro 2022, we investigated the effect of the number of players on the prediction and validated our approach by analyzing the games. Results show that for the predictions of being attacked, scoring, and conceding, all players' information was not necessary, while that of ball gain required information on three to four offensive and defensive players. With game analyses we explained the excellence in defense of finalist teams in Euro 2020. Our approach might be applicable to location data from broadcast video frames in football games. △ Less

Submitted 30 November, 2022; originally announced December 2022.

Comments: 16 pages, 8 figures

arXiv:2210.09173 [pdf, other]

Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images

Authors: Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, Hiroshi Saruwatari

Abstract: We propose a method for synthesizing environmental sounds from visually represented onomatopoeias and sound sources. An onomatopoeia is a word that imitates a sound structure, i.e., the text representation of sound. From this perspective, onoma-to-wave has been proposed to synthesize environmental sounds from the desired onomatopoeia texts. Onomatopoeias have another representation: visual-text re… ▽ More We propose a method for synthesizing environmental sounds from visually represented onomatopoeias and sound sources. An onomatopoeia is a word that imitates a sound structure, i.e., the text representation of sound. From this perspective, onoma-to-wave has been proposed to synthesize environmental sounds from the desired onomatopoeia texts. Onomatopoeias have another representation: visual-text representations of sounds in comics, advertisements, and virtual reality. A visual onomatopoeia (visual text of onomatopoeia) contains rich information that is not present in the text, such as a long-short duration of the image, so the use of this representation is expected to synthesize diverse sounds. Therefore, we propose visual onoma-to-wave for environmental sound synthesis from visual onomatopoeia. The method can transfer visual concepts of the visual text and sound-source image to the synthesized sound. We also propose a data augmentation method focusing on the repetition of onomatopoeias to enhance the performance of our method. An experimental evaluation shows that the methods can synthesize diverse environmental sounds from visual text and sound-source images. △ Less

Submitted 17 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2208.12646 [pdf, other]

Automatic detection of faults in race walking from a smartphone camera: a comparison of an Olympic medalist and university athletes

Authors: Tomohiro Suzuki, Kazuya Takeda, Keisuke Fujii

Abstract: Automatic fault detection is a major challenge in many sports. In race walking, referees visually judge faults according to the rules. Hence, ensuring objectivity and fairness while judging is important. To address this issue, some studies have attempted to use sensors and machine learning to automatically detect faults. However, there are problems associated with sensor attachments and equipment… ▽ More Automatic fault detection is a major challenge in many sports. In race walking, referees visually judge faults according to the rules. Hence, ensuring objectivity and fairness while judging is important. To address this issue, some studies have attempted to use sensors and machine learning to automatically detect faults. However, there are problems associated with sensor attachments and equipment such as a high-speed camera, which conflict with the visual judgement of referees, and the interpretability of the fault detection models. In this study, we proposed a fault detection system for non-contact measurement. We used pose estimation and machine learning models trained based on the judgements of multiple qualified referees to realize fair fault judgement. We verified them using smartphone videos of normal race walking and walking with intentional faults in several athletes including the medalist of the Tokyo Olympics. The validation results show that the proposed system detected faults with an average accuracy of over 90%. We also revealed that the machine learning model detects faults according to the rules of race walking. In addition, the intentional faulty walking movement of the medalist was different from that of university walkers. This finding informs realization of a more general fault detection model. The code and data are available at https://github.com/SZucchini/racewalk-aijudge. △ Less

Submitted 24 August, 2022; originally announced August 2022.

Comments: 16 pages, 9 figures

arXiv:2206.05947 [pdf, other]

Lazy and Fast Greedy MAP Inference for Determinantal Point Process

Authors: Shinichi Hemmi, Taihei Oki, Shinsaku Sakaue, Kaito Fujii, Satoru Iwata

Abstract: The maximum a posteriori (MAP) inference for determinantal point processes (DPPs) is crucial for selecting diverse items in many machine learning applications. Although DPP MAP inference is NP-hard, the greedy algorithm often finds high-quality solutions, and many researchers have studied its efficient implementation. One classical and practical method is the lazy greedy algorithm, which is applic… ▽ More The maximum a posteriori (MAP) inference for determinantal point processes (DPPs) is crucial for selecting diverse items in many machine learning applications. Although DPP MAP inference is NP-hard, the greedy algorithm often finds high-quality solutions, and many researchers have studied its efficient implementation. One classical and practical method is the lazy greedy algorithm, which is applicable to general submodular function maximization, while a recent fast greedy algorithm based on the Cholesky factorization is more efficient for DPP MAP inference. This paper presents how to combine the ideas of "lazy" and "fast", which have been considered incompatible in the literature. Our lazy and fast greedy algorithm achieves almost the same time complexity as the current best one and runs faster in practice. The idea of "lazy + fast" is extendable to other greedy-type algorithms. We also give a fast version of the double greedy algorithm for unconstrained DPP MAP inference. Experiments validate the effectiveness of our acceleration ideas. △ Less

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2206.01900 [pdf, other]

Estimating counterfactual treatment outcomes over time in complex multiagent scenarios

Authors: Keisuke Fujii, Koh Takeuchi, Atsushi Kuribayashi, Naoya Takeishi, Yoshinobu Kawahara, Kazuya Takeda

Abstract: Evaluation of intervention in a multiagent system, e.g., when humans should intervene in autonomous driving systems and when a player should pass to teammates for a good shot, is challenging in various engineering and scientific fields. Estimating the individual treatment effect (ITE) using counterfactual long-term prediction is practical to evaluate such interventions. However, most of the conven… ▽ More Evaluation of intervention in a multiagent system, e.g., when humans should intervene in autonomous driving systems and when a player should pass to teammates for a good shot, is challenging in various engineering and scientific fields. Estimating the individual treatment effect (ITE) using counterfactual long-term prediction is practical to evaluate such interventions. However, most of the conventional frameworks did not consider the time-varying complex structure of multiagent relationships and covariate counterfactual prediction. This may lead to erroneous assessments of ITE and difficulty in interpretation. Here we propose an interpretable, counterfactual recurrent network in multiagent systems to estimate the effect of the intervention. Our model leverages graph variational recurrent neural networks and theory-based computation with domain knowledge for the ITE estimation framework based on long-term prediction of multiagent covariates and outcomes, which can confirm the circumstances under which the intervention is effective. On simulated models of an automated vehicle and biological agents with time-varying confounders, we show that our methods achieved lower estimation errors in counterfactual covariates and the most effective treatment timing than the baselines. Furthermore, using real basketball data, our methods performed realistic counterfactual predictions and evaluated the counterfactual passes in shot scenarios. △ Less

Submitted 17 February, 2024; v1 submitted 4 June, 2022; originally announced June 2022.

Comments: 16 pages, 11 figures. Accepted in IEEE Transactions on Neural Networks and Learning Systems

arXiv:2206.01899 [pdf, other]

Evaluation of creating scoring opportunities for teammates in soccer via trajectory prediction

Authors: Masakiyo Teranishi, Kazushi Tsutsui, Kazuya Takeda, Keisuke Fujii

Abstract: Evaluating the individual movements for teammates in soccer players is crucial for assessing teamwork, scouting, and fan engagement. It has been said that players in a 90-min game do not have the ball for about 87 minutes on average. However, it has remained difficult to evaluate an attacking player without receiving the ball, and to reveal how movement contributes to the creation of scoring oppor… ▽ More Evaluating the individual movements for teammates in soccer players is crucial for assessing teamwork, scouting, and fan engagement. It has been said that players in a 90-min game do not have the ball for about 87 minutes on average. However, it has remained difficult to evaluate an attacking player without receiving the ball, and to reveal how movement contributes to the creation of scoring opportunities for teammates. In this paper, we evaluate players who create off-ball scoring opportunities by comparing actual movements with the reference movements generated via trajectory prediction. First, we predict the trajectories of players using a graph variational recurrent neural network that can accurately model the relationship between players and predict the long-term trajectory. Next, based on the difference in the modified off-ball evaluation index between the actual and the predicted trajectory as a reference, we evaluate how the actual movement contributes to scoring opportunity compared to the predicted movement. For verification, we examined the relationship with the annual salary, the goals, and the rating in the game by experts for all games of a team in a professional soccer league in a year. The results show that the annual salary and the proposed indicator correlated significantly, which could not be explained by the existing indicators and goals. Our results suggest the effectiveness of the proposed method as an indicator for a player without the ball to create a scoring chance for teammates. △ Less

Submitted 26 July, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: 22 pages, 8 figures, accepted in 9th Workshop on Machine Learning and Data Mining for Sports Analytics 2022 (MLSA'22) co-located with ECML-PKDD'22

arXiv:2206.01871 [pdf, other]

Estimating the Effect of Team Hitting Strategies Using Counterfactual Virtual Simulation in Baseball

Authors: Hiroshi Nakahara, Kazuya Takeda, Keisuke Fujii

Abstract: In baseball, every play on the field is quantitatively evaluated and has an effect on individual and team strategies. The weighted on base average (wOBA) is well known as a measure of an batter's hitting contribution. However, this measure ignores the game situation, such as the runners on base, which coaches and batters are known to consider when employing multiple hitting strategies, yet, the ef… ▽ More In baseball, every play on the field is quantitatively evaluated and has an effect on individual and team strategies. The weighted on base average (wOBA) is well known as a measure of an batter's hitting contribution. However, this measure ignores the game situation, such as the runners on base, which coaches and batters are known to consider when employing multiple hitting strategies, yet, the effectiveness of these strategies is unknown. This is probably because (1) we cannot obtain the batter's strategy and (2) it is difficult to estimate the effect of the strategies. Here, we propose a new method for estimating the effect using counterfactual batting simulation. To this end, we propose a deep learning model that transforms batting ability when batting strategy is changed. This method can estimate the effects of various strategies, which has been traditionally difficult with actual game data. We found that, when the switching cost of batting strategies can be ignored, the use of different strategies increased runs. When the switching cost is considered, the conditions for increasing runs were limited. Our validation results suggest that our simulation could clarify the effect of using multiple batting strategies. △ Less

Submitted 3 June, 2022; originally announced June 2022.

Comments: 14 pages, 6 figures

arXiv:2202.04238 [pdf, other]

Parametric t-Stochastic Neighbor Embedding With Quantum Neural Network

Authors: Yoshiaki Kawase, Kosuke Mitarai, Keisuke Fujii

Abstract: t-Stochastic Neighbor Embedding (t-SNE) is a non-parametric data visualization method in classical machine learning. It maps the data from the high-dimensional space into a low-dimensional space, especially a two-dimensional plane, while maintaining the relationship, or similarities, between the surrounding points. In t-SNE, the initial position of the low-dimensional data is randomly determined,… ▽ More t-Stochastic Neighbor Embedding (t-SNE) is a non-parametric data visualization method in classical machine learning. It maps the data from the high-dimensional space into a low-dimensional space, especially a two-dimensional plane, while maintaining the relationship, or similarities, between the surrounding points. In t-SNE, the initial position of the low-dimensional data is randomly determined, and the visualization is achieved by moving the low-dimensional data to minimize a cost function. Its variant called parametric t-SNE uses neural networks for this map**. In this paper, we propose to use quantum neural networks for parametric t-SNE to reflect the characteristics of high-dimensional quantum data on low-dimensional data. We use fidelity-based metrics instead of Euclidean distance in calculating high-dimensional data similarity. We visualize both classical (Iris dataset) and quantum (time-depending Hamiltonian dynamics) data for classification tasks. Since this method allows us to represent a quantum dataset in a higher dimensional Hilbert space by a quantum dataset in a lower dimension while kee** their similarity, the proposed method can also be used to compress quantum data for further quantum machine learning. △ Less

Submitted 8 February, 2022; originally announced February 2022.

Comments: 9 pages, 7 figures

arXiv:2112.06282 [pdf, ps, other]

Algorithmic Bayesian persuasion with combinatorial actions

Authors: Kaito Fujii, Shinsaku Sakaue

Abstract: Bayesian persuasion is a model for understanding strategic information revelation: an agent with an informational advantage, called a sender, strategically discloses information by sending signals to another agent, called a receiver. In algorithmic Bayesian persuasion, we are interested in efficiently designing the sender's signaling schemes that lead the receiver to take action in favor of the se… ▽ More Bayesian persuasion is a model for understanding strategic information revelation: an agent with an informational advantage, called a sender, strategically discloses information by sending signals to another agent, called a receiver. In algorithmic Bayesian persuasion, we are interested in efficiently designing the sender's signaling schemes that lead the receiver to take action in favor of the sender. This paper studies algorithmic Bayesian-persuasion settings where the receiver's feasible actions are specified by combinatorial constraints, e.g., matroids or paths in graphs. We first show that constant-factor approximation is NP-hard even in some special cases of matroids or paths. We then propose a polynomial-time algorithm for general matroids by assuming the number of states of nature to be a constant. We finally consider a relaxed notion of persuasiveness, called CCE-persuasiveness, and present a sufficient condition for polynomial-time approximability. △ Less

Submitted 12 December, 2021; originally announced December 2021.

arXiv:2111.12460 [pdf, other]

ViCE: Improving Dense Representation Learning by Superpixelization and Contrasting Cluster Assignment

Authors: Robin Karlsson, Tomoki Hayashi, Keisuke Fujii, Alexander Carballo, Kento Ohtani, Kazuya Takeda

Abstract: Recent self-supervised models have demonstrated equal or better performance than supervised methods, opening for AI systems to learn visual representations from practically unlimited data. However, these methods are typically classification-based and thus ineffective for learning high-resolution feature maps that preserve precise spatial information. This work introduces superpixels to improve sel… ▽ More Recent self-supervised models have demonstrated equal or better performance than supervised methods, opening for AI systems to learn visual representations from practically unlimited data. However, these methods are typically classification-based and thus ineffective for learning high-resolution feature maps that preserve precise spatial information. This work introduces superpixels to improve self-supervised learning of dense semantically rich visual concept embeddings. Decomposing images into a small set of visually coherent regions reduces the computational complexity by $\mathcal{O}(1000)$ while preserving detail. We experimentally show that contrasting over regions improves the effectiveness of contrastive learning methods, extends their applicability to high-resolution images, improves overclustering performance, superpixels are better than grids, and regional masking improves performance. The expressiveness of our dense embeddings is demonstrated by improving the SOTA unsupervised semantic segmentation benchmark on Cityscapes, and for convolutional models on COCO. △ Less

Submitted 7 October, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: Accepted for BMVC 2022

ACM Class: I.2.10; I.2.9

arXiv:2111.12340 [pdf, other]

How does AI play football? An analysis of RL and real-world football strategies

Authors: Atom Scott, Keisuke Fujii, Masaki Onishi

Abstract: Recent advances in reinforcement learning (RL) have made it possible to develop sophisticated agents that excel in a wide range of applications. Simulations using such agents can provide valuable information in scenarios that are difficult to scientifically experiment in the real world. In this paper, we examine the play-style characteristics of football RL agents and uncover how strategies may de… ▽ More Recent advances in reinforcement learning (RL) have made it possible to develop sophisticated agents that excel in a wide range of applications. Simulations using such agents can provide valuable information in scenarios that are difficult to scientifically experiment in the real world. In this paper, we examine the play-style characteristics of football RL agents and uncover how strategies may develop during training. The learnt strategies are then compared with those of real football players. We explore what can be learnt from the use of simulated environments by using aggregated statistics and social network analysis (SNA). As a result, we found that (1) there are strong correlations between the competitiveness of an agent and various SNA metrics and (2) aspects of the RL agents play style become similar to real world footballers as the agent becomes more competitive. We discuss further advances that may be necessary to improve our understanding necessary to fully utilise RL for the analysis of football. △ Less

Submitted 24 November, 2021; originally announced November 2021.

Comments: 11 pages, 7 figures; accepted as a full paper for a 25 minutes oral presentation at ICAART 2022 (URL will be updated when available)

arXiv:2109.05751 [pdf, other]

doi 10.1109/ACCESS.2022.3180344

Adversarially Trained Object Detector for Unsupervised Domain Adaptation

Authors: Kazuma Fujii, Hiroshi Kera, Kazuhiko Kawamoto

Abstract: Unsupervised domain adaptation, which involves transferring knowledge from a label-rich source domain to an unlabeled target domain, can be used to substantially reduce annotation costs in the field of object detection. In this study, we demonstrate that adversarial training in the source domain can be employed as a new approach for unsupervised domain adaptation. Specifically, we establish that a… ▽ More Unsupervised domain adaptation, which involves transferring knowledge from a label-rich source domain to an unlabeled target domain, can be used to substantially reduce annotation costs in the field of object detection. In this study, we demonstrate that adversarial training in the source domain can be employed as a new approach for unsupervised domain adaptation. Specifically, we establish that adversarially trained detectors achieve improved detection performance in target domains that are significantly shifted from source domains. This phenomenon is attributed to the fact that adversarially trained detectors can be used to extract robust features that are in alignment with human perception and worth transferring across domains while discarding domain-specific non-robust features. In addition, we propose a method that combines adversarial training and feature alignment to ensure the improved alignment of robust features with the target domain. We conduct experiments on four benchmark datasets and confirm the effectiveness of our proposed approach on large domain shifts from real to artistic images. Compared to the baseline models, the adversarially trained detectors improve the mean average precision by up to 7.7%, and further by up to 11.8% when feature alignments are incorporated. Although our method degrades performance for small domain shifts, quantification of the domain shift based on the Frechet distance allows us to determine whether adversarial training should be conducted. △ Less

Submitted 25 November, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: 10 pages, 6 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2107.10764 [pdf, other]

Nonlinear transformation of complex amplitudes via quantum singular value transformation

Authors: Naixu Guo, Kosuke Mitarai, Keisuke Fujii

Abstract: Due to the linearity of quantum operations, it is not straightforward to implement nonlinear transformations on a quantum computer, making some practical tasks like a neural network hard to be achieved. In this work, we define a task called nonlinear transformation of complex amplitudes and provide an algorithm to achieve this task. Specifically, we construct a block-encoding of complex amplitudes… ▽ More Due to the linearity of quantum operations, it is not straightforward to implement nonlinear transformations on a quantum computer, making some practical tasks like a neural network hard to be achieved. In this work, we define a task called nonlinear transformation of complex amplitudes and provide an algorithm to achieve this task. Specifically, we construct a block-encoding of complex amplitudes from a state preparation unitary. This allows us to transform the complex amplitudes by using quantum singular value transformation. We evaluate the required overhead in terms of input dimension and precision, which reveals that the algorithm depends on the roughly square root of input dimension and achieves an exponential speedup on precision compared with previous work. We also discuss its possible applications to quantum machine learning, where complex amplitudes encoding classical or quantum data are processed by the proposed method. This paper provides a promising way to introduce highly complex nonlinearity of the quantum states, which is essentially missing in quantum mechanics. △ Less

Submitted 17 May, 2024; v1 submitted 22 July, 2021; originally announced July 2021.

arXiv:2107.09289 [pdf, other]

Cell Detection from Imperfect Annotation by Pseudo Label Selection Using P-classification

Authors: Kazuma Fujii, Daiki Suehiro, Kazuya Nishimura, Ryoma Bise

Abstract: Cell detection is an essential task in cell image analysis. Recent deep learning-based detection methods have achieved very promising results. In general, these methods require exhaustively annotating the cells in an entire image. If some of the cells are not annotated (imperfect annotation), the detection performance significantly degrades due to noisy labels. This often occurs in real collaborat… ▽ More Cell detection is an essential task in cell image analysis. Recent deep learning-based detection methods have achieved very promising results. In general, these methods require exhaustively annotating the cells in an entire image. If some of the cells are not annotated (imperfect annotation), the detection performance significantly degrades due to noisy labels. This often occurs in real collaborations with biologists and even in public data-sets. Our proposed method takes a pseudo labeling approach for cell detection from imperfect annotated data. A detection convolutional neural network (CNN) trained using such missing labeled data often produces over-detection. We treat partially labeled cells as positive samples and the detected positions except for the labeled cell as unlabeled samples. Then we select reliable pseudo labels from unlabeled data using recent machine learning techniques; positive-and-unlabeled (PU) learning and P-classification. Experiments using microscopy images for five different conditions demonstrate the effectiveness of the proposed method. △ Less

Submitted 21 July, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: 10 pages, 3 figures, Accepted in MICCAI2021

arXiv:2107.05326 [pdf, other]

Learning interaction rules from multi-animal trajectories via augmented behavioral models

Authors: Keisuke Fujii, Naoya Takeishi, Kazushi Tsutsui, Emyo Fujioka, Nozomi Nishiumi, Ryoya Tanaka, Mika Fukushiro, Kaoru Ide, Hiroyoshi Kohno, Ken Yoda, Susumu Takahashi, Shizuko Hiryu, Yoshinobu Kawahara

Abstract: Extracting the interaction rules of biological agents from movement sequences pose challenges in various domains. Granger causality is a practical framework for analyzing the interactions from observed time-series data; however, this framework ignores the structures and assumptions of the generative process in animal behaviors, which may lead to interpretational problems and sometimes erroneous as… ▽ More Extracting the interaction rules of biological agents from movement sequences pose challenges in various domains. Granger causality is a practical framework for analyzing the interactions from observed time-series data; however, this framework ignores the structures and assumptions of the generative process in animal behaviors, which may lead to interpretational problems and sometimes erroneous assessments of causality. In this paper, we propose a new framework for learning Granger causality from multi-animal trajectories via augmented theory-based behavioral models with interpretable data-driven models. We adopt an approach for augmenting incomplete multi-agent behavioral models described by time-varying dynamical systems with neural networks. For efficient and interpretable learning, our model leverages theory-based architectures separating navigation and motion processes, and the theory-guided regularization for reliable behavioral modeling. This can provide interpretable signs of Granger-causal effects over time, i.e., when specific others cause the approach or separation. In experiments using synthetic datasets, our method achieved better performance than various baselines. We then analyzed multi-animal datasets of mice, flies, birds, and bats, which verified our method and obtained novel biological insights. △ Less

Submitted 25 October, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

Comments: 24 pages, 5 figures, to appear in NeurIPS 2021

arXiv:2103.09627 [pdf, other]

doi 10.1371/journal.pone.0263051

Evaluation of soccer team defense based on prediction models of ball recovery and being attacked: A pilot study

Authors: Kosuke Toda, Masakiyo Teranishi, Keisuke Kushiro, Keisuke Fujii

Abstract: With the development of measurement technology, data on the movements of actual games in various sports can be obtained and used for planning and evaluating the tactics and strategy. Defense in team sports is generally difficult to be evaluated because of the lack of statistical data. Conventional evaluation methods based on predictions of scores are considered unreliable because they predict rare… ▽ More With the development of measurement technology, data on the movements of actual games in various sports can be obtained and used for planning and evaluating the tactics and strategy. Defense in team sports is generally difficult to be evaluated because of the lack of statistical data. Conventional evaluation methods based on predictions of scores are considered unreliable because they predict rare events throughout the game. Besides, it is difficult to evaluate various plays leading up to a score. In this study, we propose a method to evaluate team defense from a comprehensive perspective related to team performance by predicting ball recovery and being attacked, which occur more frequently than goals, using player actions and positional data of all players and the ball. Using data from 45 soccer matches, we examined the relationship between the proposed index and team performance in actual matches and throughout a season. Results show that the proposed classifiers predicted the true events (mean F1 score $>$ 0.483) better than the existing classifiers which were based on rare events or goals (mean F1 score $<$ 0.201). Also, the proposed index had a moderate correlation with the long-term outcomes of the season ($r =$ 0.397). These results suggest that the proposed index might be a more reliable indicator rather than winning or losing with the inclusion of accidental factors. △ Less

Submitted 7 May, 2022; v1 submitted 17 March, 2021; originally announced March 2021.

Comments: 15 pages, 5 figures

Journal ref: PLoS One, 17(1) e0263051, 2022

arXiv:2103.09211 [pdf, other]

doi 10.22331/q-2022-01-24-631

Computational power of one- and two-dimensional dual-unitary quantum circuits

Authors: Ryotaro Suzuki, Kosuke Mitarai, Keisuke Fujii

Abstract: Quantum circuits that are classically simulatable tell us when quantum computation becomes less powerful than or equivalent to classical computation. Such classically simulatable circuits are of importance because they illustrate what makes universal quantum computation different from classical computers. In this work, we propose a novel family of classically simulatable circuits by making use of… ▽ More Quantum circuits that are classically simulatable tell us when quantum computation becomes less powerful than or equivalent to classical computation. Such classically simulatable circuits are of importance because they illustrate what makes universal quantum computation different from classical computers. In this work, we propose a novel family of classically simulatable circuits by making use of dual-unitary quantum circuits (DUQCs), which have been recently investigated as exactly solvable models of non-equilibrium physics, and we characterize their computational power. Specifically, we investigate the computational complexity of the problem of calculating local expectation values and the sampling problem of one-dimensional DUQCs, and we generalize them to two spatial dimensions. We reveal that a local expectation value of a DUQC is classically simulatable at an early time, which is linear in a system length. In contrast, in a late time, they can perform universal quantum computation, and the problem becomes a BQP-complete problem. Moreover, classical simulation of sampling from a DUQC turns out to be hard. △ Less

Submitted 18 January, 2022; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: 14 pages, 6 figures

Journal ref: Quantum 6, 631 (2022)

arXiv:2102.09973 [pdf, other]

Discriminant Dynamic Mode Decomposition for Labeled Spatio-Temporal Data Collections

Authors: Naoya Takeishi, Keisuke Fujii, Koh Takeuchi, Yoshinobu Kawahara

Abstract: Extracting coherent patterns is one of the standard approaches towards understanding spatio-temporal data. Dynamic mode decomposition (DMD) is a powerful tool for extracting coherent patterns, but the original DMD and most of its variants do not consider label information, which is often available as side information of spatio-temporal data. In this work, we propose a new method for extracting dis… ▽ More Extracting coherent patterns is one of the standard approaches towards understanding spatio-temporal data. Dynamic mode decomposition (DMD) is a powerful tool for extracting coherent patterns, but the original DMD and most of its variants do not consider label information, which is often available as side information of spatio-temporal data. In this work, we propose a new method for extracting distinctive coherent patterns from labeled spatio-temporal data collections, such that they contribute to major differences in a labeled set of dynamics. We achieve such pattern extraction by incorporating discriminant analysis into DMD. To this end, we define a kernel function on subspaces spanned by sets of dynamic modes and develop an objective to take both reconstruction goodness as DMD and class-separation goodness as discriminant analysis into account. We illustrate our method using a synthetic dataset and several real-world datasets. The proposed method can be a useful tool for exploratory data analysis for understanding spatio-temporal data. △ Less

Submitted 19 February, 2021; originally announced February 2021.

arXiv:2102.07545 [pdf, ps, other]

doi 10.20965/jrm.2021.p0505

Data-driven Analysis for Understanding Team Sports Behaviors

Authors: Keisuke Fujii

Abstract: Understanding the principles of real-world biological multi-agent behaviors is a current challenge in various scientific and engineering fields. The rules regarding the real-world biological multi-agent behaviors such as team sports are often largely unknown due to their inherently higher-order interactions, cognition, and body dynamics. Estimation of the rules from data, i.e., data-driven approac… ▽ More Understanding the principles of real-world biological multi-agent behaviors is a current challenge in various scientific and engineering fields. The rules regarding the real-world biological multi-agent behaviors such as team sports are often largely unknown due to their inherently higher-order interactions, cognition, and body dynamics. Estimation of the rules from data, i.e., data-driven approaches such as machine learning, provides an effective way for the analysis of such behaviors. Although most data-driven models have non-linear structures and high prediction performances, it is sometimes hard to interpret them. This survey focuses on data-driven analysis for quantitative understanding of invasion team sports behaviors such as basketball and football, and introduces two main approaches for understanding such multi-agent behaviors: (1) extracting easily interpretable features or rules from data and (2) generating and controlling behaviors in visually-understandable ways. The first approach involves the visualization of learned representations and the extraction of mathematical structures behind the behaviors. The second approach can be used to test hypotheses by simulating and controlling future and counterfactual behaviors. Lastly, the potential practical applications of extracted rules, features, and generated behaviors are discussed. These approaches can contribute to a better understanding of multi-agent behaviors in the real world. △ Less

Submitted 28 February, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: 9 pages, 2 figures. This is the first draft and the final version will be published in the Journal of Robotics and Mechatronics

Journal ref: J. Robot. Mechatron., Vol.33, No.3, pp. 505-514, 2021

arXiv:2102.04051 [pdf, other]

HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception

Authors: Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

Abstract: We propose a conditional generative adversarial network (GAN) incorporating humans' perceptual evaluations. A deep neural network (DNN)-based generator of a GAN can represent a real-data distribution accurately but can never represent a human-acceptable distribution, which are ranges of data in which humans accept the naturalness regardless of whether the data are real or not. A HumanGAN was propo… ▽ More We propose a conditional generative adversarial network (GAN) incorporating humans' perceptual evaluations. A deep neural network (DNN)-based generator of a GAN can represent a real-data distribution accurately but can never represent a human-acceptable distribution, which are ranges of data in which humans accept the naturalness regardless of whether the data are real or not. A HumanGAN was proposed to model the human-acceptable distribution. A DNN-based generator is trained using a human-based discriminator, i.e., humans' perceptual evaluations, instead of the GAN's DNN-based discriminator. However, the HumanGAN cannot represent conditional distributions. This paper proposes the HumanACGAN, a theoretical extension of the HumanGAN, to deal with conditional human-acceptable distributions. Our HumanACGAN trains a DNN-based conditional generator by regarding humans as not only a discriminator but also an auxiliary classifier. The generator is trained by deceiving the human-based discriminator that scores the unconditioned naturalness and the human-based classifier that scores the class-conditioned perceptual acceptability. The training can be executed using the backpropagation algorithm involving humans' perceptual evaluations. Our experimental results in phoneme perception demonstrate that our HumanACGAN can successfully train this conditional generator. △ Less

Submitted 8 February, 2021; originally announced February 2021.

Comments: 5 pages, 6 figures, to be published in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2012.09265 [pdf, other]

doi 10.1038/s42254-021-00348-9

Variational Quantum Algorithms

Authors: M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R. McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, Patrick J. Coles

Abstract: Applications such as simulating complicated quantum systems or solving large-scale linear algebra problems are very challenging for classical computers due to the extremely high computational cost. Quantum computers promise a solution, although fault-tolerant quantum computers will likely not be available in the near future. Current quantum devices have serious constraints, including limited numbe… ▽ More Applications such as simulating complicated quantum systems or solving large-scale linear algebra problems are very challenging for classical computers due to the extremely high computational cost. Quantum computers promise a solution, although fault-tolerant quantum computers will likely not be available in the near future. Current quantum devices have serious constraints, including limited numbers of qubits and noise processes that limit circuit depth. Variational Quantum Algorithms (VQAs), which use a classical optimizer to train a parametrized quantum circuit, have emerged as a leading strategy to address these constraints. VQAs have now been proposed for essentially all applications that researchers have envisioned for quantum computers, and they appear to the best hope for obtaining quantum advantage. Nevertheless, challenges remain including the trainability, accuracy, and efficiency of VQAs. Here we overview the field of VQAs, discuss strategies to overcome their challenges, and highlight the exciting prospects for using them to obtain quantum advantage. △ Less

Submitted 4 October, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

Comments: Review Article. 33 pages, 7 figures. Updated to published version

Report number: LA-UR-20-30142

Journal ref: Nature Reviews Physics 3, 625-644 (2021)

arXiv:2010.15377 [pdf, other]

doi 10.1371/journal.pone.0256329

Supervised sequential pattern mining of event sequences in sport to identify important patterns of play: an application to rugby union

Authors: Rory Bunker, Keisuke Fujii, Hiroyuki Hanada, Ichiro Takeuchi

Abstract: Given a set of sequences comprised of time-ordered events, sequential pattern mining is useful to identify frequent subsequences from different sequences or within the same sequence. However, in sport, these techniques cannot determine the importance of particular patterns of play to good or bad outcomes, which is often of greater interest to coaches and performance analysts. In this study, we app… ▽ More Given a set of sequences comprised of time-ordered events, sequential pattern mining is useful to identify frequent subsequences from different sequences or within the same sequence. However, in sport, these techniques cannot determine the importance of particular patterns of play to good or bad outcomes, which is often of greater interest to coaches and performance analysts. In this study, we apply a recently proposed supervised sequential pattern mining algorithm called safe pattern pruning (SPP) to 490 labelled event sequences representing passages of play from one rugby team's matches from the 2018 Japan Top League. We compare the SPP-obtained patterns that are the most discriminative between scoring and non-scoring outcomes from both the team's and opposition teams' perspectives, with the most frequent patterns obtained with well-known unsupervised sequential pattern mining algorithms when applied to subsets of the original dataset, split on the label. Our obtained results found that linebreaks, successful lineouts, regained kicks in play, repeated phase-breakdown play, and failed exit plays by the opposition team were identified as as the patterns that discriminated most between the team scoring and not scoring. Opposition team linebreaks, errors made by the team, opposition team lineouts, and repeated phase-breakdown play by the opposition team were identified as the patterns that discriminated most between the opposition team scoring and not scoring. It was also found that, by virtue of its supervised nature as well as its pruning and safe-screening properties, SPP obtained a greater variety of generally more sophisticated patterns than the unsupervised models, which are likely to be of more utility to coaches and performance analysts. △ Less

Submitted 18 May, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

Showing 1–50 of 71 results for author: Fujii, K