-
Player Pressure Map -- A Novel Representation of Pressure in Soccer for Evaluating Player Performance in Different Game Contexts
Authors:
Chaoyi Gu,
Jiaming Na,
Yisheng Pei,
Varuna De Silva
Abstract:
In soccer, contextual player performance metrics are invaluable to coaches. For example, the ability to perform under pressure during matches distinguishes the elite from the average. Appropriate pressure metric enables teams to assess players' performance accurately under pressure and design targeted training scenarios to address their weaknesses. The primary objective of this paper is to leverag…
▽ More
In soccer, contextual player performance metrics are invaluable to coaches. For example, the ability to perform under pressure during matches distinguishes the elite from the average. Appropriate pressure metric enables teams to assess players' performance accurately under pressure and design targeted training scenarios to address their weaknesses. The primary objective of this paper is to leverage both tracking and event data and game footage to capture the pressure experienced by the possession team in a soccer game scene. We propose a player pressure map to represent a given game scene, which lowers the dimension of raw data and still contains rich contextual information. Not only does it serve as an effective tool for visualizing and evaluating the pressure on the team and each individual, but it can also be utilized as a backbone for accessing players' performance. Overall, our model provides coaches and analysts with a deeper understanding of players' performance under pressure so that they make data-oriented tactical decisions.
△ Less
Submitted 7 March, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Discovering Causality for Efficient Cooperation in Multi-Agent Environments
Authors:
Rafael Pina,
Varuna De Silva,
Corentin Artaud
Abstract:
In cooperative Multi-Agent Reinforcement Learning (MARL) agents are required to learn behaviours as a team to achieve a common goal. However, while learning a task, some agents may end up learning sub-optimal policies, not contributing to the objective of the team. Such agents are called lazy agents due to their non-cooperative behaviours that may arise from failing to understand whether they caus…
▽ More
In cooperative Multi-Agent Reinforcement Learning (MARL) agents are required to learn behaviours as a team to achieve a common goal. However, while learning a task, some agents may end up learning sub-optimal policies, not contributing to the objective of the team. Such agents are called lazy agents due to their non-cooperative behaviours that may arise from failing to understand whether they caused the rewards. As a consequence, we observe that the emergence of cooperative behaviours is not necessarily a byproduct of being able to solve a task as a team. In this paper, we investigate the applications of causality in MARL and how it can be applied in MARL to penalise these lazy agents. We observe that causality estimations can be used to improve the credit assignment to the agents and show how it can be leveraged to improve independent learning in MARL. Furthermore, we investigate how Amortized Causal Discovery can be used to automate causality detection within MARL environments. The results demonstrate that causality relations between individual observations and the team reward can be used to detect and punish lazy agents, making them develop more intelligent behaviours. This results in improvements not only in the overall performances of the team but also in their individual capabilities. In addition, results show that Amortized Causal Discovery can be used efficiently to find causal relations in MARL.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
Deep Generative Multi-Agent Imitation Model as a Computational Benchmark for Evaluating Human Performance in Complex Interactive Tasks: A Case Study in Football
Authors:
Chaoyi Gu,
Varuna De Silva
Abstract:
Evaluating the performance of human is a common need across many applications, such as in engineering and sports. When evaluating human performance in completing complex and interactive tasks, the most common way is to use a metric having been proved efficient for that context, or to use subjective measurement techniques. However, this can be an error prone and unreliable process since static metr…
▽ More
Evaluating the performance of human is a common need across many applications, such as in engineering and sports. When evaluating human performance in completing complex and interactive tasks, the most common way is to use a metric having been proved efficient for that context, or to use subjective measurement techniques. However, this can be an error prone and unreliable process since static metrics cannot capture all the complex contexts associated with such tasks and biases exist in subjective measurement. The objective of our research is to create data-driven AI agents as computational benchmarks to evaluate human performance in solving difficult tasks involving multiple humans and contextual factors. We demonstrate this within the context of football performance analysis. We train a generative model based on Conditional Variational Recurrent Neural Network (VRNN) Model on a large player and ball tracking dataset. The trained model is used to imitate the interactions between two teams and predict the performance from each team. Then the trained Conditional VRNN Model is used as a benchmark to evaluate team performance. The experimental results on Premier League football dataset demonstrates the usefulness of our method to existing state-of-the-art static metric used in football analytics.
△ Less
Submitted 23 March, 2023;
originally announced March 2023.
-
Probabilistic Model Incorporating Auxiliary Covariates to Control FDR
Authors:
Lin Qiu,
Nils Murrugarra-Llerena,
Vítor Silva,
Lin Lin,
Vernon M. Chinchilli
Abstract:
Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and…
▽ More
Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and auxiliary metrics or covariates. We incorporate auxiliary covariates among test-level covariates in a deep Black-Box framework controlling FDR (named as NeurT-FDR) which boosts statistical power and controls FDR for multiple-hypothesis testing. Our method parametrizes the test-level covariates as a neural network and adjusts the auxiliary covariates through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR makes substantially more discoveries in three real datasets compared to competitive baselines.
△ Less
Submitted 6 October, 2022;
originally announced October 2022.
-
No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling
Authors:
Marília Costa Rosendo Silva,
Felipe Alves Siqueira,
João Pedro Mantovani Tarrega,
João Vitor Pataca Beinotti,
Augusto Sousa Nunes,
Miguel de Mattos Gardini,
Vinícius Adolfo Pereira da Silva,
Nádia Félix Felipe da Silva,
André Carlos Ponce de Leon Ferreira de Carvalho
Abstract:
Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variabi…
▽ More
Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.
△ Less
Submitted 2 August, 2022;
originally announced August 2022.
-
Generative Network-Based Reduced-Order Model for Prediction, Data Assimilation and Uncertainty Quantification
Authors:
Vinicius L. S. Silva,
Claire E. Heaney,
Nenko Nenov,
Christopher C. Pain
Abstract:
We propose a new method in which a generative network (GN) integrate into a reduced-order model (ROM) framework is used to solve inverse problems for partial differential equations (PDE). The aim is to match available measurements and estimate the corresponding uncertainties associated with the states and parameters of a numerical physical simulation. The GN is trained using only unconditional sim…
▽ More
We propose a new method in which a generative network (GN) integrate into a reduced-order model (ROM) framework is used to solve inverse problems for partial differential equations (PDE). The aim is to match available measurements and estimate the corresponding uncertainties associated with the states and parameters of a numerical physical simulation. The GN is trained using only unconditional simulations of the discretized PDE model. We compare the proposed method with the golden standard Markov chain Monte Carlo. We apply the proposed approaches to a spatio-temporal compartmental model in epidemiology. The results show that the proposed GN-based ROM can efficiently quantify uncertainty and accurately match the measurements and the golden standard, using only a few unconditional simulations of the full-order numerical PDE model.
△ Less
Submitted 5 September, 2023; v1 submitted 28 May, 2021;
originally announced May 2021.
-
Data Assimilation Predictive GAN (DA-PredGAN): applied to determine the spread of COVID-19
Authors:
Vinicius L. S. Silva,
Claire E. Heaney,
Yaqi Li,
Christopher C. Pain
Abstract:
We propose the novel use of a generative adversarial network (GAN) (i) to make predictions in time (PredGAN) and (ii) to assimilate measurements (DA-PredGAN). In the latter case, we take advantage of the natural adjoint-like properties of generative models and the ability to simulate forwards and backwards in time. GANs have received much attention recently, after achieving excellent results for t…
▽ More
We propose the novel use of a generative adversarial network (GAN) (i) to make predictions in time (PredGAN) and (ii) to assimilate measurements (DA-PredGAN). In the latter case, we take advantage of the natural adjoint-like properties of generative models and the ability to simulate forwards and backwards in time. GANs have received much attention recently, after achieving excellent results for their generation of realistic-looking images. We wish to explore how this property translates to new applications in computational modelling and to exploit the adjoint-like properties for efficient data assimilation. To predict the spread of COVID-19 in an idealised town, we apply these methods to a compartmental model in epidemiology that is able to model space and time variations. To do this, the GAN is set within a reduced-order model (ROM), which uses a low-dimensional space for the spatial distribution of the simulation states. Then the GAN learns the evolution of the low-dimensional states over time. The results show that the proposed methods can accurately predict the evolution of the high-fidelity numerical simulation, and can efficiently assimilate observed data and determine the corresponding model parameters.
△ Less
Submitted 18 June, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
NeurT-FDR: Controlling FDR by Incorporating Feature Hierarchy
Authors:
Lin Qiu,
Nils Murrugarra-Llerena,
Vítor Silva,
Lin Lin,
Vernon M. Chinchilli
Abstract:
Controlling false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-lev…
▽ More
Controlling false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-level covariates. We propose NeurT-FDR which boosts statistical power and controls FDR for multiple hypothesis testing while leveraging the hierarchy among test-level covariates. Our method parametrizes the test-level covariates as a neural network and adjusts the feature hierarchy through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR has strong FDR guarantees and makes substantially more discoveries in synthetic and real datasets compared to competitive baselines.
△ Less
Submitted 24 January, 2021;
originally announced January 2021.
-
Randomization and Fair Judgment in Law and Science
Authors:
Julio Michael Stern,
Marcos Antonio Simplicio,
Marcos Vinicius M. Silva,
Roberto A. Castellanos Pfeiffer
Abstract:
Randomization procedures are used in legal and statistical applications, aiming to shield important decisions from spurious influences. This article gives an intuitive introduction to randomization and examines some intended consequences of its use related to truthful statistical inference and fair legal judgment. This article also presents an open-code Java implementation for a cryptographically…
▽ More
Randomization procedures are used in legal and statistical applications, aiming to shield important decisions from spurious influences. This article gives an intuitive introduction to randomization and examines some intended consequences of its use related to truthful statistical inference and fair legal judgment. This article also presents an open-code Java implementation for a cryptographically secure, statistically reliable, transparent, traceable, and fully auditable randomization tool.
△ Less
Submitted 18 December, 2020; v1 submitted 15 August, 2020;
originally announced August 2020.
-
A Fair, Traceable, Auditable and Participatory Randomization Tool for Legal Systems
Authors:
Marcos Vinicius M. Silva,
Marcos Antonio Simplicio Jr.,
Roberto Augusto Castellanos Pfeiffer,
Julio Michael Stern
Abstract:
Many real-world scenarios require the random selection of one or more individuals from a pool of eligible candidates. One example of especial social relevance refers to the legal system, in which the jurors and judges are commonly picked according to some probability distribution aiming to avoid biased decisions. In this scenario, ensuring auditability of the random drawing procedure is imperative…
▽ More
Many real-world scenarios require the random selection of one or more individuals from a pool of eligible candidates. One example of especial social relevance refers to the legal system, in which the jurors and judges are commonly picked according to some probability distribution aiming to avoid biased decisions. In this scenario, ensuring auditability of the random drawing procedure is imperative to promote confidence in its fairness. With this goal in mind, this article describes a protocol for random drawings specially designed for use in legal systems. The proposed design combines the following properties: security by design, ensuring the fairness of the random draw as long as at least one participant behaves honestly; auditability by any interested party, even those having no technical background, using only public information; and statistical robustness, supporting drawings where candidates may have distinct probability distributions. Moreover, it is capable of inviting and engaging as participating stakeholders the main interested parties of a legal process, in a way that promotes process transparency, public trust and institutional resilience. An open-source implementation is also provided as supplementary material.
△ Less
Submitted 4 June, 2020;
originally announced June 2020.
-
Bayesian factor models for multivariate categorical data obtained from questionnaires
Authors:
Vitor G. C. da Silva,
Kelly C. M. Gonçalves,
João B. M. Pereira
Abstract:
Factor analysis is a flexible technique for assessment of multivariate dependence and codependence. Besides being an exploratory tool used to reduce the dimensionality of multivariate data, it allows estimation of common factors that often have an interesting theoretical interpretation in real problems. However, standard factor analysis is only applicable when the variables are scaled, which is of…
▽ More
Factor analysis is a flexible technique for assessment of multivariate dependence and codependence. Besides being an exploratory tool used to reduce the dimensionality of multivariate data, it allows estimation of common factors that often have an interesting theoretical interpretation in real problems. However, standard factor analysis is only applicable when the variables are scaled, which is often inappropriate, for example, in data obtained from questionnaires in the field of psychology,where the variables are often categorical. In this framework, we propose a factor model for the analysis of multivariate ordered and non-ordered polychotomous data. The inference procedure is done under the Bayesian approach via Markov chain Monte Carlo methods. Two Monte-Carlo simulation studies are presented to investigate the performance of this approach in terms of estimation bias, precision and assessment of the number of factors. We also illustrate the proposed method to analyze participants' responses to the Motivational State Questionnaire dataset, developed to study emotions in laboratory and field settings.
△ Less
Submitted 7 May, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Dynamics of Snoring Sounds and Its Connection with Obstructive Sleep Apnea
Authors:
Adriano M. Alencar,
Diego Greatti Vaz da Silva,
Carolina Beatriz Oliveira,
Andre P. Vieira,
Henrique T. Moriya,
Geraldo Lorenzi-Filho
Abstract:
Snoring is extremely common in the general population and when irregular may indicate the presence of obstructive sleep apnea. We analyze the overnight sequence of wave packets --- the snore sound --- recorded during full polysomnography in patients referred to the sleep laboratory due to suspected obstructive sleep apnea. We hypothesize that irregular snore, with duration in the range between 10…
▽ More
Snoring is extremely common in the general population and when irregular may indicate the presence of obstructive sleep apnea. We analyze the overnight sequence of wave packets --- the snore sound --- recorded during full polysomnography in patients referred to the sleep laboratory due to suspected obstructive sleep apnea. We hypothesize that irregular snore, with duration in the range between 10 and 100 seconds, correlates with respiratory obstructive events. We find that the number of irregular snores --- easily accessible, and quantified by what we call the snore time interval index (STII) --- is in good agreement with the well-known apnea-hypopnea index, which expresses the severity of obstructive sleep apnea and is extracted only from polysomnography. In addition, the Hurst analysis of the snore sound itself, which calculates the fluctuations in the signal as a function of time interval, is used to build a classifier that is able to distinguish between patients with no or mild apnea and patients with moderate or severe apnea.
△ Less
Submitted 10 August, 2012;
originally announced August 2012.