Search | arXiv e-print repository

Why Online Reinforcement Learning is Causal

Abstract: Reinforcement learning (RL) and causal modelling naturally complement each other. The goal of causal modelling is to predict the effects of interventions in an environment, while the goal of reinforcement learning is to select interventions that maximize the rewards the agent receives from the environment. Reinforcement learning includes the two most powerful sources of information for estimating… ▽ More Reinforcement learning (RL) and causal modelling naturally complement each other. The goal of causal modelling is to predict the effects of interventions in an environment, while the goal of reinforcement learning is to select interventions that maximize the rewards the agent receives from the environment. Reinforcement learning includes the two most powerful sources of information for estimating causal relationships: temporal ordering and the ability to act on an environment. This paper examines which reinforcement learning settings we can expect to benefit from causal modelling, and how. In online learning, the agent has the ability to interact directly with their environment, and learn from exploring it. Our main argument is that in online learning, conditional probabilities are causal, and therefore offline RL is the setting where causal learning has the most potential to make a difference. Essentially, the reason is that when an agent learns from their {\em own} experience, there are no unobserved confounders that influence both the agent's own exploratory actions and the rewards they receive. Our paper formalizes this argument. For offline RL, where an agent may and typically does learn from the experience of {\em others}, we describe previous and new methods for leveraging a causal model, including support for counterfactual queries. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 27 pages

ACM Class: I.2.6

arXiv:2402.11124 [pdf, other]

Implicit Causal Representation Learning via Switchable Mechanisms

Authors: Shayan Shirahmad Gale Bagi, Zahra Gharaee, Oliver Schulte, Mark Crowley

Abstract: Learning causal representations from observational and interventional data in the absence of known ground-truth graph structures necessitates implicit latent causal representation learning. Implicit learning of causal mechanisms typically involves two categories of interventional data: hard and soft interventions. In real-world scenarios, soft interventions are often more realistic than hard inter… ▽ More Learning causal representations from observational and interventional data in the absence of known ground-truth graph structures necessitates implicit latent causal representation learning. Implicit learning of causal mechanisms typically involves two categories of interventional data: hard and soft interventions. In real-world scenarios, soft interventions are often more realistic than hard interventions, as the latter require fully controlled environments. Unlike hard interventions, which directly force changes in a causal variable, soft interventions exert influence indirectly by affecting the causal mechanism. However, the subtlety of soft interventions impose several challenges for learning causal models. One challenge is that soft intervention's effects are ambiguous, since parental relations remain intact. In this paper, we tackle the challenges of learning causal models using soft interventions while retaining implicit modeling. Our approach models the effects of soft interventions by employing a \textit{causal mechanism switch variable} designed to toggle between different causal mechanisms. In our experiments, we consistently observe improved learning of identifiable, causal representations, compared to baseline approaches. △ Less

Submitted 28 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2305.01089 [pdf, other]

Computing Expected Motif Counts for Exchangeable Graph Generative Models

Authors: Oliver Schulte

Abstract: Estimating the expected value of a graph statistic is an important inference task for using and learning graph models. This note presents a scalable estimation procedure for expected motif counts, a widely used type of graph statistic. The procedure applies for generative mixture models of the type used in neural and Bayesian approaches to graph data. Estimating the expected value of a graph statistic is an important inference task for using and learning graph models. This note presents a scalable estimation procedure for expected motif counts, a widely used type of graph statistic. The procedure applies for generative mixture models of the type used in neural and Bayesian approaches to graph data. △ Less

Submitted 1 May, 2023; originally announced May 2023.

Comments: 8 pages

MSC Class: 60G09 ACM Class: I.2.6

arXiv:2302.08635 [pdf, other]

Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting

Authors: Shayan Shirahmad Gale Bagi, Zahra Gharaee, Oliver Schulte, Mark Crowley

Abstract: Conventional supervised learning methods typically assume i.i.d samples and are found to be sensitive to out-of-distribution (OOD) data. We propose Generative Causal Representation Learning (GCRL) which leverages causality to facilitate knowledge transfer under distribution shifts. While we evaluate the effectiveness of our proposed method in human trajectory prediction models, GCRL can be applied… ▽ More Conventional supervised learning methods typically assume i.i.d samples and are found to be sensitive to out-of-distribution (OOD) data. We propose Generative Causal Representation Learning (GCRL) which leverages causality to facilitate knowledge transfer under distribution shifts. While we evaluate the effectiveness of our proposed method in human trajectory prediction models, GCRL can be applied to other domains as well. First, we propose a novel causal model that explains the generative factors in motion forecasting datasets using features that are common across all environments and with features that are specific to each environment. Selection variables are used to determine which parts of the model can be directly transferred to a new environment without fine-tuning. Second, we propose an end-to-end variational learning paradigm to learn the causal mechanisms that generate observations from features. GCRL is supported by strong theoretical results that imply identifiability of the causal model under certain assumptions. Experimental results on synthetic and real-world motion forecasting datasets show the robustness and effectiveness of our proposed method for knowledge transfer under zero-shot and low-shot settings by substantially outperforming the prior motion forecasting models on out-of-distribution prediction. Our code is available at https://github.com/sshirahmad/GCRL. △ Less

Submitted 25 April, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

arXiv:2302.07989 [pdf, ps, other]

From Graph Generation to Graph Classification

Authors: Oliver Schulte

Abstract: This note describes a new approach to classifying graphs that leverages graph generative models (GGM). Assuming a GGM that defines a joint probability distribution over graphs and their class labels, I derive classification formulas for the probability of a class label given a graph. A new conditional ELBO can be used to train a generative graph auto-encoder model for discrimination. While leverag… ▽ More This note describes a new approach to classifying graphs that leverages graph generative models (GGM). Assuming a GGM that defines a joint probability distribution over graphs and their class labels, I derive classification formulas for the probability of a class label given a graph. A new conditional ELBO can be used to train a generative graph auto-encoder model for discrimination. While leveraging generative models for classification has been well explored for non-relational i.i.d. data, to our knowledge it is a novel approach to graph classification. △ Less

Submitted 23 July, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: I welcome suggestions, comments, and proposals for collaboration to develop further the ideas in this paper. Please email [email protected]. I am grateful to Renjie Liao for helpful comments

ACM Class: I.2.6

arXiv:2301.12930 [pdf, other]

Cause-Effect Inference in Location-Scale Noise Models: Maximum Likelihood vs. Independence Testing

Authors: Xiangyu Sun, Oliver Schulte

Abstract: A fundamental problem of causal discovery is cause-effect inference, learning the correct causal direction between two random variables. Significant progress has been made through modelling the effect as a function of its cause and a noise term, which allows us to leverage assumptions about the generating function class. The recently introduced heteroscedastic location-scale noise functional model… ▽ More A fundamental problem of causal discovery is cause-effect inference, learning the correct causal direction between two random variables. Significant progress has been made through modelling the effect as a function of its cause and a noise term, which allows us to leverage assumptions about the generating function class. The recently introduced heteroscedastic location-scale noise functional models (LSNMs) combine expressive power with identifiability guarantees. LSNM model selection based on maximizing likelihood achieves state-of-the-art accuracy, when the noise distributions are correctly specified. However, through an extensive empirical evaluation, we demonstrate that the accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance. △ Less

Submitted 25 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: NeurIPS 2023

arXiv:2210.16844 [pdf, other]

Micro and Macro Level Graph Modeling for Graph Variational Auto-Encoders

Authors: Kiarash Zahirnia, Oliver Schulte, Parmis Naddaf, Ke Li

Abstract: Generative models for graph data are an important research topic in machine learning. Graph data comprise two levels that are typically analyzed separately: node-level properties such as the existence of a link between a pair of nodes, and global aggregate graph-level statistics, such as motif counts. This paper proposes a new multi-level framework that jointly models node-level properties and gra… ▽ More Generative models for graph data are an important research topic in machine learning. Graph data comprise two levels that are typically analyzed separately: node-level properties such as the existence of a link between a pair of nodes, and global aggregate graph-level statistics, such as motif counts. This paper proposes a new multi-level framework that jointly models node-level properties and graph-level statistics, as mutually reinforcing sources of information. We introduce a new micro-macro training objective for graph generation that combines node-level and graph-level losses. We utilize the micro-macro objective to improve graph generation with a GraphVAE, a well-established model based on graph-level latent variables, that provides fast training and generation time for medium-sized graphs. Our experiments show that adding micro-macro modeling to the GraphVAE model improves graph quality scores up to 2 orders of magnitude on five benchmark datasets, while maintaining the GraphVAE generation speed advantage. △ Less

Submitted 13 January, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

Comments: Thirty-sixth Conference on Neural Information Processing Systems, 2022

arXiv:2110.09767 [pdf, other]

Pre and Post Counting for Scalable Statistical-Relational Model Discovery

Authors: Richard Mar, Oliver Schulte

Abstract: Statistical-Relational Model Discovery aims to find statistically relevant patterns in relational data. For example, a relational dependency pattern may stipulate that a user's gender is associated with the gender of their friends. As with propositional (non-relational) graphical models, the major scalability bottleneck for model discovery is computing instantiation counts: the number of times a r… ▽ More Statistical-Relational Model Discovery aims to find statistically relevant patterns in relational data. For example, a relational dependency pattern may stipulate that a user's gender is associated with the gender of their friends. As with propositional (non-relational) graphical models, the major scalability bottleneck for model discovery is computing instantiation counts: the number of times a relational pattern is instantiated in a database. Previous work on propositional learning utilized pre-counting or post-counting to solve this task. This paper takes a detailed look at the memory and speed trade-offs between pre-counting and post-counting strategies for relational learning. A pre-counting approach computes and caches instantiation counts for a large set of relational patterns before model search. A post-counting approach computes an instantiation count dynamically on-demand for each candidate pattern generated during the model search. We describe a novel hybrid approach, tailored to relational data, that achieves a sweet spot with pre-counting for patterns involving positive relationships (e.g. pairs of users who are friends) and post-counting for patterns involving negative relationships (e.g. pairs of users who are not friends). Our hybrid approach scales model discovery to millions of data facts. △ Less

Submitted 19 October, 2021; originally announced October 2021.

Comments: Presented at the Tenth International Workshop on Statistical Relational AI at the 1st International Joint Conference on Learning & Reasoning

MSC Class: 68T05 ACM Class: I.2.6

arXiv:2109.04286 [pdf, other]

NTS-NOTEARS: Learning Nonparametric DBNs With Prior Knowledge

Authors: Xiangyu Sun, Oliver Schulte, Guiliang Liu, Pascal Poupart

Abstract: We describe NTS-NOTEARS, a score-based structure learning method for time-series data to learn dynamic Bayesian networks (DBNs) that captures nonlinear, lagged (inter-slice) and instantaneous (intra-slice) relations among variables. NTS-NOTEARS utilizes 1D convolutional neural networks (CNNs) to model the dependence of child variables on their parents; 1D CNN is a neural function approximation mod… ▽ More We describe NTS-NOTEARS, a score-based structure learning method for time-series data to learn dynamic Bayesian networks (DBNs) that captures nonlinear, lagged (inter-slice) and instantaneous (intra-slice) relations among variables. NTS-NOTEARS utilizes 1D convolutional neural networks (CNNs) to model the dependence of child variables on their parents; 1D CNN is a neural function approximation model well-suited for sequential data. DBN-CNN structure learning is formulated as a continuous optimization problem with an acyclicity constraint, following the NOTEARS DAG learning approach. We show how prior knowledge of dependencies (e.g., forbidden and required edges) can be included as additional optimization constraints. Empirical evaluation on simulated and benchmark data show that NTS-NOTEARS achieves state-of-the-art DAG structure quality compared to both parametric and nonparametric baseline methods, with improvement in the range of 10-20% on the F1-score. We also evaluate NTS-NOTEARS on complex real-world data acquired from professional ice hockey games that contain a mixture of continuous and discrete variables. The code is available online. △ Less

Submitted 1 March, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

Comments: AISTATS 2023

arXiv:2106.15239 [pdf, other]

Generating the Graph Gestalt: Kernel-Regularized Graph Representation Learning

Authors: Kiarash Zahirnia, Ankita Sakhuja, Oliver Schulte, Parmis Nadaf, Ke Li, Xia Hu

Abstract: Recent work on graph generative models has made remarkable progress towards generating increasingly realistic graphs, as measured by global graph features such as degree distribution, density, and clustering coefficients. Deep generative models have also made significant advances through better modelling of the local correlations in the graph topology, which have been very useful for predicting un… ▽ More Recent work on graph generative models has made remarkable progress towards generating increasingly realistic graphs, as measured by global graph features such as degree distribution, density, and clustering coefficients. Deep generative models have also made significant advances through better modelling of the local correlations in the graph topology, which have been very useful for predicting unobserved graph components, such as the existence of a link or the class of a node, from nearby observed graph components. A complete scientific understanding of graph data should address both global and local structure. In this paper, we propose a joint model for both as complementary objectives in a graph VAE framework. Global structure is captured by incorporating graph kernels in a probabilistic model whose loss function is closely related to the maximum mean discrepancy(MMD) between the global structures of the reconstructed and the input graphs. The ELBO objective derived from the model regularizes a standard local link reconstruction term with an MMD term. Our experiments demonstrate a significant improvement in the realism of the generated graph structures, typically by 1-2 orders of magnitude of graph structure metrics, compared to leading graph VAEand GAN models. Local link reconstruction improves as well in many cases. △ Less

Submitted 29 June, 2021; originally announced June 2021.

arXiv:2006.04551 [pdf, other]

doi 10.1145/3394486.3403367

Cracking the Black Box: Distilling Deep Sports Analytics

Authors: Xiangyu Sun, Jack Davis, Oliver Schulte, Guiliang Liu

Abstract: This paper addresses the trade-off between Accuracy and Transparency for deep learning applied to sports analytics. Neural nets achieve great predictive accuracy through deep learning, and are popular in sports analytics. But it is hard to interpret a neural net model and harder still to extract actionable insights from the knowledge implicit in it. Therefore, we built a simple and transparent mod… ▽ More This paper addresses the trade-off between Accuracy and Transparency for deep learning applied to sports analytics. Neural nets achieve great predictive accuracy through deep learning, and are popular in sports analytics. But it is hard to interpret a neural net model and harder still to extract actionable insights from the knowledge implicit in it. Therefore, we built a simple and transparent model that mimics the output of the original deep learning model and represents the learned knowledge in an explicit interpretable way. Our mimic model is a linear model tree, which combines a collection of linear models with a regression-tree structure. The tree version of a neural network achieves high fidelity, explains itself, and produces insights for expert stakeholders such as athletes and coaches. We propose and compare several scalable model tree learning heuristics to address the computational challenge from datasets with millions of data points. △ Less

Submitted 29 June, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: Accepted by the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2020); Added the tenth feature to Table 3 for soccer;

arXiv:2004.10984 [pdf, other]

A Complete Characterization of Projectivity for Statistical Relational Models

Authors: Manfred Jaeger, Oliver Schulte

Abstract: A generative probabilistic model for relational data consists of a family of probability distributions for relational structures over domains of different sizes. In most existing statistical relational learning (SRL) frameworks, these models are not projective in the sense that the marginal of the distribution for size-$n$ structures on induced sub-structures of size $k<n$ is equal to the given di… ▽ More A generative probabilistic model for relational data consists of a family of probability distributions for relational structures over domains of different sizes. In most existing statistical relational learning (SRL) frameworks, these models are not projective in the sense that the marginal of the distribution for size-$n$ structures on induced sub-structures of size $k<n$ is equal to the given distribution for size-$k$ structures. Projectivity is very beneficial in that it directly enables lifted inference and statistically consistent learning from sub-sampled relational structures. In earlier work some simple fragments of SRL languages have been identified that represent projective models. However, no complete characterization of, and representation framework for projective models has been given. In this paper we fill this gap: exploiting representation theorems for infinite exchangeable arrays we introduce a class of directed graphical latent variable models that precisely correspond to the class of projective relational models. As a by-product we also obtain a characterization for when a given distribution over size-$k$ structures is the statistical frequency distribution of size-$k$ sub-structures in much larger size-$n$ structures. These results shed new light onto the old open problem of how to apply Halpern et al.'s "random worlds approach" for probabilistic inference to general relational signatures. △ Less

Submitted 22 June, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

Comments: Extended version (with proof appendix) of paper that is too appear in Proceedings of IJCAI 2020

MSC Class: 60G09 ACM Class: I.2.6

arXiv:1902.09711 [pdf, ps, other]

Detecting Data Errors with Statistical Constraints

Authors: **g Nathan Yan, Oliver Schulte, Jiannan Wang, Reynold Cheng

Abstract: A powerful approach to detecting erroneous data is to check which potentially dirty data records are incompatible with a user's domain knowledge. Previous approaches allow the user to specify domain knowledge in the form of logical constraints (e.g., functional dependency and denial constraints). We extend the constraint-based approach by introducing a novel class of statistical constraints (SCs).… ▽ More A powerful approach to detecting erroneous data is to check which potentially dirty data records are incompatible with a user's domain knowledge. Previous approaches allow the user to specify domain knowledge in the form of logical constraints (e.g., functional dependency and denial constraints). We extend the constraint-based approach by introducing a novel class of statistical constraints (SCs). An SC treats each column as a random variable, and enforces an independence or dependence relationship between two (or a few) random variables. Statistical constraints are expressive, allowing the user to specify a wide range of domain knowledge, beyond traditional integrity constraints. Furthermore, they work harmoniously with downstream statistical modeling. We develop CODED, an SC-Oriented Data Error Detection system that supports three key tasks: (1) Checking whether an SC is violated or not on a given dataset, (2) Identify the top-k records that contribute the most to the violation of an SC, and (3) Checking whether a set of input SCs have conflicts or not. We present effective solutions for each task. Experiments on synthetic and real-world data illustrate how SCs apply to error detection, and provide evidence that CODED performs better than state-of-the-art approaches. △ Less

Submitted 25 February, 2019; originally announced February 2019.

arXiv:1807.05887 [pdf, other]

Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees

Authors: Guiliang Liu, Oliver Schulte, Wang Zhu, Qingcan Li

Abstract: Deep Reinforcement Learning (DRL) has achieved impressive success in many applications. A key component of many DRL models is a neural network representing a Q function, to estimate the expected cumulative reward following a state-action pair. The Q function neural network contains a lot of implicit knowledge about the RL problems, but often remains unexamined and uninterpreted. To our knowledge,… ▽ More Deep Reinforcement Learning (DRL) has achieved impressive success in many applications. A key component of many DRL models is a neural network representing a Q function, to estimate the expected cumulative reward following a state-action pair. The Q function neural network contains a lot of implicit knowledge about the RL problems, but often remains unexamined and uninterpreted. To our knowledge, this work develops the first mimic learning framework for Q functions in DRL. We introduce Linear Model U-trees (LMUTs) to approximate neural network predictions. An LMUT is learned using a novel on-line algorithm that is well-suited for an active play setting, where the mimic learner observes an ongoing interaction between the neural net and the environment. Empirical evaluation shows that an LMUT mimics a Q function substantially better than five baseline methods. The transparent tree structure of an LMUT facilitates understanding the network's learned knowledge by analyzing feature influence, extracting rules, and highlighting the super-pixels in image inputs. △ Less

Submitted 16 July, 2018; originally announced July 2018.

Comments: This paper is accepted by ECML-PKDD 2018

arXiv:1807.00564 [pdf, ps, other]

Inference, Learning, and Population Size: Projectivity for SRL Models

Authors: Manfred Jaeger, Oliver Schulte

Abstract: A subtle difference between propositional and relational data is that in many relational models, marginal probabilities depend on the population or domain size. This paper connects the dependence on population size to the classic notion of projectivity from statistical theory: Projectivity implies that relational predictions are robust with respect to changes in domain size. We discuss projectivit… ▽ More A subtle difference between propositional and relational data is that in many relational models, marginal probabilities depend on the population or domain size. This paper connects the dependence on population size to the classic notion of projectivity from statistical theory: Projectivity implies that relational predictions are robust with respect to changes in domain size. We discuss projectivity for a number of common SRL systems, and identify syntactic fragments that are guaranteed to yield projective models. The syntactic conditions are restrictive, which suggests that projectivity is difficult to achieve in SRL, and care must be taken when working with different domain sizes. △ Less

Submitted 2 July, 2018; originally announced July 2018.

arXiv:1807.00381 [pdf, other]

Model-based Exception Mining for Object-Relational Data

Authors: Fatemeh Riahi, Oliver Schulte

Abstract: This paper is based on a previous publication [29]. Our work extends exception mining and outlier detection to the case of object-relational data. Object-relational data represent a complex heterogeneous network [12], which comprises objects of different types, links among these objects, also of different types, and attributes of these links. This special structure prohibits a direct vectorial dat… ▽ More This paper is based on a previous publication [29]. Our work extends exception mining and outlier detection to the case of object-relational data. Object-relational data represent a complex heterogeneous network [12], which comprises objects of different types, links among these objects, also of different types, and attributes of these links. This special structure prohibits a direct vectorial data representation. We follow the well-established Exceptional Model Mining framework, which leverages machine learning models for exception mining: A object is exceptional to the extent that a model learned for the object data differs from a model learned for the general population. Exceptional objects can be viewed as outliers. We apply state of-the-art probabilistic modelling techniques for object-relational data that construct a graphical model (Bayesian network), which compactly represents probabilistic associations in the data. A new metric, derived from the learned object-relational model, quantifies the extent to which the individual association pattern of a potential outlier deviates from that of the whole population. The metric is based on the likelihood ratio of two parameter vectors: One that represents the population associations, and another that represents the individual associations. Our method is validated on synthetic datasets and on real-world data sets about soccer matches and movies. Compared to baseline methods, our novel transformed likelihood ratio achieved the best detection accuracy on all datasets. △ Less

Submitted 1 July, 2018; originally announced July 2018.

Comments: StarAI 2018

arXiv:1805.11088 [pdf, other]

doi 10.24963/ijcai.2018/478

Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation

Authors: Guiliang Liu, Oliver Schulte

Abstract: A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q function from 3M play-by-play events in the Nati… ▽ More A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q function from 3M play-by-play events in the National Hockey League (NHL). The neural network representation integrates both continuous context signals and game history, using a possession-based LSTM. The learned Q-function is used to value players' actions under different game contexts. To assess a player's overall performance, we introduce a novel Game Impact Metric (GIM) that aggregates the values of the player's actions. Empirical Evaluation shows GIM is consistent throughout a play season, and correlates highly with standard success measures and future salary. △ Less

Submitted 16 July, 2018; v1 submitted 26 May, 2018; originally announced May 2018.

Comments: This paper has been accepted by IJCAI 2018

arXiv:1802.08765 [pdf, other]

Model Trees for Identifying Exceptional Players in the NHL Draft

Authors: Oliver Schulte, Yejia Liu, Chao Li

Abstract: Drafting strong players is crucial for the team success. We describe a new data-driven interpretable approach for assessing draft prospects in the National Hockey League. Successful previous approaches have built a predictive model based on player features, or derived performance predictions from the observed performance of comparable players in a cohort. This paper develops model tree learning, w… ▽ More Drafting strong players is crucial for the team success. We describe a new data-driven interpretable approach for assessing draft prospects in the National Hockey League. Successful previous approaches have built a predictive model based on player features, or derived performance predictions from the observed performance of comparable players in a cohort. This paper develops model tree learning, which incorporates strengths of both model-based and cohort-based approaches. A model tree partitions the feature space according to the values of discrete features, or learned thresholds for continuous features. Each leaf node in the tree defines a group of players, easily described to hockey experts, with its own group regression model. Compared to a single model, the model tree forms an ensemble that increases predictive power. Compared to cohort-based approaches, the groups of comparables are discovered from the data, without requiring a similarity metric. The performance predictions of the model tree are competitive with the state-of-the-art methods, which validates our model empirically. We show in case studies that the model tree player ranking can be used to highlight strong and weak points of players. △ Less

Submitted 23 February, 2018; originally announced February 2018.

Comments: 14 pages

arXiv:1511.03086 [pdf, ps, other]

The CTU Prague Relational Learning Repository

Authors: Jan Motl, Oliver Schulte

Abstract: The aim of the Prague Relational Learning Repository is to support machine learning research with multi-relational data. The repository currently contains 148 SQL databases hosted on a public MySQL server located at \url{https://relational-data.org}. The server is provided by getML to support the relational machine learning community (\url{www.getml.com}). A searchable meta-database provides metad… ▽ More The aim of the Prague Relational Learning Repository is to support machine learning research with multi-relational data. The repository currently contains 148 SQL databases hosted on a public MySQL server located at \url{https://relational-data.org}. The server is provided by getML to support the relational machine learning community (\url{www.getml.com}). A searchable meta-database provides metadata (e.g., the number of tables in the database, the number of rows and columns in the tables, the number of self-relationships). △ Less

Submitted 11 March, 2024; v1 submitted 10 November, 2015; originally announced November 2015.

Comments: 7 pages

ACM Class: I.2.6; H.2.8

arXiv:1508.02428 [pdf, other]

FactorBase: SQL for Learning A Multi-Relational Graphical Model

Authors: Oliver Schulte, Zhensong Qian

Abstract: We describe FactorBase, a new SQL-based framework that leverages a relational database management system to support multi-relational model discovery. A multi-relational statistical model provides an integrated analysis of the heterogeneous and interdependent data resources in the database. We adopt the BayesStore design philosophy: statistical models are stored and managed as first-class citizens… ▽ More We describe FactorBase, a new SQL-based framework that leverages a relational database management system to support multi-relational model discovery. A multi-relational statistical model provides an integrated analysis of the heterogeneous and interdependent data resources in the database. We adopt the BayesStore design philosophy: statistical models are stored and managed as first-class citizens inside a database. Whereas previous systems like BayesStore support multi-relational inference, FactorBase supports multi-relational learning. A case study on six benchmark databases evaluates how our system supports a challenging machine learning application, namely learning a first-order Bayesian network model for an entire database. Model learning in this setting has to examine a large number of potential statistical associations across data tables. Our implementation shows how the SQL constructs in FactorBase facilitate the fast, modular, and reliable development of highly scalable model learning systems. △ Less

Submitted 10 August, 2015; originally announced August 2015.

Comments: 14 pages, 10 figures, 10 tables, Published on 2015 IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA'2015), Oct 19-21, 2015, Paris, France

ACM Class: H.2.8; H.2.4

arXiv:1507.00646 [pdf, other]

SQL for SRL: Structure Learning Inside a Database System

Authors: Oliver Schulte, Zhensong Qian

Abstract: The position we advocate in this paper is that relational algebra can provide a unified language for both representing and computing with statistical-relational objects, much as linear algebra does for traditional single-table machine learning. Relational algebra is implemented in the Structured Query Language (SQL), which is the basis of relational database management systems. To support our posi… ▽ More The position we advocate in this paper is that relational algebra can provide a unified language for both representing and computing with statistical-relational objects, much as linear algebra does for traditional single-table machine learning. Relational algebra is implemented in the Structured Query Language (SQL), which is the basis of relational database management systems. To support our position, we have developed the FACTORBASE system, which uses SQL as a high-level scripting language for statistical-relational learning of a graphical model structure. The design philosophy of FACTORBASE is to manage statistical models as first-class citizens inside a database. Our implementation shows how our SQL constructs in FACTORBASE facilitate fast, modular, and reliable program development. Empirical evidence from six benchmark databases indicates that leveraging database system capabilities achieves scalable model structure learning. △ Less

Submitted 2 July, 2015; originally announced July 2015.

Comments: 3 pages, 1 figure, Position Paper of the Fifth International Workshop on Statistical Relational AI at UAI 2015

ACM Class: H.2.8; H.2.4

arXiv:1410.7835 [pdf, other]

Fast Learning of Relational Dependency Networks

Authors: Oliver Schulte, Zhensong Qian, Arthur E. Kirkpatrick, Xiaoqian Yin, Yan Sun

Abstract: A Relational Dependency Network (RDN) is a directed graphical model widely used for multi-relational data. These networks allow cyclic dependencies, necessary to represent relational autocorrelations. We describe an approach for learning both the RDN's structure and its parameters, given an input relational database: First learn a Bayesian network (BN), then transform the Bayesian network to an RD… ▽ More A Relational Dependency Network (RDN) is a directed graphical model widely used for multi-relational data. These networks allow cyclic dependencies, necessary to represent relational autocorrelations. We describe an approach for learning both the RDN's structure and its parameters, given an input relational database: First learn a Bayesian network (BN), then transform the Bayesian network to an RDN. Thus fast Bayes net learning can provide fast RDN learning. The BN-to-RDN transform comprises a simple, local adjustment of the Bayes net structure and a closed-form transform of the Bayes net parameters. This method can learn an RDN for a dataset with a million tuples in minutes. We empirically compare our approach to state-of-the art RDN learning methods that use functional gradient boosting, on five benchmark datasets. Learning RDNs via BNs scales much better to large datasets than learning RDNs with boosting, and provides competitive accuracy in predictions. △ Less

Submitted 8 December, 2014; v1 submitted 28 October, 2014; originally announced October 2014.

Comments: 17 pages, 2 figures, 3 tables, Accepted as long paper by ILP 2014, September 14- 16th, Nancy, France. Added the Appendix: Proof of Consistency Characterization

arXiv:1408.5389 [pdf, other]

doi 10.1145/2661829.2662010

Computing Multi-Relational Sufficient Statistics for Large Databases

Authors: Zhensong Qian, Oliver Schulte, Yan Sun

Abstract: Databases contain information about which relationships do and do not hold among entities. To make this information accessible for statistical analysis requires computing sufficient statistics that combine information from different database tables. Such statistics may involve any number of {\em positive and negative} relationships. With a naive enumeration approach, computing sufficient statistic… ▽ More Databases contain information about which relationships do and do not hold among entities. To make this information accessible for statistical analysis requires computing sufficient statistics that combine information from different database tables. Such statistics may involve any number of {\em positive and negative} relationships. With a naive enumeration approach, computing sufficient statistics for negative relationships is feasible only for small databases. We solve this problem with a new dynamic programming algorithm that performs a virtual join, where the requisite counts are computed without materializing join tables. Contingency table algebra is a new extension of relational algebra, that facilitates the efficient implementation of this Möbius virtual join operation. The Möbius Join scales to large datasets (over 1M tuples) with complex schemas. Empirical evaluation with seven benchmark datasets showed that information about the presence and absence of links can be exploited in feature selection, association rule mining, and Bayesian network learning. △ Less

Submitted 22 August, 2014; originally announced August 2014.

Comments: 11pages, 8 figures, 8 tables, CIKM'14,November 3--7, 2014, Shanghai, China

ACM Class: H.2.8; H.2.4

arXiv:0811.4458 [pdf, other]

Learning Class-Level Bayes Nets for Relational Data

Authors: Oliver Schulte, Hassan Khosravi, Flavia Moser, Martin Ester

Abstract: Many databases store data in relational format, with different types of entities and information about links between the entities. The field of statistical-relational learning (SRL) has developed a number of new statistical models for such data. In this paper we focus on learning class-level or first-order dependencies, which model the general database statistics over attributes of linked object… ▽ More Many databases store data in relational format, with different types of entities and information about links between the entities. The field of statistical-relational learning (SRL) has developed a number of new statistical models for such data. In this paper we focus on learning class-level or first-order dependencies, which model the general database statistics over attributes of linked objects and links (e.g., the percentage of A grades given in computer science classes). Class-level statistical relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. Most current SRL methods find class-level dependencies, but their main task is to support instance-level predictions about the attributes or links of specific entities. We focus only on class-level prediction, and describe algorithms for learning class-level models that are orders of magnitude faster for this task. Our algorithms learn Bayes nets with relational structure, leveraging the efficiency of single-table nonrelational Bayes net learners. An evaluation of our methods on three data sets shows that they are computationally feasible for realistic table sizes, and that the learned structures represent the statistical information in the databases well. After learning compiles the database statistics into a Bayes net, querying these statistics via Bayes net inference is faster than with SQL queries, and does not depend on the size of the database. △ Less

Submitted 20 October, 2009; v1 submitted 26 November, 2008; originally announced November 2008.

Comments: 14 pages (2 column)

Report number: TR 2008-17, School of Computing Science, Simon Fraser University ACM Class: I.2.6

arXiv:0710.2083 [pdf, ps, other]

Association Rules in the Relational Calculus

Authors: Oliver Schulte, Flavia Moser, Martin Ester, Zhiyong Lu

Abstract: One of the most utilized data mining tasks is the search for association rules. Association rules represent significant relationships between items in transactions. We extend the concept of association rule to represent a much broader class of associations, which we refer to as \emph{entity-relationship rules.} Semantically, entity-relationship rules express associations between properties of re… ▽ More One of the most utilized data mining tasks is the search for association rules. Association rules represent significant relationships between items in transactions. We extend the concept of association rule to represent a much broader class of associations, which we refer to as \emph{entity-relationship rules.} Semantically, entity-relationship rules express associations between properties of related objects. Syntactically, these rules are based on a broad subclass of safe domain relational calculus queries. We propose a new definition of support and confidence for entity-relationship rules and for the frequency of entity-relationship queries. We prove that the definition of frequency satisfies standard probability axioms and the Apriori property. △ Less

Submitted 10 October, 2007; originally announced October 2007.

Comments: 16 pages, 13 tables

Report number: SFU School of Computing Science, TR 2007-23

Showing 1–25 of 25 results for author: Schulte, O