-
A modern approach to transition analysis and process mining with Markov models: A tutorial with R
Authors:
Jouni Helske,
Satu Helske,
Mohammed Saqr,
Sonsoles López-Pernas,
Keefe Murphy
Abstract:
This chapter presents an introduction to Markovian modeling for the analysis of sequence data. Contrary to the deterministic approach seen in the previous sequence analysis chapters, Markovian models are probabilistic models, focusing on the transitions between states instead of studying sequences as a whole. The chapter provides an introduction to this method and differentiates between its most common variations: first-order Markov models, hidden Markov models, mixture Markov models, and mixture hidden Markov models. In addition to a thorough explanation and contextualization within the existing literature, the chapter provides a step-by-step tutorial on how to implement each type of Markovian model using the R package seqHMM. The chapter also provides a complete guide to performing stochastic process mining with Markovian models as well as plotting, comparing and clustering different process models.
Submitted 2 September, 2023;
originally announced September 2023.
-
Clustering and Structural Robustness in Causal Diagrams
Authors:
Santtu Tikka,
Jouni Helske,
Juha Karvanen
Abstract:
Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagram, but it may erroneously change the essential properties of the causal relations if implemented arbitrarily. We define a specific type of cluster, called transit cluster, that is guaranteed to preserve the identifiability properties of causal effects under certain conditions. We provide a sound and complete algorithm for finding all transit clusters in a given graph and demonstrate how clustering can simplify the identification of causal effects. We also study the inverse problem, where one starts with a clustered graph and looks for extended graphs where the identifiability properties of causal effects remain unchanged. We show that this kind of structural robustness is closely related to transit clusters.
Submitted 15 August, 2023; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect
Authors:
Jouni Helske,
Satu Helske,
Matthew Cooper,
Anders Ynnerman,
Lonni Besançon
Abstract:
Common reporting styles for statistical results in scientific articles, such as p-values and confidence intervals (CI), have been reported to be prone to dichotomous interpretations, especially with respect to the null hypothesis significance testing framework. For example, when the p-value is small enough or the CIs of the mean effects of a studied drug and a placebo are not overlapping, scientists tend to claim significant differences while often disregarding the magnitudes and absolute differences in the effect sizes. This type of reasoning has been shown to be potentially harmful to science. Techniques relying on the visual estimation of the strength of evidence have been recommended to reduce such dichotomous interpretations but their effectiveness has also been challenged. We ran two experiments on researchers with expertise in statistical analysis to compare several alternative representations of confidence intervals and used Bayesian multilevel models to estimate the effects of the representation styles on differences in researchers' subjective confidence in the results. We also asked the respondents' opinions and preferences in representation styles. Our results suggest that adding visual information to classic CI representation can decrease the tendency towards dichotomous interpretations - measured as the 'cliff effect': the sudden drop in confidence around p-value 0.05 - compared with classic CI visualization and textual representation of the CI with p-values. All data and analyses are publicly available at https://github.com/helske/statvis.
Submitted 28 May, 2021; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Graphical model inference: Sequential Monte Carlo meets deterministic approximations
Authors:
Fredrik Lindsten,
Jouni Helske,
Matti Vihola
Abstract:
Approximate inference in probabilistic graphical models (PGMs) can be grouped into deterministic methods and Monte-Carlo-based methods. The former can often provide accurate and rapid inferences, but are typically associated with biases that are hard to quantify. The latter enjoy asymptotic consistency, but can suffer from high computational costs. In this paper we present a way of bridging the gap between deterministic and stochastic inference. Specifically, we suggest an efficient sequential Monte Carlo (SMC) algorithm for PGMs which can leverage the output from deterministic inference methods. While generally applicable, we show explicitly how this can be done with loopy belief propagation, expectation propagation, and Laplace approximations. The resulting algorithm can be viewed as a post-correction of the biases associated with these methods and, indeed, numerical results show clear improvements over the baseline deterministic methods as well as over "plain" SMC.
Submitted 8 January, 2019;
originally announced January 2019.