-
A modern approach to transition analysis and process mining with Markov models: A tutorial with R
Authors:
Jouni Helske,
Satu Helske,
Mohammed Saqr,
Sonsoles López-Pernas,
Keefe Murphy
Abstract:
This chapter presents an introduction to Markovian modeling for the analysis of sequence data. Contrary to the deterministic approach seen in the previous sequence analysis chapters, Markovian models are probabilistic models, focusing on the transitions between states instead of studying sequences as a whole. The chapter provides an introduction to this method and differentiates between its most c…
▽ More
This chapter presents an introduction to Markovian modeling for the analysis of sequence data. Contrary to the deterministic approach seen in the previous sequence analysis chapters, Markovian models are probabilistic models, focusing on the transitions between states instead of studying sequences as a whole. The chapter provides an introduction to this method and differentiates between its most common variations: first-order Markov models, hidden Markov models, mixture Markov models, and mixture hidden Markov models. In addition to a thorough explanation and contextualization within the existing literature, the chapter provides a step-by-step tutorial on how to implement each type of Markovian model using the R package seqHMM. The chaper also provides a complete guide to performing stochastic process mining with Markovian models as well as plotting, comparing and clustering different process models.
△ Less
Submitted 2 September, 2023;
originally announced September 2023.
-
Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect
Authors:
Jouni Helske,
Satu Helske,
Matthew Cooper,
Anders Ynnerman,
Lonni Besançon
Abstract:
Common reporting styles for statistical results in scientific articles, such as p-values and confidence intervals (CI), have been reported to be prone to dichotomous interpretations, especially with respect to the null hypothesis significance testing framework. For example when the p-value is small enough or the CIs of the mean effects of a studied drug and a placebo are not overlap**, scientist…
▽ More
Common reporting styles for statistical results in scientific articles, such as p-values and confidence intervals (CI), have been reported to be prone to dichotomous interpretations, especially with respect to the null hypothesis significance testing framework. For example when the p-value is small enough or the CIs of the mean effects of a studied drug and a placebo are not overlap**, scientists tend to claim significant differences while often disregarding the magnitudes and absolute differences in the effect sizes. This type of reasoning has been shown to be potentially harmful to science. Techniques relying on the visual estimation of the strength of evidence have been recommended to reduce such dichotomous interpretations but their effectiveness has also been challenged. We ran two experiments on researchers with expertise in statistical analysis to compare several alternative representations of confidence intervals and used Bayesian multilevel models to estimate the effects of the representation styles on differences in researchers' subjective confidence in the results. We also asked the respondents' opinions and preferences in representation styles. Our results suggest that adding visual information to classic CI representation can decrease the tendency towards dichotomous interpretations - measured as the `cliff effect': the sudden drop in confidence around p-value 0.05 - compared with classic CI visualization and textual representation of the CI with p-values. All data and analyses are publicly available at https://github.com/helske/statvis.
△ Less
Submitted 28 May, 2021; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R
Authors:
Satu Helske,
Jouni Helske
Abstract:
Sequence analysis is being more and more widely used for the analysis of social sequences and other multivariate categorical time series data. However, it is often complex to describe, visualize, and compare large sequence data, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be use…
▽ More
Sequence analysis is being more and more widely used for the analysis of social sequences and other multivariate categorical time series data. However, it is often complex to describe, visualize, and compare large sequence data, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations. Extending to mixture hidden Markov models (MHMMs) allows clustering data into homogeneous subsets, with or without external covariates.
The seqHMM package in R is designed for the efficient modeling of sequences and other categorical time series data containing one or multiple subjects with one or multiple interdependent sequences using HMMs and MHMMs. Also other restricted variants of the MHMM can be fitted, e.g., latent class models, Markov models, mixture Markov models, or even ordinary multinomial regression models with suitable parameterization of the HMM. Good graphical presentations of data and models are useful during the whole analysis process from the first glimpse at the data to model fitting and presentation of results. The package provides easy options for plotting parallel sequence data, and proposes visualizing HMMs as directed graphs.
△ Less
Submitted 26 January, 2019; v1 submitted 3 April, 2017;
originally announced April 2017.