-
Generalization in Automated Process Discovery: A Framework based on Event Log Patterns
Authors:
Daniel Reißner,
Abel Armas-Cervantes,
Marcello La Rosa
Abstract:
The importance of quality measures in process mining has increased. One of the key quality aspects, generalization, is concerned with measuring the degree of overfitting of a process model w.r.t. an event log, since the recorded behavior is just an example of the true behavior of the underlying business process. Existing generalization measures exhibit several shortcomings that severely hinder their applicability in practice. For example, they assume the event log fully fits the discovered process model, and cannot deal with large real-life event logs and complex process models. More significantly, current measures neglect generalizations for clear patterns that demand a certain construct in the model. For example, a repeating sequence in an event log should be generalized with a loop structure in the model. We address these shortcomings by proposing a framework of measures that generalize a set of patterns discovered from an event log with representative traces and check the corresponding control-flow structures in the process model via their trace alignment. We instantiate the framework with a generalization measure that uses tandem repeats to identify repetitive patterns that are compared to the loop structures and a concurrency oracle to identify concurrent patterns that are compared to the parallel structures of the process model. In an extensive qualitative and quantitative evaluation on 74 log-model pairs against two baseline generalization measures, we show that the proposed generalization measure consistently ranks process models that fulfil the observed patterns with generalizing control-flow structures higher than those which do not, while the baseline measures disregard those patterns. Further, we show that our measure can be efficiently computed for datasets two orders of magnitude larger than the largest dataset the baseline generalization measures can handle.
Submitted 26 March, 2022;
originally announced March 2022.
-
Efficient Conformance Checking using Approximate Alignment Computation with Tandem Repeats
Authors:
Daniel Reißner,
Abel Armas-Cervantes,
Marcello La Rosa
Abstract:
Conformance checking encompasses a body of process mining techniques which aim to find and describe the differences between a process model capturing the expected process behavior and a corresponding event log recording the observed behavior. Alignments are an established technique to compute the distance between a trace in the event log and the closest execution trace of a corresponding process model. Given a cost function, an alignment is optimal when it contains the least number of mismatches between a log trace and a model trace. Determining optimal alignments, however, is computationally expensive, especially in light of the growing size and complexity of event logs from practice, which can easily exceed one million events with traces of several hundred activities. A common limitation of existing alignment techniques is the inability to exploit repetitions in the log. By exploiting a specific form of sequential pattern in traces, namely tandem repeats, we propose a novel approximate technique that uses pre- and post-processing steps to compress the length of a trace and recomputes the alignment cost while guaranteeing that the resulting cost never under-approximates the optimal cost. In an extensive empirical evaluation with 50 real-life model-log pairs and against six state-of-the-art alignment techniques, we show that the proposed compression approach systematically outperforms the baselines by up to an order of magnitude in the presence of traces with repetitions, and that the cost over-approximation, when it occurs, is negligible.
Submitted 26 March, 2022; v1 submitted 1 April, 2020;
originally announced April 2020.
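To make the idea of a tandem repeat in the abstract above concrete, the following is a minimal sketch, not the authors' algorithm: it scans a trace naively for a pattern that repeats back-to-back and keeps at most `keep` copies of it. The function names `find_tandem_repeat` and `collapse_repeats`, the greedy left-to-right strategy, and the default of two retained copies are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def find_tandem_repeat(trace: Sequence[str], start: int) -> Tuple[int, int]:
    """Return (pattern_length, repeat_count) for the repeat at `start`.

    The repeat covering the most events wins; ties go to the shorter
    pattern. Returns (0, 0) if nothing repeats at least twice.
    """
    best, best_cover = (0, 0), 0
    n = len(trace)
    for length in range(1, (n - start) // 2 + 1):
        pattern = trace[start:start + length]
        count = 1
        # Count consecutive copies of the pattern.
        while trace[start + count * length:start + (count + 1) * length] == pattern:
            count += 1
        if count >= 2 and count * length > best_cover:
            best, best_cover = (length, count), count * length
    return best


def collapse_repeats(trace: Sequence[str], keep: int = 2) -> List[str]:
    """Greedily shorten every tandem repeat to at most `keep` copies."""
    result: List[str] = []
    i = 0
    while i < len(trace):
        length, count = find_tandem_repeat(trace, i)
        if count >= 2:
            result.extend(list(trace[i:i + length]) * min(count, keep))
            i += length * count
        else:
            result.append(trace[i])
            i += 1
    return result


if __name__ == "__main__":
    # "bc" occurs four times in a row and is reduced to two copies.
    print(collapse_repeats(list("abcbcbcbcd")))
```

The naive scan is polynomial in the trace length and only serves to show what gets compressed; it makes no claim about how the paper detects repeats or recomputes the alignment cost afterwards.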
-
Scalable Alignment of Process Models and Event Logs: An Approach Based on Automata and S-Components
Authors:
Daniel Reißner,
Abel Armas-Cervantes,
Raffaele Conforti,
Marlon Dumas,
Dirk Fahland,
Marcello La Rosa
Abstract:
Given a model of the expected behavior of a business process and an event log recording its observed behavior, the problem of business process conformance checking is that of identifying and describing the differences between the model and the log. A desirable feature of a conformance checking technique is to identify a minimal yet complete set of differences. Existing conformance checking techniques that fulfil this property exhibit limited scalability when confronted with large and complex models and logs. This paper presents two complementary techniques to address these shortcomings. The first technique transforms the model and log into two automata. These automata are compared using an error-correcting synchronized product, computed via an A* search that guarantees the resulting automaton captures all differences with a minimal amount of error corrections. The synchronized product is used to extract minimal-length alignments between each trace of the log and the closest corresponding trace of the model. A limitation of the first technique is that as the level of concurrency in the model increases, the size of the automaton of the model grows exponentially, thus hampering scalability. To address this limitation, the paper proposes a second technique wherein the process model is first decomposed into a set of automata, known as S-components, such that the product of these automata is equal to the automaton of the whole process model. An error-correcting product is computed for each S-component separately and the resulting automata are recomposed into a single product automaton capturing all differences without minimality guarantees. An empirical evaluation shows that the proposed techniques outperform state-of-the-art baselines in terms of computational efficiency. Moreover, the decomposition-based technique is optimal for the vast majority of datasets and quasi-optimal for the remaining ones.
Submitted 4 March, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
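As a rough illustration of what a minimal-length alignment between a log trace and a model trace measures, here is a small sketch under strong simplifying assumptions: the model behavior is given as a finite, explicitly enumerated list of traces (the paper instead works on automata), and every unmatched event costs 1, whether it appears only in the log or only in the model. The names `alignment_cost` and `closest_model_trace` are hypothetical.

```python
from typing import List, Sequence


def alignment_cost(log_trace: Sequence[str], model_trace: Sequence[str]) -> int:
    """Minimum number of unmatched events between the two traces.

    A matching pair of events costs 0; skipping an event on either
    side costs 1. There is no substitution move.
    """
    n, m = len(log_trace), len(model_trace)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # only log events left: skip them all
    for j in range(1, m + 1):
        cost[0][j] = j  # only model events left: skip them all
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1
            if log_trace[i - 1] == model_trace[j - 1]:
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = best
    return cost[n][m]


def closest_model_trace(log_trace: Sequence[str],
                        model_traces: List[Sequence[str]]) -> Sequence[str]:
    """Pick the enumerated model trace with the lowest alignment cost."""
    return min(model_traces, key=lambda t: alignment_cost(log_trace, t))


if __name__ == "__main__":
    # The log trace skips activity "c" that the model requires.
    print(alignment_cost("abd", "abcd"))
```

Enumerating model traces breaks down as soon as the model contains loops or substantial concurrency, which is the scalability problem the automata-based and S-component-based techniques in the abstract target.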
-
Discovering Process Maps from Event Streams
Authors:
Volodymyr Leno,
Abel Armas-Cervantes,
Marlon Dumas,
Marcello La Rosa,
Fabrizio M. Maggi
Abstract:
Automated process discovery is a class of process mining methods that allow analysts to extract business process models from event logs. Traditional process discovery methods extract process models from a snapshot of an event log stored in its entirety. In some scenarios, however, events keep coming with a high arrival rate to the extent that it is impractical to store the entire event log and to continuously re-discover a process model from scratch. Such scenarios require online process discovery approaches. Given an event stream produced by the execution of a business process, the goal of an online process discovery method is to maintain a continuously updated model of the process with a bounded amount of memory while at the same time achieving similar accuracy as offline methods. However, existing online discovery approaches require relatively large amounts of memory to achieve levels of accuracy comparable to that of offline methods. Therefore, this paper proposes an approach that addresses this limitation by mapping the problem of online process discovery to that of cache memory management, and applying well-known cache replacement policies to the problem of online process discovery. The approach has been implemented in .NET, experimentally integrated with the Minit process mining tool and comparatively evaluated against an existing baseline using real-life datasets.
Submitted 8 April, 2018;
originally announced April 2018.
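The mapping from online discovery to cache management described in the abstract above can be sketched as follows. This is an illustrative assumption rather than the paper's implementation (which is in .NET): it keeps two bounded least-recently-used (LRU) caches, one for the last activity seen per case and one for directly-follows edge counts. The class names `LRUCache` and `StreamingDFG`, the choice of LRU as the replacement policy, and the directly-follows graph as the process map are all hypothetical choices.

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Bounded key-value store that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key in self.items:
            self.items.move_to_end(key)  # mark as recently used
            return self.items[key]
        return default

    def put(self, key: Hashable, value: Any) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the oldest entry


class StreamingDFG:
    """Directly-follows counts maintained over an event stream."""

    def __init__(self, max_cases: int, max_edges: int) -> None:
        self.last_activity = LRUCache(max_cases)
        self.edges = LRUCache(max_edges)

    def observe(self, case_id: str, activity: str) -> None:
        previous = self.last_activity.get(case_id)
        if previous is not None:
            edge = (previous, activity)
            self.edges.put(edge, self.edges.get(edge, 0) + 1)
        self.last_activity.put(case_id, activity)


if __name__ == "__main__":
    dfg = StreamingDFG(max_cases=2, max_edges=2)
    for case_id, activity in [("c1", "a"), ("c1", "b"), ("c2", "a"),
                              ("c2", "b"), ("c1", "c"), ("c1", "d")]:
        dfg.observe(case_id, activity)
    print(dict(dfg.edges.items))
```

Both caches are capped, so memory stays bounded regardless of stream length; the price is that evicted cases and rarely refreshed edges are forgotten, which is the accuracy trade-off a replacement policy has to manage.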
-
Reduction of Event Structures under History Preserving Bisimulation
Authors:
Abel Armas-Cervantes,
Paolo Baldan,
Luciano Garcia-Bañuelos
Abstract:
Event structures represent concurrent processes in terms of events and dependencies between events modelling behavioural relations like causality and conflict. Since the introduction of prime event structures, many variants of event structures have been proposed with different behavioural relations and, hence, with differences in their expressive power. One of the possible benefits of using a more expressive event structure is that of having a more compact representation for the same behaviour when considering the number of events used in a prime event structure. Therefore, this article addresses the problem of reducing the size of an event structure while preserving behaviour under a well-known notion of equivalence, namely history preserving bisimulation. In particular, we investigate this problem on two generalisations of the prime event structures. The first one, known as asymmetric event structure, relies on an asymmetric form of the conflict relation. The second one, known as flow event structure, supports a form of disjunctive causality. More specifically, we describe the conditions under which a set of events in an event structure can be folded into a single event while preserving the original behaviour. The successive application of this folding operation leads to a minimal size event structure. However, the order in which the folding operation is applied may lead to different minimal size event structures. The latter has a negative implication on the potential use of a minimal size event structure as a canonical representation for behaviour.
Submitted 30 June, 2014; v1 submitted 27 March, 2014;
originally announced March 2014.