Search | arXiv e-print repository

Challenges of Anomaly Detection in the Object-Centric Setting: Dimensions and the Role of Domain Knowledge

Authors: Alessandro Berti, Urszula Jessen, Wil M. P. van der Aalst, Dirk Fahland

Abstract: Object-centric event logs, which allow events to be related to different objects of different object types, naturally represent the execution of business processes such as ERP (O2C and P2P) and CRM. However, modeling such complex information requires novel process mining techniques and might result in complex sets of constraints. Object-centric anomaly detection exploits both the lifecycles of the objects and the interactions between them. Therefore, anomalous patterns are proposed to the user without requiring the definition of object-centric process models. This paper proposes different methodologies for object-centric anomaly detection and discusses the role of domain knowledge for these methodologies. We discuss the advantages and limitations of Large Language Models (LLMs) in the provision of such domain knowledge. Following our experience in a real-life P2P process, we also discuss the role of algorithms (dimensionality reduction + anomaly detection), suggest some pre-processing steps, and discuss the role of feature propagation.
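To make "feature propagation" concrete, here is a minimal sketch under stated assumptions; it is not the paper's pipeline. Each object carries a hypothetical numeric feature (`n_events`), each object also receives the mean of that feature over its related objects, and a simple z-score detector stands in for the dimensionality reduction + anomaly detection algorithms the paper discusses. The object identifiers, the relation, and the threshold are all invented for illustration.

```python
from statistics import mean, pstdev

def propagate(features, relations, name):
    """Add, for every object, the mean of feature `name` over its related objects."""
    out = {}
    for obj, feats in features.items():
        neighbours = [features[o][name] for o in relations.get(obj, []) if o in features]
        extra = {f"rel_mean_{name}": mean(neighbours) if neighbours else 0.0}
        out[obj] = {**feats, **extra}
    return out

def zscore_outliers(features, threshold=2.0):
    """Flag objects whose |z-score| exceeds `threshold` on at least one feature."""
    names = sorted(next(iter(features.values())))
    flagged = set()
    for name in names:
        values = [f[name] for f in features.values()]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a constant feature cannot single out any object
        for obj, f in features.items():
            if abs(f[name] - mu) / sigma > threshold:
                flagged.add(obj)
    return flagged

# Hypothetical purchase orders: p9 has an unusually long lifecycle,
# and p0 is related to p9 (e.g., through a shared invoice).
features = {f"p{i}": {"n_events": 4} for i in range(9)}
features["p9"] = {"n_events": 20}
relations = {"p0": ["p9"]}

flagged = zscore_outliers(propagate(features, relations, "n_events"))
print(sorted(flagged))  # ['p0', 'p9']
```

Without the propagated feature only p9 would be reported; the propagated feature also surfaces p0, whose own lifecycle looks normal but which interacts with an anomalous object.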

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2401.12846 [pdf, other]

How well can large language models explain business processes?

Authors: Dirk Fahland, Fabiana Fournier, Lior Limonad, Inna Skarbovsky, Ava J. E. Swevels

Abstract: Large Language Models (LLMs) are likely to play a prominent role in future AI-augmented business process management systems (ABPMSs), providing functionality across all stages of the system lifecycle. One such functionality is Situation-Aware eXplainability (SAX), which relates to generating causally sound yet human-interpretable explanations that take into account the process context in which the explained condition occurred. In this paper, we present the SAX4BPM framework developed to generate SAX explanations. The SAX4BPM suite consists of a set of services and a central knowledge repository. The functionality of these services is to elicit the various knowledge ingredients that underlie SAX explanations. A key innovative component among these ingredients is the causal process execution view. In this work, we integrate the framework with an LLM to leverage its power to synthesize the various input ingredients for the sake of improved SAX explanations. Since the use of LLMs for SAX is accompanied by a certain degree of doubt about their capacity to adequately fulfill SAX, along with their tendency to hallucinate and their lack of an inherent capacity to reason, we pursued a methodological evaluation of the quality of the generated explanations. To this aim, we developed a designated scale and conducted a rigorous user study. Our findings show that the input presented to the LLMs helped guard-rail their performance, yielding SAX explanations with better-perceived fidelity. This improvement is moderated by the perception of trust and curiosity. Moreover, this improvement comes at the cost of the perceived interpretability of the explanation.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 39 pages, 12 figures

MSC Class: 68T01

arXiv:2309.01571 [pdf, other]

The Interplay Between High-Level Problems and The Process Instances That Give Rise To Them

Authors: Bianka Bakullari, Jules van Thoor, Dirk Fahland, Wil M. P. van der Aalst

Abstract: Business processes may face a variety of problems due to the number of tasks that need to be handled within short time periods, resources' workload and working patterns, as well as bottlenecks. These problems may arise locally and be short-lived, but as the process is forced to operate outside its standard capacity, the effect on the underlying process instances can be costly. We use the term high-level behavior to cover all process behavior that cannot be captured in terms of the individual process instances. Whenever such behavior emerges, we call the cases involved in it participating cases. The natural question arises as to how the characteristics of cases relate to the high-level behavior they give rise to. In this work, we first show how to detect and correlate observations of high-level problems, as well as determine the corresponding (non-)participating cases. Then we show how to assess the connection between any case-level characteristic and any given detected sequence of high-level problems. Applying our method to the event data of a real loan application process revealed which specific combinations of delays, batching, and busy resources at which particular parts of the process correlate with an application's duration and its chance of a positive outcome.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2307.09909 [pdf, other]

Chit-Chat or Deep Talk: Prompt Engineering for Process Mining

Authors: Urszula Jessen, Michal Sroka, Dirk Fahland

Abstract: This research investigates the application of Large Language Models (LLMs) to augment conversational agents in process mining, aiming to tackle its inherent complexity and diverse skill requirements. While LLM advancements present novel opportunities for conversational process mining, generating efficient outputs is still a hurdle. We propose an innovative approach that amends many issues in existing solutions, informed by prior research on Natural Language Processing (NLP) for conversational agents. Leveraging LLMs, our framework improves both accessibility and agent performance, as demonstrated by experiments on public question and data sets. Our research sets the stage for future explorations into LLMs' role in process mining and concludes with propositions for enhancing LLM memory, implementing real-time user testing, and examining diverse data sets.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 11 pages, 3 figures

arXiv:2211.04338 [pdf, other]

Extracting and Pre-Processing Event Logs

Authors: Dirk Fahland

Abstract: Event data is the basis for all process mining analysis. Most process mining techniques assume their input to be an event log. However, event data is rarely recorded in an event log format; it has to be extracted from raw data. Event log extraction itself is an act of modeling, as the analyst has to consciously choose which features of the raw data are used for describing which behavior of which entities. Being aware of these choices and of the subtle but important differences between concepts such as trace, case, activity, event, table, and log is crucial for mastering advanced process mining analyses. This text provides fundamental concepts and formalizations and discusses design decisions in event log extraction from a raw event table and in event log pre-processing. It is intended as study material for an advanced lecture in a process mining course.
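The point that extraction is an act of modeling can be seen in a few lines. The following is a minimal sketch, not taken from the lecture text: it assumes a hypothetical raw event table given as a list of dicts with column names `order`, `item`, `activity`, and `time`, and shows that choosing a different case identifier over the same raw table yields a different event log.

```python
from collections import defaultdict

def extract_log(rows, case_col, activity_col="activity", time_col="time"):
    """Build one trace per value of `case_col`, ordered by timestamp.

    Rows without a value for the chosen case column are dropped: this is one
    of the modeling decisions that event log extraction forces on the analyst.
    """
    grouped = defaultdict(list)
    for row in rows:
        case = row.get(case_col)
        if case is not None:
            grouped[case].append(row)
    return {
        case: [r[activity_col] for r in sorted(events, key=lambda r: r[time_col])]
        for case, events in grouped.items()
    }

# A hypothetical raw event table with two candidate case identifiers.
raw = [
    {"order": "o1", "item": None, "activity": "Create Order", "time": 1},
    {"order": "o1", "item": "i1", "activity": "Pick Item", "time": 2},
    {"order": "o1", "item": "i2", "activity": "Pick Item", "time": 3},
    {"order": "o1", "item": None, "activity": "Ship Order", "time": 4},
]

by_order = extract_log(raw, "order")
by_item = extract_log(raw, "item")
print(by_order)  # {'o1': ['Create Order', 'Pick Item', 'Pick Item', 'Ship Order']}
print(by_item)   # {'i1': ['Pick Item'], 'i2': ['Pick Item']}
```

With `order` as the case notion, the two picks appear as a repeated activity in one trace; with `item`, the order-level events disappear entirely.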

Submitted 8 November, 2022; originally announced November 2022.

Comments: This text is intended as study material for an advanced lecture in a process mining course

arXiv:2201.12855 [pdf, ps, other]

doi 10.1145/3576047

AI-Augmented Business Process Management Systems: A Research Manifesto

Authors: Marlon Dumas, Fabiana Fournier, Lior Limonad, Andrea Marrella, Marco Montali, Jana-Rebecca Rehse, Rafael Accorsi, Diego Calvanese, Giuseppe De Giacomo, Dirk Fahland, Avigdor Gal, Marcello La Rosa, Hagen Völzer, Ingo Weber

Abstract: AI-Augmented Business Process Management Systems (ABPMSs) are an emerging class of process-aware information systems, empowered by trustworthy AI technology. An ABPMS enhances the execution of business processes with the aim of making these processes more adaptable, proactive, explainable, and context-sensitive. This manifesto presents a vision for ABPMSs and discusses research challenges that need to be surmounted to realize this vision. To this end, we define the concept of ABPMS, we outline the lifecycle of processes within an ABPMS, we discuss core characteristics of an ABPMS, and we derive a set of challenges to realize systems with these characteristics.

Submitted 4 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: 19 pages, 1 figure

Journal ref: ACM Transactions on Management Information Systems, Volume 14, Issue 1, Article No. 11, pp. 1-19, 31 January 2023

arXiv:2109.06288 [pdf, other]

Striking a new balance in accuracy and simplicity with the Probabilistic Inductive Miner

Authors: Dennis Brons, Roeland Scheepens, Dirk Fahland

Abstract: Numerous process discovery techniques exist for generating process models that describe recorded executions of business processes. The models are meant to generalize executions into human-understandable modeling patterns, notably parallelism, and enable rigorous analysis of process deviations. However, well-defined models with parallelism returned by existing techniques are often too complex or generalize the recorded behavior too strongly to be trusted in a practical business context. We bridge this gap by introducing the Probabilistic Inductive Miner (PIM) based on the Inductive Miner framework. In each step, PIM compares the most probable operators and structures based on frequency information in the data, which results in block-structured models with significantly higher accuracy. All design choices in PIM are based on business context requirements obtained through a user study with industrial process mining experts. PIM is evaluated quantitatively and in a novel kind of empirical study comparing users' trust in discovered model structures. The evaluations show that PIM strikes a unique trade-off between model accuracy and model complexity, which users conclusively preferred over all state-of-the-art process discovery methods.

Submitted 13 September, 2021; originally announced September 2021.

Comments: accepted at IEEE International Conference on Process Mining (ICPM) 2021, submitted version

arXiv:2109.05835 [pdf, other]

Process Discovery Using Graph Neural Networks

Authors: Dominique Sommers, Vlado Menkovski, Dirk Fahland

Abstract: Automatically discovering a process model from an event log is the prime problem in process mining. So far, this task has been approached as an unsupervised learning problem through graph synthesis algorithms. Algorithmic design decisions and heuristics allow for efficiently finding models in a reduced search space. However, design decisions and heuristics are derived from assumptions about how a given behavioral description (an event log) translates into a process model and were not learned from actual models, which introduces biases in the solutions. In this paper, we explore the problem of supervised learning of a process discovery technique D. We introduce a technique for training an ML-based model D using graph convolutional neural networks; D translates a given input event log into a sound Petri net. We show that training D on synthetically generated pairs of input logs and output models allows D to translate previously unseen synthetic and several real-life event logs into sound, arbitrarily structured models whose accuracy and simplicity are comparable to those of existing state-of-the-art techniques for discovering imperative process models. We analyze the limitations of the proposed technique and outline avenues for future work.

Submitted 13 September, 2021; originally announced September 2021.

Comments: accepted at IEEE International Conference on Process Mining (ICPM) 2021, submitted version

arXiv:2103.00167 [pdf, other]

Inferring Unobserved Events in Systems With Shared Resources and Queues

Authors: Dirk Fahland, Vadim Denisov, Wil M. P. van der Aalst

Abstract: To identify the causes of performance problems or to predict process behavior, it is essential to have correct and complete event data. This is particularly important for distributed systems with shared resources, e.g., one case can block another case competing for the same machine, leading to inter-case dependencies in performance. However, for a variety of reasons, real-life systems often record only a subset of all events taking place. To understand and analyze the behavior and performance of processes with shared resources, we aim to reconstruct, by inference over events in other cases in the system, bounds for the timestamps of events in a case that must have happened but were not recorded. We formulate and solve the problem by systematically introducing multi-entity concepts in event logs and process models. We introduce a partial-order based model of a multi-entity event log and a corresponding compositional model for multi-entity processes. We define PQR-systems as a special class of multi-entity processes with shared resources and queues. We then study the problem of inferring, from an incomplete event log, unobserved events and their timestamps that are globally consistent with a PQR-system. We solve the problem by reconstructing unobserved traces of resources and queues according to the PQR-model and derive bounds for their timestamps using a linear program. While the problem is illustrated for material handling systems like baggage handling systems in airports, the approach can be applied to other settings where recording is incomplete. The ideas have been implemented in ProM and were evaluated using both synthetic and real-life event logs.
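The intuition behind inferring timestamp bounds from other cases can be shown on a much simpler system than a PQR-system. The sketch below is a simplification, not the paper's linear-program formulation: it assumes a single FIFO queue (hypothetical data, cases listed in FIFO order) in which every case spends at least a known minimum time `min_delay`, and derives an interval for each unrecorded entry time from the recorded entries of neighbouring cases.

```python
import math

def missing_entry_bounds(exits, entries, min_delay):
    """Interval bounds for unrecorded entry times in a single FIFO queue.

    `exits[i]` and `entries[i]` belong to the i-th case in FIFO order; a missing
    entry is None. Assumptions: cases leave in the order they entered, and every
    case spends at least `min_delay` in the queue.
    """
    bounds = []
    for i, entry in enumerate(entries):
        if entry is not None:
            bounds.append((entry, entry))
            continue
        earlier = [t for t in entries[:i] if t is not None]
        later = [t for t in entries[i + 1:] if t is not None]
        lo = max(earlier) if earlier else -math.inf  # FIFO: not before predecessors
        hi = exits[i] - min_delay                    # minimum time in the queue
        if later:
            hi = min(hi, min(later))                 # FIFO: not after successors
        bounds.append((lo, hi))
    return bounds

print(missing_entry_bounds([10, 12, 15], [2, None, 6], min_delay=5))
# [(2, 2), (2, 6), (6, 6)]
```

Here the second case's own exit only bounds its entry by 12 - 5 = 7, but the recorded entry of its FIFO successor tightens the upper bound to 6: information from another case narrows the estimate.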

Submitted 9 December, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

Comments: Final formatted version at Fundamenta Informaticae

Journal ref: Fundamenta Informaticae, Volume 183, Issues 3-4: Petri Nets 2020 (December 23, 2021) fi:7232

arXiv:2005.14552 [pdf, other]

Multi-Dimensional Event Data in Graph Databases

Authors: Stefan Esser, Dirk Fahland

Abstract: Process event data is usually stored either in a sequential process event log or in a relational database. While the sequential, single-dimensional nature of event logs aids querying for (sub)sequences of events based on temporal relations such as "directly/eventually-follows", it does not support querying multi-dimensional event data of multiple related entities. Relational databases allow storing multi-dimensional event data, but existing query languages do not support querying for sequences or paths of events in terms of temporal relations. In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that allows storing structural and temporal relations in a single, integrated graph-based data structure in a systematic way. We provide semantics for all concepts of our data model, and generic queries for modeling event data over multiple entities that interact synchronously and asynchronously. The queries allow for efficiently converting large real-life event data sets into our data model, and we provide 5 converted data sets for further research. We show that typical and advanced queries for retrieving and aggregating such multi-dimensional event data can be formulated and executed efficiently in the existing query language Cypher, giving rise to several new research questions. Specifically, aggregation queries on our data model enable process mining over multiple interrelated entities using off-the-shelf technology.
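The core idea of relating events to several entities and deriving temporal relations per entity can be sketched without a graph database. The following in-memory stand-in is an assumption-laden illustration, not the paper's Cypher queries or node/edge labels: each hypothetical event lists the entities it is correlated with, and a directly-follows edge is created between consecutive events of the same entity, tagged with that entity.

```python
from collections import defaultdict

def directly_follows_per_entity(events):
    """Return (source, target, entity) edges linking consecutive events of each entity."""
    by_entity = defaultdict(list)
    for event in events:
        for entity in event["entities"]:
            by_entity[entity].append(event)
    edges = []
    for entity in sorted(by_entity):
        ordered = sorted(by_entity[entity], key=lambda e: e["time"])
        for a, b in zip(ordered, ordered[1:]):
            edges.append((a["id"], b["id"], entity))
    return edges

# Hypothetical events correlated to an order o1 and an invoice inv1.
events = [
    {"id": "e1", "activity": "Create Order", "time": 1, "entities": ["o1"]},
    {"id": "e2", "activity": "Create Invoice", "time": 2, "entities": ["o1", "inv1"]},
    {"id": "e3", "activity": "Pay Invoice", "time": 3, "entities": ["inv1"]},
    {"id": "e4", "activity": "Ship Order", "time": 4, "entities": ["o1"]},
]
for edge in directly_follows_per_entity(events):
    print(edge)
# ('e2', 'e3', 'inv1')
# ('e1', 'e2', 'o1')
# ('e2', 'e4', 'o1')
```

Event e2 is shared by the order and the invoice, so it lies on two directly-follows paths at once; a single-case event log would have to duplicate or drop it.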

Submitted 3 April, 2021; v1 submitted 29 May, 2020; originally announced May 2020.

Comments: Accepted at Journal of Data Semantics. Final reviewed manuscript

arXiv:1910.09767 [pdf, other]

Scalable Alignment of Process Models and Event Logs: An Approach Based on Automata and S-Components

Authors: Daniel Reißner, Abel Armas-Cervantes, Raffaele Conforti, Marlon Dumas, Dirk Fahland, Marcello La Rosa

Abstract: Given a model of the expected behavior of a business process and an event log recording its observed behavior, the problem of business process conformance checking is that of identifying and describing the differences between the model and the log. A desirable feature of a conformance checking technique is to identify a minimal yet complete set of differences. Existing conformance checking techniques that fulfil this property exhibit limited scalability when confronted with large and complex models and logs. This paper presents two complementary techniques to address these shortcomings. The first technique transforms the model and log into two automata. These automata are compared using an error-correcting synchronized product, computed via an A* search that guarantees the resulting automaton captures all differences with a minimal number of error corrections. The synchronized product is used to extract minimal-length alignments between each trace of the log and the closest corresponding trace of the model. A limitation of the first technique is that, as the level of concurrency in the model increases, the size of the automaton of the model grows exponentially, thus hampering scalability. To address this limitation, the paper proposes a second technique wherein the process model is first decomposed into a set of automata, known as S-components, such that the product of these automata is equal to the automaton of the whole process model. An error-correcting product is computed for each S-component separately, and the resulting automata are recomposed into a single product automaton capturing all differences without minimality guarantees. An empirical evaluation shows that the proposed techniques outperform state-of-the-art baselines in terms of computational efficiency. Moreover, the decomposition-based technique is optimal for the vast majority of datasets and quasi-optimal for the remaining ones.
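For readers new to alignments, here is a minimal sketch of the underlying search problem for one trace and one model automaton. It is a generic textbook formulation, not the paper's technique: it does not build an error-correcting product over the whole log and does not use S-component decomposition. Search states are pairs (position in the trace, model state); synchronous moves cost 0, log-only and model-only moves cost 1, and Dijkstra's algorithm (A* with a zero heuristic) returns a minimal-cost alignment. The automaton encoding as a dict of dicts is an assumption.

```python
import heapq
from itertools import count

def align(trace, dfa, start, finals):
    """Return (cost, moves) of a cheapest alignment of `trace` against `dfa`.

    `dfa` maps a state to {label: next_state}. Synchronous moves cost 0,
    log-only and model-only moves cost 1. Returns None if no final state is reachable.
    """
    tie = count()  # tie-breaker so the heap never compares move lists
    frontier = [(0, next(tie), 0, start, [])]
    settled = set()
    while frontier:
        cost, _, i, q, moves = heapq.heappop(frontier)
        if (i, q) in settled:
            continue
        settled.add((i, q))
        if i == len(trace) and q in finals:
            return cost, moves
        successors = []
        if i < len(trace):
            a = trace[i]
            if a in dfa.get(q, {}):
                successors.append((0, i + 1, dfa[q][a], ("sync", a)))
            successors.append((1, i + 1, q, ("log", a)))
        for label, q2 in dfa.get(q, {}).items():
            successors.append((1, i, q2, ("model", label)))
        for step, i2, q2, move in successors:
            if (i2, q2) not in settled:
                heapq.heappush(frontier, (cost + step, next(tie), i2, q2, moves + [move]))
    return None

# Hypothetical model accepting exactly the sequence a, b, c.
model = {0: {"a": 1}, 1: {"b": 2}, 2: {"c": 3}}
result = align(["a", "c"], model, start=0, finals={3})
print(result)  # (1, [('sync', 'a'), ('model', 'b'), ('sync', 'c')])
```

The reported deviation is a single model move on b, the activity the trace skipped. The state-space blow-up the abstract mentions arises because, for concurrent models, the automaton passed as `dfa` grows exponentially.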

Submitted 4 March, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

arXiv:1705.03303 [pdf, other]

doi 10.1016/j.ipl.2018.01.013

The Imprecisions of Precision Measures in Process Mining

Authors: Niek Tax, Xixi Lu, Natalia Sidorova, Dirk Fahland, Wil M. P. van der Aalst

Abstract: In process mining, precision measures are used to quantify how much a process model over-approximates the behavior seen in an event log. Although several measures have been proposed over the years, no research has been done to validate whether these measures achieve the intended aim of quantifying over-approximation in a consistent way for all models and logs. This paper fills this gap by postulating a number of axioms for quantifying precision consistently for any log and any model. Further, we show through counter-examples that none of the existing measures consistently quantifies precision.
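As an illustration of what "over-approximation" means, the sketch below computes one naive, language-based notion of precision: the share of a model's finite language that is actually observed in the log. It is an assumption for illustration only and is not taken from the paper; it only works for models with a finite set of traces, and it is exactly the kind of measure whose behavior should be checked against axioms such as those the paper postulates.

```python
def naive_precision(log_traces, model_language):
    """Share of the model's (finite) language that is observed in the log."""
    model = {tuple(t) for t in model_language}
    if not model:
        raise ValueError("model language must be non-empty")
    observed = {tuple(t) for t in log_traces} & model
    return len(observed) / len(model)

# Hypothetical model allowing three traces, of which the log shows two.
model = [["a", "b"], ["b", "a"], ["a"]]
log = [["a", "b"], ["a", "b"], ["a"]]
print(naive_precision(log, model))  # 0.6666666666666666
```

The model allows "b, a", which never occurs in the log; that unobserved behavior is what lowers the score below 1.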

Submitted 16 May, 2017; v1 submitted 3 May, 2017; originally announced May 2017.

Journal ref: Information Processing Letters, 135 (2018), 1-8

arXiv:1511.04057 [pdf, other]

doi 10.1007/978-3-642-31072-0_11

Modeling Styles in Business Process Modeling

Authors: Jakob Pinggera, Pnina Soffer, Stefan Zugal, Barbara Weber, Matthias Weidlich, Dirk Fahland, Hajo A. Reijers, Jan Mendling

Abstract: Research on quality issues of business process models has recently begun to explore the process of creating process models. As a consequence, the question arises whether different ways of creating process models exist. In this vein, we observed 115 students engaged in the act of modeling, recording all their interactions with the modeling environment using a specialized tool. The recordings of process modeling were subsequently clustered. Results presented in this paper suggest the existence of three distinct modeling styles, exhibiting significantly different characteristics. We believe that this finding constitutes another building block toward a more comprehensive understanding of the process of process modeling that will ultimately enable us to support modelers in creating better business process models.

Submitted 11 November, 2015; originally announced November 2015.

Journal ref: Proc. BPMDS'12, pp. 151-166, 2012

arXiv:1511.04052 [pdf]

doi 10.1007/978-3-642-32885-5_3

Tying Process Model Quality to the Modeling Process: The Impact of Structuring, Movement, and Speed

Authors: Jan Claes, Irene Vanderfeesten, Hajo A. Reijers, Jakob Pinggera, Matthias Weidlich, Stefan Zugal, Dirk Fahland, Barbara Weber, Jan Mendling, Geert Poels

Abstract: In an investigation into the process of process modeling, we examined how modeling behavior relates to the quality of the process model that emerges from it. Specifically, we considered whether (i) a modeler's structured modeling style, (ii) the frequency of moving existing objects over the modeling canvas, and (iii) the overall modeling speed are in any way connected to the ease with which the resulting process model can be understood. In this paper, we describe the exploratory study used to build these three conjectures, clarify the experimental set-up and infrastructure that was used to collect data, and explain the metrics used for the various concepts to test the conjectures empirically. We discuss various implications for research and practice from the conjectures, all of which were confirmed by the experiment.

Submitted 11 November, 2015; originally announced November 2015.

arXiv:1303.2554 [pdf, other]

Artifact Lifecycle Discovery

Authors: Viara Popova, Dirk Fahland, Marlon Dumas

Abstract: Artifact-centric modeling is a promising approach for modeling business processes based on the so-called business artifacts: key entities that drive the company's operations and whose lifecycles define the overall business process. While artifact-centric modeling shows significant advantages, the overwhelming majority of existing process mining methods cannot be applied (directly), as they are tailored to discover monolithic process models. This paper addresses the problem by proposing a chain of methods that can be applied to discover artifact lifecycle models in Guard-Stage-Milestone notation. We decompose the problem in such a way that a wide range of existing (non-artifact-centric) process discovery and analysis methods can be reused in a flexible manner. The methods presented in this paper are implemented as software plug-ins for ProM, a generic open-source framework and architecture for implementing process mining tools.

Submitted 11 March, 2013; originally announced March 2013.

arXiv:1108.2384 [pdf, other]

Maximal Structuring of Acyclic Process Models

Authors: Artem Polyvyanyy, Luciano García-Bañuelos, Dirk Fahland, Mathias Weske

Abstract: This paper contributes to the solution of the problem of transforming a process model with an arbitrary topology into an equivalent structured process model. In particular, this paper addresses the subclass of process models that have no equivalent well-structured representation but which, nevertheless, can be partially structured into their maximally-structured representation. The structuring is performed under a behavioral equivalence notion that preserves observed concurrency of tasks in equivalent process models. The paper gives a full characterization of the subclass of acyclic process models that have no equivalent well-structured representation but do have an equivalent maximally-structured one, and proposes a complete structuring method.

Submitted 11 August, 2011; originally announced August 2011.

Showing 1–16 of 16 results for author: Fahland, D