-
Clustering of timed sequences -- Application to the analysis of care pathways
Authors:
Thomas Guyet,
Pierre Pinson,
Enoal Gesny
Abstract:
Improving the future of healthcare starts by better understanding the current actual practices in hospitals. This motivates the objective of discovering typical care pathways from patient data. Revealing homogeneous groups of care pathways can be achieved through clustering. The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically…
▽ More
Improving the future of healthcare starts by better understanding the current actual practices in hospitals. This motivates the objective of discovering typical care pathways from patient data. Revealing homogeneous groups of care pathways can be achieved through clustering. The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically appropriate metric and clustering algorithms.
In this article, we adapt two methods developed for time series to time sequences: the drop-DTW metric and the DBA approach for the construction of averaged time sequences. These methods are then applied in clustering algorithms to propose original and sound clustering algorithms for timed sequences.
This approach is experimented with and evaluated on synthetic and real use cases.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
SWoTTeD: An Extension of Tensor Decomposition to Temporal Phenoty**
Authors:
Hana Sebia,
Thomas Guyet,
Etienne Audureau
Abstract:
Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records (EHR). However, this task becomes significantly more difficult when the data follows complex temporal patterns. This paper introduces the notion of a temporal phenotype as an arrangement of features over time and it proposes SWoTTeD (Sl…
▽ More
Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records (EHR). However, this task becomes significantly more difficult when the data follows complex temporal patterns. This paper introduces the notion of a temporal phenotype as an arrangement of features over time and it proposes SWoTTeD (Sliding Window for Temporal Tensor Decomposition), a novel method to discover hidden temporal patterns. SWoTTeD integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes. We validate our proposal using both synthetic and real-world datasets, and we present an original usecase using data from the Greater Paris University Hospital. The results show that SWoTTeD achieves at least as accurate reconstruction as recent state-of-the-art tensor decomposition models, and extracts temporal phenotypes that are meaningful for clinicians.
△ Less
Submitted 28 March, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
A survey on the semantics of sequential patterns with negation
Authors:
Thomas Guyet
Abstract:
A sequential pattern with negation, or negative sequential pattern, takes the form of a sequential pattern for which the negation symbol may be used in front of some of the pattern's itemsets. Intuitively, such a pattern occurs in a sequence if negated itemsets are absent in the sequence. Recent work has shown that different semantics can be attributed to these pattern forms, and that state-of-the…
▽ More
A sequential pattern with negation, or negative sequential pattern, takes the form of a sequential pattern for which the negation symbol may be used in front of some of the pattern's itemsets. Intuitively, such a pattern occurs in a sequence if negated itemsets are absent in the sequence. Recent work has shown that different semantics can be attributed to these pattern forms, and that state-of-the-art algorithms do not extract the same sets of patterns. This raises the important question of the interpretability of sequential pattern with negation. In this study, our focus is on exploring how potential users perceive negation in sequential patterns. Our aim is to determine whether specific semantics are more "intuitive" than others and whether these align with the semantics employed by one or more state-of-the-art algorithms. To achieve this, we designed a questionnaire to reveal the semantics' intuition of each user. This article presents both the design of the questionnaire and an in-depth analysis of the 124 responses obtained. The outcomes indicate that two of the semantics are predominantly intuitive; however, neither of them aligns with the semantics of the primary state-of-the-art algorithms. As a result, we provide recommendations to account for this disparity in the conclusions drawn.
△ Less
Submitted 20 September, 2023;
originally announced September 2023.
-
Generating robust counterfactual explanations
Authors:
Victor Guyomard,
Françoise Fessant,
Thomas Guyet,
Tassadit Bouadi,
Alexandre Termier
Abstract:
Counterfactual explanations have become a mainstay of the XAI field. This particularly intuitive statement allows the user to understand what small but necessary changes would have to be made to a given situation in order to change a model prediction. The quality of a counterfactual depends on several criteria: realism, actionability, validity, robustness, etc. In this paper, we are interested in…
▽ More
Counterfactual explanations have become a mainstay of the XAI field. This particularly intuitive statement allows the user to understand what small but necessary changes would have to be made to a given situation in order to change a model prediction. The quality of a counterfactual depends on several criteria: realism, actionability, validity, robustness, etc. In this paper, we are interested in the notion of robustness of a counterfactual. More precisely, we focus on robustness to counterfactual input changes. This form of robustness is particularly challenging as it involves a trade-off between the robustness of the counterfactual and the proximity with the example to explain. We propose a new framework, CROCO, that generates robust counterfactuals while managing effectively this trade-off, and guarantees the user a minimal robustness. An empirical evaluation on tabular datasets confirms the relevance and effectiveness of our approach.
△ Less
Submitted 24 April, 2023;
originally announced April 2023.
-
Temporal Disaggregation of the Cumulative Grass Growth
Authors:
Thomas Guyet,
Laurent Spillemaecker,
Simon Malinowski,
Anne-Isabelle Graux
Abstract:
Information on the grass growth over a year is essential for some models simulating the use of this resource to feed animals on pasture or at barn with hay or grass silage. Unfortunately, this information is rarely available. The challenge is to reconstruct grass growth from two sources of information: usual daily climate data (rainfall, radiation, etc.) and cumulative growth over the year. We ha…
▽ More
Information on the grass growth over a year is essential for some models simulating the use of this resource to feed animals on pasture or at barn with hay or grass silage. Unfortunately, this information is rarely available. The challenge is to reconstruct grass growth from two sources of information: usual daily climate data (rainfall, radiation, etc.) and cumulative growth over the year. We have to be able to capture the effect of seasonal climatic events which are known to distort the growth curve within the year. In this paper, we formulate this challenge as a problem of disaggregating the cumulative growth into a time series. To address this problem, our method applies time series forecasting using climate information and grass growth from previous time steps. Several alternatives of the method are proposed and compared experimentally using a database generated from a grassland process-based model. The results show that our method can accurately reconstruct the time series, independently of the use of the cumulative growth information.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
VCNet: A self-explaining model for realistic counterfactual generation
Authors:
Victor Guyomard,
Françoise Fessant,
Thomas Guyet,
Tassadit Bouadi,
Alexandre Termier
Abstract:
Counterfactual explanation is a common class of methods to make local explanations of machine learning decisions. For a given instance, these methods aim to find the smallest modification of feature values that changes the predicted decision made by a machine learning model. One of the challenges of counterfactual explanation is the efficient generation of realistic counterfactuals. To address thi…
▽ More
Counterfactual explanation is a common class of methods to make local explanations of machine learning decisions. For a given instance, these methods aim to find the smallest modification of feature values that changes the predicted decision made by a machine learning model. One of the challenges of counterfactual explanation is the efficient generation of realistic counterfactuals. To address this challenge, we propose VCNet-Variational Counter Net-a model architecture that combines a predictor and a counterfactual generator that are jointly trained, for regression or classification tasks. VCNet is able to both generate predictions, and to generate counterfactual explanations without having to solve another minimisation problem. Our contribution is the generation of counterfactuals that are close to the distribution of the predicted class. This is done by learning a variational autoencoder conditionally to the output of the predictor in a join-training fashion. We present an empirical evaluation on tabular datasets and across several interpretability metrics. The results are competitive with the state-of-the-art method.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
Experimental study of time series forecasting methods for groundwater level prediction
Authors:
Michael Franklin Mbouopda,
Thomas Guyet,
Nicolas Labroche,
Abel Henriot
Abstract:
Groundwater level prediction is an applied time series forecasting task with important social impacts to optimize water management as well as preventing some natural disasters: for instance, floods or severe droughts. Machine learning methods have been reported in the literature to achieve this task, but they are only focused on the forecast of the groundwater level at a single location. A global…
▽ More
Groundwater level prediction is an applied time series forecasting task with important social impacts to optimize water management as well as preventing some natural disasters: for instance, floods or severe droughts. Machine learning methods have been reported in the literature to achieve this task, but they are only focused on the forecast of the groundwater level at a single location. A global forecasting method aims at exploiting the groundwater level time series from a wide range of locations to produce predictions at a single place or at several places at a time. Given the recent success of global forecasting methods in prestigious competitions, it is meaningful to assess them on groundwater level prediction and see how they are compared to local methods. In this work, we created a dataset of 1026 groundwater level time series. Each time series is made of daily measurements of groundwater levels and two exogenous variables, rainfall and evapotranspiration. This dataset is made available to the communities for reproducibility and further evaluation. To identify the best configuration to effectively predict groundwater level for the complete set of time series, we compared different predictors including local and global time series forecasting methods. We assessed the impact of exogenous variables. Our result analysis shows that the best predictions are obtained by training a global method on past groundwater levels and rainfall data.
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
Incremental Mining of Frequent Serial Episodes Considering Multiple Occurrences
Authors:
Thomas Guyet,
Wenbin Zhang,
Albert Bifet
Abstract:
The need to analyze information from streams arises in a variety of applications. One of its fundamental research directions is to mine sequential patterns over data streams. Current studies mine series of items based on the presence of the pattern in transactions but pay no attention to the series of itemsets and their multiple occurrences. The pattern over a window of itemsets stream and their m…
▽ More
The need to analyze information from streams arises in a variety of applications. One of its fundamental research directions is to mine sequential patterns over data streams. Current studies mine series of items based on the presence of the pattern in transactions but pay no attention to the series of itemsets and their multiple occurrences. The pattern over a window of itemsets stream and their multiple occurrences, however, provides additional capability to recognize the essential characteristics of the patterns and the inter-relationships among them that are unidentifiable by the existing presence-based studies. In this paper, we study such a new sequential pattern mining problem and propose a corresponding sequential miner with novel strategies to prune the search space efficiently. Experiments on both real and synthetic data show the utility of our approach.
△ Less
Submitted 9 April, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Semantics of negative sequential patterns
Authors:
Thomas Guyet,
Philippe Besnard
Abstract:
In the field of pattern mining, a negative sequential pattern is specified by means of a sequence consisting of events to occur and of other events, called negative events, to be absent. For instance, containment of the pattern $\langle a\ \neg b\ c\rangle$ arises with an occurrence of a and a subsequent occurrence of c but no occurrence of b in between. This article is to shed light on the ambigu…
▽ More
In the field of pattern mining, a negative sequential pattern is specified by means of a sequence consisting of events to occur and of other events, called negative events, to be absent. For instance, containment of the pattern $\langle a\ \neg b\ c\rangle$ arises with an occurrence of a and a subsequent occurrence of c but no occurrence of b in between. This article is to shed light on the ambiguity of such a seemingly intuitive notation and we identify eight possible semantics for the containment relation between a pattern and a sequence. These semantics are illustrated and formally studied, in particular we propose dominance and equivalence relations between them. Also we prove that support is anti-monotonic for some of these semantics. Some of the results are discussed with the aim of develo** algorithms to extract efficiently frequent negative patterns.
△ Less
Submitted 21 February, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Day-ahead time series forecasting: application to capacity planning
Authors:
Colin Leverger,
Vincent Lemaire,
Simon Malinowski,
Thomas Guyet,
Laurence Rozé
Abstract:
In the context of capacity planning, forecasting the evolution of informatics servers usage enables companies to better manage their computational resources. We address this problem by collecting key indicator time series and propose to forecast their evolution a day-ahead. Our method assumes that data is structured by a daily seasonality, but also that there is typical evolution of indicators wit…
▽ More
In the context of capacity planning, forecasting the evolution of informatics servers usage enables companies to better manage their computational resources. We address this problem by collecting key indicator time series and propose to forecast their evolution a day-ahead. Our method assumes that data is structured by a daily seasonality, but also that there is typical evolution of indicators within a day. Then, it uses the combination of a clustering algorithm and Markov Models to produce day-ahead forecasts. Our experiments on real datasets show that the data satisfies our assumption and that, in the case study, our method outperforms classical approaches (AR, Holt-Winters).
△ Less
Submitted 6 November, 2018;
originally announced November 2018.
-
NegPSpan: efficient extraction of negative sequential patterns with embedding constraints
Authors:
Thomas Guyet,
René Quiniou
Abstract:
Mining frequent sequential patterns consists in extracting recurrent behaviors, modeled as patterns, in a big sequence dataset. Such patterns inform about which events are frequently observed in sequences, i.e. what does really happen. Sometimes, knowing that some specific event does not happen is more informative than extracting a lot of observed events. Negative sequential patterns (NSP) formula…
▽ More
Mining frequent sequential patterns consists in extracting recurrent behaviors, modeled as patterns, in a big sequence dataset. Such patterns inform about which events are frequently observed in sequences, i.e. what does really happen. Sometimes, knowing that some specific event does not happen is more informative than extracting a lot of observed events. Negative sequential patterns (NSP) formulate recurrent behaviors by patterns containing both observed events and absent events. Few approaches have been proposed to mine such NSPs. In addition, the syntax and semantics of NSPs differ in the different methods which makes it difficult to compare them. This article provides a unified framework for the formulation of the syntax and the semantics of NSPs. Then, we introduce a new algorithm, NegPSpan, that extracts NSPs using a PrefixSpan depth-first scheme and enabling maxgap constraints that other approaches do not take into account. The formal framework allows for highlighting the differences between the proposed approach wrt to the methods from the literature, especially wrt the state of the art approach eNSP. Intensive experiments on synthetic and real datasets show that NegPSpan can extract meaningful NSPs and that it can process bigger datasets than eNSP thanks to significantly lower memory requirements and better computation times.
△ Less
Submitted 25 July, 2018; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks
Authors:
Thomas Guyet,
Yves Moinard,
René Quiniou,
Torsten Schaub
Abstract:
This article presents the use of Answer Set Programming (ASP) to mine sequential patterns. ASP is a high-level declarative logic programming paradigm for high level encoding combinatorial and optimization problem solving as well as knowledge representation and reasoning. Thus, ASP is a good candidate for implementing pattern mining with background knowledge, which has been a data mining issue for…
▽ More
This article presents the use of Answer Set Programming (ASP) to mine sequential patterns. ASP is a high-level declarative logic programming paradigm for high level encoding combinatorial and optimization problem solving as well as knowledge representation and reasoning. Thus, ASP is a good candidate for implementing pattern mining with background knowledge, which has been a data mining issue for a long time. We propose encodings of the classical sequential pattern mining tasks within two representations of embeddings (fill-gaps vs skip-gaps) and for various kinds of patterns: frequent, constrained and condensed. We compare the computational performance of these encodings with each other to get a good insight into the efficiency of ASP encodings. The results show that the fill-gaps strategy is better on real problems due to lower memory consumption. Finally, compared to a constraint programming approach (CPSM), another declarative programming paradigm, our proposal showed comparable performance.
△ Less
Submitted 14 November, 2017;
originally announced November 2017.
-
Discriminant chronicles mining: Application to care pathways analytics
Authors:
Yann Dauxais,
Thomas Guyet,
David Gross-Amblard,
André Happe
Abstract:
Pharmaco-epidemiology (PE) is the study of uses and effects of drugs in well defined populations. As medico-administrative databases cover a large part of the population, they have become very interesting to carry PE studies. Such databases provide longitudinal care pathways in real condition containing timestamped care events, especially drug deliveries. Temporal pattern mining becomes a strategi…
▽ More
Pharmaco-epidemiology (PE) is the study of uses and effects of drugs in well defined populations. As medico-administrative databases cover a large part of the population, they have become very interesting to carry PE studies. Such databases provide longitudinal care pathways in real condition containing timestamped care events, especially drug deliveries. Temporal pattern mining becomes a strategic choice to gain valuable insights about drug uses. In this paper we propose DCM, a new discriminant temporal pattern mining algorithm. It extracts chronicle patterns that occur more in a studied population than in a control population. We present results on the identification of possible associations between hospitalizations for seizure and anti-epileptic drug switches in care pathway of epileptic patients.
△ Less
Submitted 11 September, 2017;
originally announced September 2017.
-
Expert Opinion Extraction from a Biomedical Database
Authors:
Ahmed Samet,
Thomas Guyet,
Benjamin Negrevergne,
Tien-Tuan Dao,
Tuan Nha Hoang,
Marie-Christine Ho Ba Tho
Abstract:
In this paper, we tackle the problem of extracting frequent opinions from uncertain databases. We introduce the foundation of an opinion mining approach with the definition of pattern and support measure. The support measure is derived from the commitment definition. A new algorithm called OpMiner that extracts the set of frequent opinions modelled as a mass functions is detailed. Finally, we appl…
▽ More
In this paper, we tackle the problem of extracting frequent opinions from uncertain databases. We introduce the foundation of an opinion mining approach with the definition of pattern and support measure. The support measure is derived from the commitment definition. A new algorithm called OpMiner that extracts the set of frequent opinions modelled as a mass functions is detailed. Finally, we apply our approach on a real-world biomedical database that stores opinions of experts to evaluate the reliability level of biomedical data. Performance analysis showed a better quality patterns for our proposed model in comparison with literature-based methods.
△ Less
Submitted 11 September, 2017;
originally announced September 2017.
-
Mining relevant interval rules
Authors:
Thomas Guyet,
René Quiniou,
Véronique Masson
Abstract:
This article extends the method of Garriga et al. for mining relevant rules to numerical attributes by extracting interval-based pattern rules. We propose an algorithm that extracts such rules from numerical datasets using the interval-pattern approach from Kaytoue et al. This algorithm has been implemented and evaluated on real datasets.
This article extends the method of Garriga et al. for mining relevant rules to numerical attributes by extracting interval-based pattern rules. We propose an algorithm that extracts such rules from numerical datasets using the interval-pattern approach from Kaytoue et al. This algorithm has been implemented and evaluated on real datasets.
△ Less
Submitted 11 September, 2017;
originally announced September 2017.
-
Declarative Sequential Pattern Mining of Care Pathways
Authors:
Thomas Guyet,
André Happe,
Yann Dauxais
Abstract:
Sequential pattern mining algorithms are widely used to explore care pathways database, but they generate a deluge of patterns, mostly redundant or useless. Clinicians need tools to express complex mining queries in order to generate less but more significant patterns. These algorithms are not versatile enough to answer complex clinician queries. This article proposes to apply a declarative patter…
▽ More
Sequential pattern mining algorithms are widely used to explore care pathways database, but they generate a deluge of patterns, mostly redundant or useless. Clinicians need tools to express complex mining queries in order to generate less but more significant patterns. These algorithms are not versatile enough to answer complex clinician queries. This article proposes to apply a declarative pattern mining approach based on Answer Set Programming paradigm. It is exemplified by a pharmaco-epidemiological study investigating the possible association between hospitalization for seizure and antiepileptic drug switch from a french medico-administrative database.
△ Less
Submitted 26 July, 2017;
originally announced July 2017.
-
Dense Bag-of-Temporal-SIFT-Words for Time Series Classification
Authors:
Adeline Bailly,
Simon Malinowski,
Romain Tavenard,
Thomas Guyet,
Laetitia Chapel
Abstract:
Time series classification is an application of particular interest with the increase of data to monitor. Classical techniques for time series classification rely on point-to-point distances. Recently, Bag-of-Words approaches have been used in this context. Words are quantized versions of simple features extracted from sliding windows. The SIFT framework has proved efficient for image classificati…
▽ More
Time series classification is an application of particular interest with the increase of data to monitor. Classical techniques for time series classification rely on point-to-point distances. Recently, Bag-of-Words approaches have been used in this context. Words are quantized versions of simple features extracted from sliding windows. The SIFT framework has proved efficient for image classification. In this paper, we design a time series classification scheme that builds on the SIFT framework adapted to time series to feed a Bag-of-Words. We then refine our method by studying the impact of normalized Bag-of-Words, as well as densely extract point descriptors. Proposed adjustements achieve better performance. The evaluation shows that our method outperforms classical techniques in terms of classification.
△ Less
Submitted 13 January, 2016; v1 submitted 8 January, 2016;
originally announced January 2016.
-
Using Answer Set Programming for pattern mining
Authors:
Thomas Guyet,
Yves Moinard,
René Quiniou
Abstract:
Serial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show tha…
▽ More
Serial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints.
△ Less
Submitted 27 September, 2014;
originally announced September 2014.
-
Self-adaptive web intrusion detection system
Authors:
Thomas Guyet,
René Quiniou,
Wei Wang,
Marie-Odile Cordier
Abstract:
The evolution of the web server contents and the emergence of new kinds of intrusions make necessary the adaptation of the intrusion detection systems (IDS). Nowadays, the adaptation of the IDS requires manual -- tedious and unreactive -- actions from system administrators. In this paper, we present a self-adaptive intrusion detection system which relies on a set of local model-based diagnosers.…
▽ More
The evolution of the web server contents and the emergence of new kinds of intrusions make necessary the adaptation of the intrusion detection systems (IDS). Nowadays, the adaptation of the IDS requires manual -- tedious and unreactive -- actions from system administrators. In this paper, we present a self-adaptive intrusion detection system which relies on a set of local model-based diagnosers. The redundancy of diagnoses is exploited, online, by a meta-diagnoser to check the consistency of computed partial diagnoses, and to trigger the adaptation of defective diagnoser models (or signatures) in case of inconsistency. This system is applied to the intrusion detection from a stream of HTTP requests. Our results show that our system 1) detects intrusion occurrences sensitively and precisely, 2) accurately self-adapts diagnoser model, thus improving its detection accuracy.
△ Less
Submitted 22 July, 2009;
originally announced July 2009.