-
Anomaly Detection for Bivariate Signals
Authors:
Marie Cottrell,
Cynthia Faure,
Jérôme Lacaille,
Madalina Olteanu
Abstract:
Anomaly detection for univariate or multivariate time series is a critical problem in many practical applications, such as industrial process control, biological measurements, engine monitoring, and the supervision of all kinds of behavior. In this paper we propose a simple, empirical approach to detecting anomalies in the behavior of multivariate time series. The approach is based on the empirical estimation of the conditional quantiles of the data, which provides upper and lower bounds for the confidence tubes. The method is tested on artificial data, and its effectiveness has been demonstrated in a real-world setting, the monitoring of aircraft engines.
Submitted 14 October, 2020;
originally announced October 2020.
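The confidence-tube idea described in the abstract of "Anomaly Detection for Bivariate Signals" can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width binning of the conditioning signal, the function names `fit_quantile_tube` and `flag_anomalies`, and the parameters `n_bins` and `alpha` are all assumptions made for illustration.

```python
import numpy as np

def fit_quantile_tube(x, y, n_bins=10, alpha=0.05):
    """Empirical lower/upper conditional quantiles of y given x.

    Assumption: conditioning is done on equal-width bins of x.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Bin index in 0..n_bins-1 for every reference point.
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    lower = np.full(n_bins, np.nan)
    upper = np.full(n_bins, np.nan)
    for b in range(n_bins):
        yb = y[idx == b]
        if yb.size:  # empty bins keep NaN bounds and never flag a point
            lower[b] = np.quantile(yb, alpha / 2)
            upper[b] = np.quantile(yb, 1 - alpha / 2)
    return edges, lower, upper

def flag_anomalies(x, y, edges, lower, upper):
    """True where y falls outside the tube estimated for the bin of x."""
    n_bins = lower.size
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    return (y < lower[idx]) | (y > upper[idx])

# Synthetic reference data (hypothetical): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x_ref = rng.uniform(0.0, 1.0, 5000)
y_ref = 2.0 * x_ref + rng.normal(0.0, 0.1, 5000)
edges, lower, upper = fit_quantile_tube(x_ref, y_ref)

# Two new observations at the same x: one typical, one far above the tube.
x_new = np.array([0.25, 0.25])
y_new = np.array([0.5, 3.0])
flags = flag_anomalies(x_new, y_new, edges, lower, upper)
```

The tube is learned on data assumed to be healthy, so `alpha` sets the expected false-alarm rate per point rather than a detection guarantee.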
-
Multidimensional Urban Segregation - Toward A Neural Network Measure
Authors:
Madalina Olteanu,
Aurélien Hazan,
Marie Cottrell,
Julien Randon-Furling
Abstract:
We introduce a multidimensional, neural-network approach to reveal and measure urban segregation phenomena, based on the Self-Organizing Map algorithm (SOM). The multidimensionality of SOM allows one to apprehend a large number of variables simultaneously, defined on census or other types of statistical blocks, and to perform clustering along them. Levels of segregation are then measured through correlations between distances on the neural network and distances on the actual geographical map. Further, the stochasticity of SOM enables one to quantify levels of heterogeneity across census blocks. We illustrate this new method on data available for the city of Paris.
Submitted 5 June, 2018; v1 submitted 9 May, 2017;
originally announced May 2017.
-
Anomaly Detection Based on Confidence Intervals Using SOM with an Application to Health Monitoring
Authors:
Anastasios Bellas,
Charles Bouveyron,
Marie Cottrell,
Jerome Lacaille
Abstract:
We develop an application of SOM for the task of anomaly detection and visualization. To remove the effect of exogenous independent variables, we use a correction model which is more accurate than the usual one, since we apply different linear models in each cluster of context. We do not assume any particular probability distribution of the data and the detection method is based on the distance of new data to the Kohonen map learned with corrected healthy data. We apply the proposed method to the detection of aircraft engine anomalies.
Submitted 30 June, 2015;
originally announced August 2015.
-
Analysis of Professional Trajectories using Disconnected Self-Organizing Maps
Authors:
Etienne Côme,
Marie Cottrell,
Patrice Gaubert
Abstract:
In this paper we address an important economic question: is there, as mainstream economic theory asserts, a homogeneous labor market with mechanisms that govern the supply of and demand for work, producing an equilibrium with its remarkable properties? Using the Panel Study of Income Dynamics (PSID) collected over the period 1984-2003, we study the situations of American workers with respect to employment. The data include all heads of household (men or women) as well as partners who are on the labor market, working or not. They are extracted from the complete survey, and we compute a few relevant features that characterize the workers' situations. To perform this analysis, we suggest using a Self-Organizing Map (SOM, Kohonen algorithm) with a specific structure based on planar graphs with disconnected components (called D-SOM), which is especially interesting for clustering. We compare the results to those obtained with a classical SOM grid and a star-shaped map (called SOS). Each component of D-SOM takes the form of a string and corresponds to an organized cluster. From this clustering, we study the trajectories of individuals among the classes by using the transition probability matrices for each period and the corresponding stationary distributions. We find clear evidence of heterogeneous parts, each highly homogeneous, representing situations well identified in terms of activity, wage levels, and degree of stability in the workplace. These results and their interpretation in economic terms contribute to the debate about flexibility, which is commonly seen as a way to reach a better level of equilibrium on the labor market.
Submitted 30 June, 2015;
originally announced July 2015.
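The trajectory analysis mentioned in the abstract above (transition probability matrices and their stationary distributions) can be sketched as follows. This is an illustrative example only: the toy trajectories are invented, the helper names `transition_matrix` and `stationary_distribution` are hypothetical, and all transitions are pooled into a single matrix, whereas the abstract speaks of one matrix per period.

```python
import numpy as np

def transition_matrix(trajectories, n_classes):
    """Row-normalised counts of observed moves between classes."""
    counts = np.zeros((n_classes, n_classes))
    for traj in trajectories:
        for a, b in zip(traj[:-1], traj[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departure are left as zeros.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1, summing to 1."""
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

# Toy class sequences for three individuals (hypothetical data).
trajectories = [
    [0, 0, 1, 1, 2],
    [2, 2, 1, 0, 0],
    [1, 1, 1, 2, 2],
]
P = transition_matrix(trajectories, n_classes=3)
pi = stationary_distribution(P)
```

For this toy chain the stationary distribution works out to (0.2, 0.4, 0.4), which can be checked by hand from the balance equations between neighbouring classes.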
-
How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining
Authors:
Nicolas Bourgeois,
Marie Cottrell,
Benjamin Déruelle,
Stéphane Lamassé,
Patrick Letrémy
Abstract:
This article is an extended version of a paper presented at the WSOM'2012 conference [1]. We present a combination of factorial projections, the SOM algorithm and graph techniques applied to a text mining problem. The corpus contains 8 medieval manuscripts which were used to teach arithmetic techniques to merchants. Among the techniques for Data Analysis, those used for Lexicometry (such as Factorial Analysis) highlight the discrepancies between manuscripts, because they focus on the deviation from independence between words and manuscripts. However, we also want to discover and characterize the vocabulary common to the whole corpus. Using the properties of stochastic Kohonen maps, which define neighborhoods between inputs in a non-deterministic way, we highlight the words which seem to play a special role in the vocabulary. We call them fickle and use them to improve both the robustness of Kohonen maps and the significance of FCA visualization. Finally, we use graph algorithms to exploit this fickleness for the classification of words.
Submitted 25 June, 2015;
originally announced June 2015.
-
Search Strategies for Binary Feature Selection for a Naive Bayes Classifier
Authors:
Tsirizo Rabenoro,
Jérôme Lacaille,
Marie Cottrell,
Fabrice Rossi
Abstract:
We compare in this paper several feature selection methods for the Naive Bayes Classifier (NBC) when the data under study are described by a large number of redundant binary indicators. Wrapper approaches guided by the NBC estimation of the classification error probability out-perform filter approaches while retaining a reasonable computational cost.
Submitted 12 June, 2015;
originally announced June 2015.
-
Interpretable Aircraft Engine Diagnostic via Expert Indicator Aggregation
Authors:
Tsirizo Rabenoro,
Jérôme Lacaille,
Marie Cottrell,
Fabrice Rossi
Abstract:
Detecting early signs of failures (anomalies) in complex systems is one of the main goals of preventive maintenance. In particular, it makes it possible to avoid actual failures by (re)scheduling maintenance operations in a way that optimizes maintenance costs. Aircraft engine health monitoring is a representative example of a field in which anomaly detection is crucial. Manufacturers collect large amounts of engine-related data during flights, which are used, among other applications, to detect anomalies. This article introduces and studies a generic methodology for building automatic detection of early signs of anomalies in a way that builds upon human expertise and remains understandable by the human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. A feature selection method is used to keep only the most discriminant indicators, which are used as inputs of a Naive Bayes classifier. This gives an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real-world engines.
Submitted 18 March, 2015;
originally announced March 2015.
-
Anomaly Detection Based on Indicators Aggregation
Authors:
Tsirizo Rabenoro,
Jérôme Lacaille,
Marie Cottrell,
Fabrice Rossi
Abstract:
Automatic anomaly detection is a major issue in various areas. Beyond mere detection, the identification of the source of the problem that produced the anomaly is also essential. This is particularly the case in aircraft engine health monitoring, where detecting early signs of failure (anomalies) and helping the engine owner to implement the appropriate maintenance operations efficiently (fixing the source of the anomaly) are of crucial importance to reduce the costs attached to unscheduled maintenance. This paper introduces a general methodology that aims at classifying monitoring signals into normal ones and several classes of abnormal ones. The main idea is to leverage expert knowledge by generating a very large number of binary indicators. Each indicator corresponds to a fully parametrized anomaly detector built from parametric anomaly scores designed by experts. A feature selection method is used to keep only the most discriminant indicators, which are used as inputs of a Naive Bayes classifier. This gives an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real-world engines.
Submitted 16 September, 2014;
originally announced September 2014.
-
A Methodology for the Diagnostic of Aircraft Engine Based on Indicators Aggregation
Authors:
Tsirizo Rabenoro,
Jérôme Lacaille,
Marie Cottrell,
Fabrice Rossi
Abstract:
Aircraft engine manufacturers collect large amounts of engine-related data during flights. These data are used to detect anomalies in the engines in order to help companies optimize their maintenance costs. This article introduces and studies a generic methodology for building automatic detection of early signs of anomalies in a way that is understandable by the human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. The best indicators are selected via a classical forward scheme, leading to a much reduced number of indicators that are tuned to a data set. We illustrate the value of the method on simulated data which contain realistic early signs of anomalies.
Submitted 26 August, 2014;
originally announced August 2014.
-
Anomaly Detection Based on Aggregation of Indicators
Authors:
Tsirizo Rabenoro,
Jérôme Lacaille,
Marie Cottrell,
Fabrice Rossi
Abstract:
Automatic anomaly detection is a major issue in various areas. Beyond mere detection, the identification of the origin of the problem that produced the anomaly is also essential. This paper introduces a general methodology that can assist human operators who aim at classifying monitoring signals. The main idea is to leverage expert knowledge by generating a very large number of indicators. A feature selection method is used to keep only the most discriminant indicators, which are used as inputs of a Naive Bayes classifier. The parameters of the classifier have been optimized indirectly by the selection process. The methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real-world engines.
Submitted 16 September, 2014; v1 submitted 3 July, 2014;
originally announced July 2014.
-
On-line relational SOM for dissimilarity data
Authors:
Madalina Olteanu,
Nathalie Villa-Vialaneix,
Marie Cottrell
Abstract:
In some applications, and in order to address real-world situations better, data may be more complex than simple vectors. In some cases, they are known only through their pairwise dissimilarities. Several variants of the Self-Organizing Map algorithm have been introduced to generalize the original algorithm to this framework. Whereas median SOM is based on a rough representation of the prototypes, relational SOM represents these prototypes by a virtual combination of all elements in the data set. However, this latter approach suffers from two main drawbacks. First, its complexity can be large. Second, only a batch version of this algorithm has been studied so far, and it often provides results with poor topographic organization. In this article, an on-line version of relational SOM is described and justified. The algorithm is tested on several datasets, including categorical data and graphs, and compared with the batch version and with other SOM algorithms for non-vector data.
Submitted 27 December, 2012;
originally announced December 2012.
-
Neural Networks for Complex Data
Authors:
Marie Cottrell,
Madalina Olteanu,
Fabrice Rossi,
Joseph Rynkiewicz,
Nathalie Villa-Vialaneix
Abstract:
Artificial neural networks are simple and efficient machine learning tools. Defined originally in the traditional setting of simple vector data, neural network models have evolved to address more and more of the difficulties of complex real-world problems, ranging from time-evolving data to sophisticated data structures such as graphs and functions. This paper summarizes advances on those themes from the last decade, with a focus on results obtained by members of the SAMM team of Université Paris 1.
Submitted 24 October, 2012;
originally announced October 2012.
-
Fault prediction in aircraft engines using Self-Organizing Maps
Authors:
Marie Cottrell,
Patrice Gaubert,
Cédric Eloy,
Damien François,
Geoffroy Hallaux,
Jérôme Lacaille,
Michel Verleysen
Abstract:
Aircraft engines are designed to be used for several decades. Their maintenance is a challenging and costly task, for obvious safety reasons. The goal is to ensure proper operation of the engines, in all conditions, with a zero probability of failure, while taking aging into account. The fact that the same engine is sometimes used on several aircraft has to be taken into account too. Maintenance can be improved if an efficient procedure for the prediction of failures is implemented. The primary source of information on the health of the engines comes from measurements taken during flights. Several variables such as the core speed, the oil pressure and quantity, the fan speed, etc. are measured, together with environmental variables such as the outside temperature, altitude, aircraft speed, etc. In this paper, we describe the design of a procedure for visualizing successive data measured on aircraft engines. The data are multi-dimensional measurements on the engines, which are projected onto a self-organizing map in order to follow the trajectories of these data over time. The trajectories consist of a succession of points on the map, each of them corresponding to the two-dimensional projection of the multi-dimensional vector of engine measurements. Analyzing the trajectories aims at visualizing any deviation from normal behavior, making it possible to anticipate an operation failure.
Submitted 8 July, 2009;
originally announced July 2009.
-
Dynamical Equilibrium, trajectories study in an economical system. The case of the labor market
Authors:
Patrick Letrémy,
Marie Cottrell,
Patrice Gaubert,
Joseph Rynkiewicz
Abstract:
The paper deals with the study of labor market dynamics, and aims to characterize its equilibriums and possible trajectories. The theoretical background is the theory of the segmented labor market. The main idea is that this theory is well adapted to interpret the observed trajectories, due to the heterogeneity of the work situations.
Submitted 9 July, 2007; v1 submitted 13 April, 2007;
originally announced April 2007.
-
Traitement Des Donnees Manquantes Au Moyen De L'Algorithme De Kohonen (Processing Missing Data by Means of the Kohonen Algorithm)
Authors:
Marie Cottrell,
Smail Ibbou,
Patrick Letrémy
Abstract:
We show how the Kohonen self-organizing algorithm can be used to handle data that contain missing values and to estimate those values. After a methodological review, we illustrate the approach with three applications to real data.
Submitted 13 April, 2007;
originally announced April 2007.
-
Theoretical Aspects of the SOM Algorithm
Authors:
Marie Cottrell,
Jean-Claude Fort,
Gilles Pagès
Abstract:
The SOM algorithm is quite astonishing. On the one hand, it is very simple to write down and to simulate, and its practical properties are clear and easy to observe. On the other hand, its theoretical properties still remain without proof in the general case, despite the great efforts of several authors. In this paper, we review the latest results and provide some conjectures for future work.
Submitted 13 April, 2007;
originally announced April 2007.
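The abstract of "Theoretical Aspects of the SOM Algorithm" notes that the algorithm is very simple to write down and to simulate. A minimal on-line version on a one-dimensional string of units might look like the sketch below. The Gaussian neighbourhood, the linear decay schedules for the learning rate and the radius, and the function name `train_som` are illustrative choices, not taken from the paper.

```python
import numpy as np

def train_som(data, n_units=10, n_iter=5000, seed=0):
    """On-line Kohonen algorithm on a 1-D string of n_units prototypes."""
    rng = np.random.default_rng(seed)
    # Initialise the prototypes on randomly chosen data points.
    protos = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    positions = np.arange(n_units)
    for t in range(n_iter):
        frac = t / n_iter
        eps = 0.5 * (1.0 - frac) + 0.01                # decreasing learning rate
        sigma = (n_units / 2.0) * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]              # draw one observation
        # Winning unit: the prototype closest to x.
        winner = np.argmin(np.sum((protos - x) ** 2, axis=1))
        # Neighbourhood weights, measured along the string of units.
        h = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
        # Move the winner and its neighbours towards x.
        protos += eps * h[:, None] * (x - protos)
    return protos

# Hypothetical data: points drawn uniformly in the unit square.
data = np.random.default_rng(1).uniform(0.0, 1.0, size=(500, 2))
protos = train_som(data)
```

Each update is a convex combination of a prototype and a data point, so the prototypes always stay inside the bounding box of the data; whether they end up well ordered along the string is the kind of property the theory has to establish.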
-
Forecasting the CATS benchmark with the Double Vector Quantization method
Authors:
Geoffroy Simon,
John Lee,
Marie Cottrell,
Michel Verleysen
Abstract:
The Double Vector Quantization method, a long-term forecasting method based on the SOM algorithm, has been used to predict the 100 missing values of the CATS competition data set. An analysis of the proposed time series is provided to estimate the dimension of the auto-regressive part of this nonlinear auto-regressive forecasting method. Based on this analysis, experimental results using the Double Vector Quantization (DVQ) method are presented and discussed. As one of the features of the DVQ method is its ability to predict scalars as well as vectors of values, the number of iterative predictions needed to reach the prediction horizon is further examined. The long-term stability of the method makes it possible to obtain reliable values for a rather long forecasting horizon.
Submitted 6 March, 2007;
originally announced March 2007.
-
Classification of Recurring Unemployed Workers and Unemployment Exits
Authors:
Marie Cottrell,
Patrice Gaubert
Abstract:
This study focuses on recurring unemployment, that is, people with two or more spells of unemployment during the period of observation (July 1993 - August 1996). First, a classification is obtained, which is then used to examine the specific role of occasional jobs during a spell of unemployment and, in this context, the influence of the unemployment benefits received on the duration of this spell. This paper is a continuation of previous analyses of unemployment in France, based on long-term data from the unemployed register held by ANPE (National Employment Bureau). The present analysis is conducted using additional information about unemployment benefits received by the unemployed from UNEDIC (Unemployment Benefits Office).
Submitted 1 March, 2007;
originally announced March 2007.
-
Neural Network and Segmented Labour Market
Authors:
Patrice Gaubert,
Marie Cottrell
Abstract:
In France, for administrative reasons, unemployed workers may actually be involved in occasional work while remaining identified as unemployed (and receiving the corresponding benefit). This is due to the fact that the unemployed are deemed to be seeking full-time jobs and non-fixed term contracts of employment. This situation may be analysed as evidence of a special type of secondary segment of the labour market in a context of massive unemployment. The authors consider that the effects of this situation on both the duration of unemployment and its recurrence may be usefully investigated.
Submitted 1 March, 2007;
originally announced March 2007.
-
A Dynamic Analysis of Segmented Labor Market
Authors:
Patrice Gaubert,
Marie Cottrell
Abstract:
Using the Panel Study of Income Dynamics data over the period 1982-1992, this paper investigates some mechanisms of the labor market in the United States. This market is analyzed as a stable structure made up of segments which present contrasting characteristics under the usual distinction between primary and secondary sectors. Using a neural network algorithm applied to quantitative variables measured at the level of heads of household, a broad classification into four classes of situations is constructed. It shows a clear hierarchy going from situations of very precarious work or no work at all to situations of stable jobs with higher-than-average wages. A Markov chain, constructed from the trajectories of these workers between the different situations, shows a very stable structure of this segmented labor market. Keywords: segmented labor market, unemployment, trajectories, Kohonen algorithm, Markov chain.
Submitted 1 March, 2007;
originally announced March 2007.
-
Use of an Hourglass Model in Neuronal Coding
Authors:
Marie Cottrell,
Tatiana Turova
Abstract:
We study a system of interacting renewal processes which is a model for neuronal activity. We show that the system possesses an exponentially large number (with respect to the number of neurons in the network) of limiting configurations of the "firing neurons". These we call patterns. Furthermore, under certain conditions of symmetry we find an algorithm to control limiting patterns by means of the connection parameters.
Submitted 1 March, 2007;
originally announced March 2007.
-
Consumer Profile Identification and Allocation
Authors:
Patrick Letrémy,
Marie Cottrell,
Eric Esposito,
Valérie Laffite,
Sally Showk
Abstract:
We propose an easy-to-use methodology to assign new individuals to one of the groups previously built from a complete learning database. The learning database contains continuous and categorical variables for each individual. The groups (clusters) are built by using only the continuous variables and described with the help of the categorical ones. For the new individuals, only the categorical variables are available, and it is necessary to define a model which computes the probabilities of belonging to each of the clusters by using only the categorical variables. This model then provides a decision rule to assign the new individuals and gives decision-makers an efficient tool. This tool is shown to be very efficient, for example, for allocating customers to consumer clusters for marketing purposes.
Submitted 3 April, 2007; v1 submitted 28 February, 2007;
originally announced February 2007.
-
Living conditions: classification of households using the Kohonen algorithm
Authors:
Sophie Ponthieux,
Marie Cottrell
Abstract:
In the analysis of poverty and social exclusion, indicators of living conditions are some interesting non-monetary complements to the usual measurements in terms of current or annual income. Living conditions depend in fact on longer term factors than income, and provide further information on households' actual resources that allow to compare more accurately between living standards. But in cou…
▽ More
In the analysis of poverty and social exclusion, indicators of living conditions are some interesting non-monetary complements to the usual measurements in terms of current or annual income. Living conditions depend in fact on longer term factors than income, and provide further information on households' actual resources that allow to compare more accurately between living standards. But in counterpart, a difficulty comes from the qualitative nature of the information, and the large number of dimensions and items that may be taken into account; in other words, living conditions are difficult to "measure". A consequence is that very often, the information is either used only partly, or reduced into a global score of (bad) living conditions, that results from counting "negative" items, and the qualitative dimension is lost. In this paper, we propose to use the Kohonen algorithm first to describe how the elements of living conditions are combined, and secondly to classify households according to their living conditions. The main interest of a classification is to make appear not only quantitative differences in the "levels" of living conditions, but also qualitative differences within similar "levels".
Submitted 28 February, 2007;
originally announced February 2007.
-
Efficient estimators : the use of neural networks to construct pseudo panels
Authors:
Marie Cottrell,
Patrice Gaubert
Abstract:
Pseudo panels built from repeated cross-sections are good substitutes for true panel data. But the individuals grouped in a cohort are not the same in successive periods, which results in a measurement error and inconsistent estimators. The solution is to build cohorts that contain large numbers of individuals but are as homogeneous as possible. This paper explains a new way to do this: by using a self-organizing map, whose properties are well suited to achieving these objectives. It is applied to a set of Canadian surveys, in order to estimate income elasticities for 18 consumption functions.
Submitted 5 January, 2007;
originally announced January 2007.
-
Missing values : processing with the Kohonen algorithm
Authors:
Marie Cottrell,
Patrick Letrémy
Abstract:
The processing of data which contain missing values is a complicated and always awkward problem when the data come from real-world contexts. In applications, we are very often faced with observations for which not all the values are available, and this can occur for many reasons: typing errors, fields left unanswered in surveys, etc. Most statistical software (SAS, for example) simply discards incomplete observations. This has no practical consequence when the data are very numerous, but if too few observations remain, the results can lose all significance. To avoid discarding data in that way, it is possible to replace a missing value with the mean value of the corresponding variable, but this approximation can be very bad when the variable has a large variance. It is therefore worth noting that the Kohonen algorithm (as well as the Forgy algorithm) deals perfectly well with data with missing values, without having to estimate them beforehand. We are particularly interested in the Kohonen algorithm for its visualization properties.
Submitted 5 January, 2007;
originally announced January 2007.
-
Bootstrap for neural model selection
Authors:
Riadh Kallel,
Marie Cottrell,
Vincent Vigneron
Abstract:
Bootstrap techniques (also called resampling computation techniques) have introduced new advances in modeling and model evaluation. Using resampling methods to construct a series of new samples based on the original data set makes it possible to estimate the stability of the parameters. Properties such as convergence and asymptotic normality can be checked for any particular observed data set. In most cases, the statistics computed on the generated data sets give a good idea of the confidence regions of the estimates. In this paper, we discuss the contribution of such methods to model selection, in the case of feedforward neural networks. The method is described and compared with the leave-one-out resampling method. The effectiveness of the bootstrap method, versus the leave-one-out method, is checked through a number of examples.
Submitted 4 January, 2007;
originally announced January 2007.
-
Statistical tools to assess the reliability of self-organizing maps
Authors:
Eric De Bodt,
Marie Cottrell,
Michel Verleysen
Abstract:
Results of neural network learning are always subject to some variability, due to sensitivity to initial conditions, to convergence to local minima, and, sometimes more dramatically, to sampling variability. This paper presents a set of tools designed to assess the reliability of the results of Self-Organizing Maps (SOM), i.e. to test on a statistical basis the confidence we can have in the result of a specific SOM. The tools concern the quantization error in a SOM, and the neighborhood relations (both at the level of a specific pair of observations and globally on the map). As a by-product, these measures also make it possible to assess the adequacy of the number of units chosen in a map. The tools may also be used to measure objectively how much less sensitive SOMs are to non-linear optimization problems (local minima, convergence, etc.) than other neural network models.
Submitted 4 January, 2007;
originally announced January 2007.
-
On the use of self-organizing maps to accelerate vector quantization
Authors:
Eric De Bodt,
Marie Cottrell,
Patrick Letrémy,
Michel Verleysen
Abstract:
Self-organizing maps (SOM) are widely used for their topology preservation property: neighboring input vectors are quantized (or classified) either at the same location or at neighboring ones on a predefined grid. SOM are also widely used for their more classical vector quantization property. We show in this paper that using SOM instead of the more classical Simple Competitive Learning (SCL) algorithm drastically increases the speed of convergence of the vector quantization process. This fact is demonstrated through extensive simulations on artificial and real examples, with specific SOM (fixed and decreasing neighborhoods) and SCL algorithms.
Submitted 4 January, 2007;
originally announced January 2007.
-
Time Series Forecasting: Obtaining Long Term Trends with Self-Organizing Maps
Authors:
Geoffroy Simon,
Amaury Lendasse,
Marie Cottrell,
Jean-Claude Fort,
Michel Verleysen
Abstract:
Kohonen self-organising maps are a well-known classification tool, commonly used in a wide variety of problems, but with limited applications in the time series forecasting context. In this paper, we propose a forecasting method specifically designed for the prediction of multi-dimensional long-term trends, with a double application of the Kohonen algorithm. Practical applications of the method are also presented.
Submitted 8 January, 2007;
originally announced January 2007.
-
Working times in atypical forms of employment: the special case of part-time work
Authors:
Patrick Letrémy,
Marie Cottrell
Abstract:
In the present article, we attempt to devise a typology of forms of part-time employment by applying a widely used neuronal methodology called Kohonen maps. Starting out with data that we describe using category-specific variables, we show how it is possible to represent observations and the modalities of the variables that define them simultaneously, on a single map. This allows us to ascertain, and to try to describe, the main categories of part-time employment.
Submitted 14 November, 2006;
originally announced November 2006.
-
Cartes auto-organisées pour l'analyse exploratoire de données et la visualisation
Authors:
Marie Cottrell,
Smaïl Ibbou,
Patrick Letrémy,
Patrick Rousset
Abstract:
This paper shows how to use the Kohonen algorithm to represent multidimensional data, by exploiting the self-organizing property. Such maps can be obtained for quantitative variables as well as for qualitative ones, or for a mixture of both. The contents of the paper come from various works by SAMOS-MATISSE members, in particular by E. de Bodt, B. Girard, P. Letrémy, S. Ibbou, P. Rousset. Most of the examples have been studied with the computation routines written by Patrick Letrémy in the IML-SAS language, which are available on the web page http://samos.univ-paris1.fr.
Submitted 14 November, 2006;
originally announced November 2006.
-
Using working patterns as a basis for differentiating part-time employment
Authors:
Patrick Letrémy,
Christèle Meilland,
Marie Cottrell
Abstract:
Seeking to determine which working patterns have a specific effect on part-time work, in 1998-99 France's INSEE statistical agency carried out a Timetable survey that questioned the homogeneity of this form of employment (again in terms of the working patterns upon which it is based). A neuronal method was used to classify an entire sample of part-time employees according to their weekly working patterns, the end result being that part-time work was shown to be a very heterogeneous form of employment. This was not only reflected by the existence of many different groups of part-time employees, each with highly differentiated individual and professional characteristics, but also (and above all) by the diversity of their weekly working patterns.
Submitted 14 November, 2006;
originally announced November 2006.
-
Advances in Self Organising Maps
Authors:
Marie Cottrell,
Michel Verleysen
Abstract:
The Self-Organizing Map (SOM) with its related extensions is the most popular artificial neural algorithm for use in unsupervised learning, clustering, classification and data visualization. Over 5,000 publications have been reported in the open literature, and many commercial projects employ the SOM as a tool for solving hard real-world problems. Every two years, the "Workshop on Self-Organizing Maps" (WSOM) covers the new developments in the field. The WSOM series of conferences was initiated in 1997 by Prof. Teuvo Kohonen, and was successfully organized in 1997 and 1999 by the Helsinki University of Technology, in 2001 by the University of Lincolnshire and Humberside, and in 2003 by the Kyushu Institute of Technology. The Université Paris I Panthéon Sorbonne (SAMOS-MATISSE research centre) organized WSOM 2005 in Paris on September 5-8, 2005.
Submitted 14 November, 2006;
originally announced November 2006.
-
How to use the Kohonen algorithm to simultaneously analyse individuals in a survey
Authors:
Marie Cottrell,
Patrick Letrémy
Abstract:
The Kohonen algorithm (SOM, Kohonen, 1984, 1995) is a very powerful tool for data analysis. It was originally designed to model organized connections between some biological neural networks. It was also immediately considered a very good algorithm for performing vector quantization and, at the same time, pertinent classification, with nice properties for visualization. If the individuals are described by quantitative variables (ratios, frequencies, measurements, amounts, etc.), the straightforward application of the original algorithm leads to building code vectors and associating with each of them the class of all the individuals which are more similar to this code vector than to the others. But in the case of individuals described by categorical (qualitative) variables having a finite number of modalities (as in a survey), it is necessary to define a specific algorithm. In this paper, we present a new algorithm inspired by the SOM algorithm, which provides a simultaneous classification of the individuals and of their modalities.
Submitted 19 October, 2006;
originally announced October 2006.
-
SOM-based algorithms for qualitative variables
Authors:
Marie Cottrell,
Smail Ibbou,
Patrick Letrémy
Abstract:
It is well known that the SOM algorithm achieves a clustering of data which can be interpreted as an extension of Principal Component Analysis, because of its topology-preserving property. But the SOM algorithm can only process real-valued data. In previous papers, we have proposed several methods based on the SOM algorithm to analyze categorical data, as is the case with survey data. In this paper, we present these methods in a unified manner. The first one (Kohonen Multiple Correspondence Analysis, KMCA) deals only with the modalities, while the two others (Kohonen Multiple Correspondence Analysis with individuals, KMCA_ind, and Kohonen algorithm on DISJonctive table, KDISJ) can take into account the individuals and the modalities simultaneously.
Submitted 19 October, 2006;
originally announced October 2006.
-
Batch and median neural gas
Authors:
Marie Cottrell,
Barbara Hammer,
Alexander Hasenfuss,
Thomas Villmann
Abstract:
Neural Gas (NG) constitutes a very robust clustering algorithm for Euclidean data, which does not suffer from the problem of local minima like simple vector quantization, or from topological restrictions like the self-organizing map. Based on the cost function of NG, we introduce a batch variant of NG which shows much faster convergence and which can be interpreted as an optimization of the cost function by the Newton method. This formulation has the additional benefit that, based on the notion of the generalized median in analogy to Median SOM, a variant for non-vectorial proximity data can be introduced. We prove convergence of batch and median versions of NG, SOM, and k-means in a unified formulation, and we investigate the behavior of the algorithms in several experiments.
Submitted 18 October, 2006;
originally announced October 2006.