Generation and Simulation of Synthetic Datasets with Copulas

Authors: Regis Houssou, Mihai-Cezar Augustin, Efstratios Rappos, Vivien Bonvin, Stephan Robert-Nicoud

Abstract: This paper proposes a new method to generate synthetic data sets based on copula models. Our goal is to produce surrogate data resembling real data in terms of marginal and joint distributions. We present a complete and reliable algorithm for generating a synthetic data set comprising numeric or categorical variables. Applying our methodology to two datasets shows better performance compared to other methods such as SMOTE and autoencoders.

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.15884 [pdf, other]

Radial Autoencoders for Enhanced Anomaly Detection

Authors: Mihai-Cezar Augustin, Vivien Bonvin, Regis Houssou, Efstratios Rappos, Stephan Robert-Nicoud

Abstract: In classification problems, supervised machine-learning methods outperform traditional algorithms, thanks to the ability of neural networks to learn complex patterns. However, in two-class classification tasks like anomaly or fraud detection, unsupervised methods could do even better, because their prediction is not limited to previously learned types of anomalies. An intuitive approach to anomaly detection can be based on the distances from the centers of mass of the two respective classes. Autoencoders, although trained without supervision, can also detect anomalies: considering the center of mass of the normal points, reconstructions now have radii, with the largest radii most likely indicating anomalous points. Of course, radii-based classification was already possible without interposing an autoencoder. In any space, radial classification can be operated, to some extent. In order to outperform it, we proceed to radial deformations of data (i.e. centric compression or expansions of axes) and autoencoder training. Any autoencoder that makes use of a data center is here baptized a centric autoencoder (cAE). A special type is the cAE trained with a uniformly compressed dataset, named the centripetal autoencoder (cpAE). The new concept is studied here in relation with a schematic artificial dataset, and the derived methods show consistent score improvements.
But tested on real banking data, our radial deformation supervised algorithms alone still perform better than cAEs, as expected from most supervised methods; nonetheless, in hybrid approaches, cAEs can be combined with a radial deformation of space, improving its classification score. We expect that centric autoencoders will become irreplaceable objects in anomaly live detection based on geometry, thanks to their ability to stem naturally on geometrical algorithms and to their native capability of detecting unknown anomaly types.

Submitted 31 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2011.12390 [pdf, other]

Anomaly Detection Model for Imbalanced Datasets

Authors: Régis Houssou, Stephan Robert-Nicoud

Abstract: This paper proposes a method to detect bank frauds using a mixed approach combining a stochastic intensity model with the probability of fraud observed on transactions. It is a dynamic unsupervised approach which is able to predict financial frauds. The fraud prediction probability on the financial transaction is derived as a function of the dynamic intensities. In this context, the Kalman filter method is proposed to estimate the dynamic intensities. The application of our methodology to financial datasets shows a better predictive power in higher imbalanced data compared to other intensity-based models.

Submitted 24 November, 2020; originally announced November 2020.

Comments: 11 pages, 5 figures

arXiv:2009.07578 [pdf, other]

Anomaly and Fraud Detection in Credit Card Transactions Using the ARIMA Model

Authors: Giulia Moschini, Régis Houssou, Jérôme Bovay, Stephan Robert-Nicoud

Abstract: This paper addresses the problem of unsupervised approach of credit card fraud detection in unbalanced dataset using the ARIMA model. The ARIMA model is fitted on the regular spending behaviour of the customer and is used to detect fraud if some deviations or discrepancies appear. Our model is applied to credit card datasets and is compared to 4 anomaly detection approaches such as K-Means, Box-Plot, Local Outlier Factor and Isolation Forest. The results show that the ARIMA model presents a better detecting power than the benchmark models.

Submitted 16 September, 2020; originally announced September 2020.

arXiv:1912.04308 [pdf, other]

Adaptive Financial Fraud Detection in Imbalanced Data with Time-Varying Poisson Processes

Authors: Régis Houssou, Jérôme Bovay, Stephan Robert

Abstract: This paper discusses financial fraud detection in imbalanced dataset using homogeneous and non-homogeneous Poisson processes. The probability of predicting fraud on the financial transaction is derived. Applying our methodology to the financial dataset shows a better predicting power than a baseline approach, especially in the case of higher imbalanced data.

Submitted 9 December, 2019; originally announced December 2019.

Comments: Accepted for publication in the Journal Of Financial Risk Management (JFRM). Comments welcome

arXiv:1003.4118 [pdf, ps, other]

Indifference of Defaultable Bonds with Stochastic Intensity models

Authors: Regis Houssou, Olivier Besson

Abstract: The utility-based pricing of defaultable bonds in the case of stochastic intensity models of default risk is discussed. The Hamilton-Jacobi-Bellman (HJB) equations for the value functions are derived. A finite difference method is used to solve this problem. The yield-spreads for both buyer and seller are extracted. The behaviour of the spread curve given the default intensity is analyzed. Finally the impacts of the risk aversion and the correlation coefficient are discussed.

Submitted 22 March, 2010; originally announced March 2010.

Showing 1–6 of 6 results for author: Houssou, R