-
Making Parametric Anomaly Detection on Tabular Data Non-Parametric Again
Authors:
Hugo Thimonier,
Fabrice Popineau,
Arpad Rimmel,
Bich-Liên Doan
Abstract:
Deep learning for tabular data has garnered increasing attention in recent years, yet employing deep models for structured data remains challenging. While these models excel with unstructured data, their efficacy with structured data has been limited. Recent research has introduced retrieval-augmented models to address this gap, demonstrating promising results in supervised tasks such as classification and regression. In this work, we investigate using retrieval-augmented models for anomaly detection on tabular data. We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of \textit{normal} samples. We test the effectiveness of KNN-based and attention-based modules to select relevant samples to help in the reconstruction process of the target sample. Our experiments on a benchmark of 31 tabular datasets reveal that augmenting this reconstruction-based anomaly detection (AD) method with non-parametric relationships via retrieval modules may significantly boost performance.
Submitted 30 January, 2024;
originally announced January 2024.
-
Comparative Evaluation of Anomaly Detection Methods for Fraud Detection in Online Credit Card Payments
Authors:
Hugo Thimonier,
Fabrice Popineau,
Arpad Rimmel,
Bich-Liên Doan,
Fabrice Daniel
Abstract:
This study explores the application of anomaly detection (AD) methods in imbalanced learning tasks, focusing on fraud detection using real online credit card payment data. We assess the performance of several recent AD methods and compare their effectiveness against standard supervised learning methods. Offering evidence of distribution shift within our dataset, we analyze its impact on the tested models' performance. Our findings reveal that LightGBM exhibits significantly superior performance across all evaluated metrics but suffers more from distribution shifts than AD methods. Furthermore, our investigation reveals that LightGBM also captures the majority of frauds detected by AD methods. This observation challenges the potential benefit of ensemble methods that combine supervised and AD approaches to enhance performance. In summary, this research provides practical insights into the utility of these techniques in real-world scenarios, showing LightGBM's superiority in fraud detection while highlighting challenges related to distribution shifts.
Submitted 21 December, 2023;
originally announced December 2023.
-
Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management
Authors:
Marc Velay,
Bich-Liên Doan,
Arpad Rimmel,
Fabrice Popineau,
Fabrice Daniel
Abstract:
Deep Reinforcement Learning approaches to Online Portfolio Selection have grown in popularity in recent years. The sensitive nature of training Reinforcement Learning agents implies a need for extensive efforts in market representation, behavior objectives, and training processes, which have often been lacking in previous works. We propose a training and evaluation process to assess the performance of classical DRL algorithms for portfolio management. We found that most Deep Reinforcement Learning algorithms were not robust, with strategies generalizing poorly and degrading quickly during backtesting.
Submitted 19 June, 2023;
originally announced June 2023.
-
Beyond Individual Input for Deep Anomaly Detection on Tabular Data
Authors:
Hugo Thimonier,
Fabrice Popineau,
Arpad Rimmel,
Bich-Liên Doan
Abstract:
Anomaly detection is vital in many domains, such as finance, healthcare, and cybersecurity. In this paper, we propose a novel deep anomaly detection method for tabular data that leverages Non-Parametric Transformers (NPTs), a model initially proposed for supervised tasks, to capture both feature-feature and sample-sample dependencies. In a reconstruction-based framework, we train an NPT to reconstruct masked features of normal samples. In a non-parametric fashion, we leverage the whole training set during inference and use the model's ability to reconstruct the masked features to generate an anomaly score. To the best of our knowledge, this is the first work to successfully combine feature-feature and sample-sample dependencies for anomaly detection on tabular datasets. Through extensive experiments on 31 benchmark tabular datasets, we demonstrate that our method achieves state-of-the-art performance, outperforming existing methods by 2.4% and 1.2% in terms of F1-score and AUROC, respectively. Our ablation study further proves that modeling both types of dependencies is crucial for anomaly detection on tabular data.
Submitted 2 May, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
TracInAD: Measuring Influence for Anomaly Detection
Authors:
Hugo Thimonier,
Fabrice Popineau,
Arpad Rimmel,
Bich-Liên Doan,
Fabrice Daniel
Abstract:
As with many other tasks, neural networks prove very effective for anomaly detection purposes. However, very few deep-learning models are suited for detecting anomalies on tabular datasets. This paper proposes a novel methodology to flag anomalies based on TracIn, an influence measure initially introduced for explicability purposes. The proposed methods can serve to augment any unsupervised deep anomaly detection method. We test our approach using Variational Autoencoders and show that the average influence of a subsample of training points on a test point can serve as a proxy for abnormality. Our model proves to be competitive in comparison with state-of-the-art approaches: it achieves comparable or better performance in terms of detection accuracy on medical and cyber-security tabular benchmark data.
Submitted 30 January, 2024; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Fixed-parameter tractability of counting small minimum $(S,T)$-cuts
Authors:
Pierre Bergé,
Benjamin Mouscadet,
Arpad Rimmel,
Joanna Tomasik
Abstract:
The parameterized complexity of counting minimum cuts stands as a natural question because Ball and Provan showed its #P-completeness. For any undirected graph $G=(V,E)$ and two disjoint sets of its vertices $S,T$, we design a fixed-parameter tractable algorithm which counts minimum edge $(S,T)$-cuts parameterized by their size $p$. Our algorithm operates on a transformed graph instance. This transformation, called drainage, reveals a collection of at most $n=\left| V \right|$ successive minimum $(S,T)$-cuts $Z_i$. We prove that any minimum $(S,T)$-cut $X$ contains edges of at least one cut $Z_i$. This observation, together with Menger's theorem, allows us to build the algorithm counting all minimum $(S,T)$-cuts with running time $2^{O(p^2)}n^{O(1)}$. Initially dedicated to counting minimum cuts, it can be modified to obtain an FPT sampling of minimum edge $(S,T)$-cuts.
Submitted 5 July, 2019; v1 submitted 4 July, 2019;
originally announced July 2019.
-
On Simulated Annealing Dedicated to Maximin Latin Hypercube Designs
Authors:
Pierre Bergé,
Kaourintin Le Guiban,
Arpad Rimmel,
Joanna Tomasik
Abstract:
The goal of our research was to enhance local search heuristics used to construct Latin Hypercube Designs. First, we introduce the \textit{1D-move} perturbation to improve the space exploration performed by these algorithms. Second, we propose a new evaluation function $ψ_{p,σ}$ specifically targeting the Maximin criterion.
An exhaustive series of experiments with Simulated Annealing, which we used as a typically well-behaved local search heuristic, confirms that our goal was reached, as the results we obtained surpass the best scores reported in the literature. Furthermore, the $ψ_{p,σ}$ function seems very promising for a wide spectrum of optimization problems through the Maximin criterion.
Submitted 23 August, 2016;
originally announced August 2016.