-
Graph Feature Preprocessor: Real-time Extraction of Subgraph-based Features from Transaction Graphs
Authors:
Jovan Blanuša,
Maximo Cravero Baraja,
Andreea Anghel,
Luc von Niederhäusern,
Erik Altman,
Haris Pozidis,
Kubilay Atasu
Abstract:
In this paper, we present "Graph Feature Preprocessor", a software library for detecting typical money laundering and fraud patterns in financial transaction graphs in real time. These patterns are used to produce a rich set of transaction features for downstream machine learning training and inference tasks such as money laundering detection. We show that our enriched transaction features dramatically improve the prediction accuracy of gradient-boosting-based machine learning models. Our library exploits multicore parallelism, maintains a dynamic in-memory graph, and efficiently mines subgraph patterns in the incoming transaction stream, which enables it to be operated in a streaming manner. We evaluate our library using highly-imbalanced synthetic anti-money laundering (AML) and real-life Ethereum phishing datasets. In these datasets, the proportion of illicit transactions is very small, which makes the learning process challenging. Our solution, which combines our Graph Feature Preprocessor and gradient-boosting-based machine learning models, is able to detect these illicit transactions with higher minority-class F1 scores than standard graph neural networks. In addition, the end-to-end throughput rate of our solution executed on a multicore CPU outperforms the graph neural network baselines executed on a powerful V100 GPU. Overall, the combination of high accuracy, a high throughput rate, and low latency of our solution demonstrates the practical value of our library in real-world applications. Graph Feature Preprocessor has been integrated into IBM mainframe software products, namely "IBM Cloud Pak for Data on Z" and "AI Toolkit for IBM Z and LinuxONE".
Submitted 13 February, 2024;
originally announced February 2024.
-
Realistic Synthetic Financial Transactions for Anti-Money Laundering Models
Authors:
Erik Altman,
Jovan Blanuša,
Luc von Niederhäusern,
Béni Egressy,
Andreea Anghel,
Kubilay Atasu
Abstract:
With the widespread digitization of finance and the increasing popularity of cryptocurrencies, the sophistication of fraud schemes devised by cybercriminals is growing. Money laundering -- the movement of illicit funds to conceal their origins -- can cross bank and national boundaries, producing complex transaction patterns. The UN estimates that 2-5\% of global GDP, or \$0.8-\$2.0 trillion, is laundered globally each year. Unfortunately, real data to train machine learning models to detect laundering is generally not available, and previous synthetic data generators have had significant shortcomings. A realistic, standardized, publicly-available benchmark is needed for comparing models and for the advancement of the area.
To this end, this paper contributes a synthetic financial transaction dataset generator and a set of synthetically generated AML (Anti-Money Laundering) datasets. We have calibrated this agent-based generator to match real transactions as closely as possible and made the datasets public. We describe the generator in detail and demonstrate how the datasets generated can help compare different machine learning models in terms of their AML abilities. In a key way, using synthetic data in these comparisons can be even better than using real data: the ground truth labels are complete, whilst many laundering transactions in real data are never detected.
Submitted 25 January, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Provably Powerful Graph Neural Networks for Directed Multigraphs
Authors:
Béni Egressy,
Luc von Niederhäusern,
Jovan Blanusa,
Erik Altman,
Roger Wattenhofer,
Kubilay Atasu
Abstract:
This paper analyses a set of simple adaptations that transform standard message-passing Graph Neural Networks (GNN) into provably powerful directed multigraph neural networks. The adaptations include multigraph port numbering, ego IDs, and reverse message passing. We prove that the combination of these theoretically enables the detection of any directed subgraph pattern. To validate the effectiveness of our proposed adaptations in practice, we conduct experiments on synthetic subgraph detection tasks, which demonstrate outstanding performance with almost perfect results. Moreover, we apply our proposed adaptations to two financial crime analysis tasks. We observe dramatic improvements in detecting money laundering transactions, improving the minority-class F1 score of a standard message-passing GNN by up to 30%, and closely matching or outperforming tree-based and GNN baselines. Similarly impressive results are observed on a real-world phishing detection dataset, boosting three standard GNNs' F1 scores by around 15% and outperforming all baselines.
Submitted 4 January, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
The value of Shared Information for allocation of drivers in ride-hailing: a proof-of-concept study
Authors:
Gianfranco Liberona,
David Salas,
Léonard von Niederhäusern
Abstract:
For drivers in ride-hailing companies, allocation within the city is paramount to getting matched with rides. This decision depends on many factors, some of which (such as demand and the allocation of other drivers) are unknown to the drivers but available to the company. In this work, we investigate whether it is beneficial for the ride-hailing company to share this information with its drivers. To do so, we study the problem through the lens of Stackelberg games, and we propose a new indicator called the Expected Value of Shared Information. We present a simplified model to conduct a proof-of-concept study: we provide explicit single-level reformulations of the bilevel programming problems derived from the model, and perform several simulations with randomly generated data. Our preliminary results suggest that sharing information could be beneficial and deserves further study.
Submitted 10 October, 2023; v1 submitted 31 December, 2021;
originally announced January 2022.
-
A Rolling Horizon Approach for a Bilevel Stochastic Pricing Problem for Demand-Side Management
Authors:
Luce Brotcorne,
Sébastien Lepaul,
Léonard von Niederhäusern
Abstract:
To guarantee the proper functioning of electricity distribution networks, it is crucial to constantly ensure the demand-supply balance. To do this, one can control the means of production, but also influence the demand: demand-side management is becoming increasingly popular as demand keeps increasing and becoming more chaotic. In this work, we propose a bilevel model involving an energy supplier and a smart grid operator (SGO): the supplier induces shifts of the load controlled by the SGO by offering time-dependent prices. We assume that the SGO has contracts with consumers and decides their consumption schedule, guaranteeing that the inconvenience induced by the load shifts will not outweigh the related financial benefits. Furthermore, we assume that the SGO manages a source of renewable energy (RE), which leads us to consider a stochastic bilevel model, as the generation of RE is by nature highly unpredictable. To cope with the issue of large problem sizes, we design a rolling horizon algorithm that can be applied in a real context.
Submitted 26 February, 2021;
originally announced February 2021.
-
On the geometry of symmetry breaking inequalities
Authors:
José Verschae,
Matías Villagra,
Léonard von Niederhäusern
Abstract:
Breaking symmetries is a popular way of speeding up the branch-and-bound method for symmetric integer programs. We study fundamental domains, which are minimal and closed symmetry breaking polyhedra. Our long-term goal is to understand the relationship between the complexity of such polyhedra and their symmetry breaking capability.
Borrowing ideas from geometric group theory, we provide structural properties that relate the action of the group with the geometry of the facets of fundamental domains. Inspired by these insights, we provide a new generalized construction for fundamental domains, which we call generalized Dirichlet domain (GDD). Our construction is recursive and exploits the coset decomposition of the subgroups that fix given vectors in $\mathbb{R}^n$. We use this construction to analyze a recently introduced set of symmetry breaking inequalities by Salvagnin (2018) and Liberti and Ostrowski (2014), called Schreier-Sims inequalities. In particular, this shows that every permutation group admits a fundamental domain with fewer than $n$ facets. We also show that this bound is tight.
Finally, we prove that the Schreier-Sims inequalities can contain an exponential number of isomorphic binary vectors for a given permutation group $G$, which provides evidence of the lack of symmetry breaking effectiveness of this fundamental domain. Conversely, a suitably constructed GDD for this $G$ has linearly many inequalities and contains unique representatives for isomorphic binary vectors.
Submitted 3 June, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.