-
Universal representations for financial transactional data: embracing local, global, and external contexts
Authors:
Alexandra Bazarova,
Maria Kovaleva,
Ilya Kuleshov,
Evgenia Romanenkova,
Alexander Stepikin,
Alexandr Yugay,
Dzhambulat Mollaev,
Ivan Kireev,
Andrey Savchenko,
Alexey Zaytsev
Abstract:
Effective processing of financial transactions is essential for banking data analysis. However, in this domain, most methods focus on specialized solutions to stand-alone problems instead of constructing universal representations suitable for many problems. We present a representation learning framework that addresses diverse business challenges. We also suggest novel generative models that account for data specifics, and a way to integrate external information into a client's representation, leveraging insights from other customers' actions. Finally, we offer a benchmark, describing representation quality globally, concerning the entire transaction history; locally, reflecting the client's current state; and dynamically, capturing representation evolution over time. Our generative approach demonstrates superior performance in local tasks, with an increase in ROC-AUC of up to 14% for the next MCC prediction task and up to 46% for downstream tasks over existing contrastive baselines. Incorporating external information improves the scores by an additional 20%.
Submitted 2 April, 2024;
originally announced April 2024.
-
Usage of specific attention improves change point detection
Authors:
Anna Dmitrienko,
Evgenia Romanenkova,
Alexey Zaytsev
Abstract:
A change point is a moment of abrupt alteration in the data distribution. Current methods for change point detection are based on recurrent neural methods suitable for sequential data. However, recent works show that transformers based on attention mechanisms perform better than standard recurrent models for many tasks. The benefit is most noticeable for longer sequences. In this paper, we investigate different attention mechanisms for the change point detection task and propose a specific form of attention tailored to the task at hand. We show that using this special form of attention outperforms state-of-the-art results.
Submitted 18 April, 2022;
originally announced April 2022.
-
Deep learning model solves change point detection for multiple change types
Authors:
Alexander Stepikin,
Evgenia Romanenkova,
Alexey Zaytsev
Abstract:
Change point detection aims to catch an abrupt disorder in data distribution. Common approaches assume that there are only two fixed distributions for data: one before and another after a change point. Real-world data are richer than this assumption. There can be multiple different distributions before and after a change. We propose an approach that works in the multiple-distributions scenario. Our approach learns representations for semi-structured data suitable for change point detection, while a common classifier-based approach fails. Moreover, our model is more robust when predicting change points. The datasets used for benchmarking are sequences of images with and without change points in them.
Submitted 15 April, 2022;
originally announced April 2022.
-
Similarity learning for wells based on logging data
Authors:
Evgenia Romanenkova,
Alina Rogulina,
Anuar Shakirov,
Nikolay Stulov,
Alexey Zaytsev,
Leyla Ismailova,
Dmitry Kovalev,
Klemens Katterbauer,
Abdallah AlShehri
Abstract:
One of the first steps during the investigation of geological objects is the interwell correlation. It provides information on the structure of the objects under study, as it comprises the framework for constructing geological models and assessing hydrocarbon reserves. Today, the detailed interwell correlation relies on manual analysis of well-logging data. Thus, it is time-consuming and of a subjective nature. The essence of interwell correlation is an assessment of the similarities between geological profiles. There have been many attempts to automate the process of interwell correlation by means of rule-based, classic machine learning, and deep learning approaches. However, most approaches have limited applicability and retain the inherent subjectivity of experts. We propose a novel framework to solve the geological profile similarity estimation based on a deep learning model. Our similarity model takes well-logging data as input and provides the similarity of wells as output. The developed framework enables (1) extracting patterns and essential characteristics of geological profiles within the wells and (2) model training following the unsupervised paradigm without the need for manual analysis and interpretation of well-logging data. For model testing, we used two open datasets originating in New Zealand and Norway. Our data-based similarity models provide high performance: the accuracy of our model is $0.926$ compared to $0.787$ for baselines based on the popular gradient boosting approach. With them, an oil and gas practitioner can improve interwell correlation quality and reduce operation time.
Submitted 11 February, 2022;
originally announced February 2022.
-
InDiD: Instant Disorder Detection via Representation Learning
Authors:
Evgenia Romanenkova,
Alexander Stepikin,
Matvey Morozov,
Alexey Zaytsev
Abstract:
For sequential data, a change point is a moment of abrupt regime switch in data streams. Such changes appear in different scenarios, including simpler data from sensors and more challenging video surveillance data. We need to detect disorders as fast as possible. Classic approaches for change point detection (CPD) might underperform for semi-structured sequential data because they cannot process its structure without a proper representation. We propose a principled loss function that balances change detection delay and time to a false alarm. It approximates classic rigorous solutions but is differentiable and allows representation learning for deep models. We consider synthetic sequences, real-world sensor data, and videos with change points. We carefully labelled available data with change point moments for video data and released it for the first time. Experiments suggest that complex data require meaningful representations tailored to the specifics of the CPD task, and our approach provides them, outperforming the considered baselines. For example, for explosion detection in video, the F1 score for our method is $0.53$ compared to baseline scores of $0.31$ and $0.35$.
Submitted 22 April, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
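The two quantities that the InDiD abstract says its loss balances, detection delay and time to a false alarm, can be illustrated for a single sequence. This is a minimal sketch under an assumed convention, not the paper's definition or its loss: the function name `delay_and_time_to_false_alarm` and the rule that a missing change point or missing alarm is placed at the sequence length are hypothetical choices made for illustration.

```python
from typing import Optional, Tuple


def delay_and_time_to_false_alarm(
    change_point: Optional[int],
    alarm: Optional[int],
    length: int,
) -> Tuple[int, int]:
    """Return (detection_delay, time_to_false_alarm) for one sequence.

    Assumed convention (hypothetical, for illustration only): a missing
    change point or a missing alarm is placed at `length`.
    """
    cp = length if change_point is None else change_point
    t_alarm = length if alarm is None else alarm
    # Delay: steps between the change and a later alarm; early alarms give 0.
    detection_delay = max(t_alarm - cp, 0)
    # Time to false alarm: how long the detector stayed silent before the change.
    time_to_false_alarm = min(t_alarm, cp)
    return detection_delay, time_to_false_alarm


if __name__ == "__main__":
    # Change at step 50, alarm raised at step 57, sequence of 100 steps.
    print(delay_and_time_to_false_alarm(50, 57, 100))
```

A detector that fires earlier lowers the delay but also shortens the time to a false alarm, which is the trade-off the abstract refers to.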
-
Application of Machine Learning to accidents detection at directional drilling
Authors:
Ekaterina Gurina,
Nikita Klyuchnikov,
Alexey Zaytsev,
Evgenya Romanenkova,
Ksenia Antipova,
Igor Simon,
Victor Makarov,
Dmitry Koroteev
Abstract:
We present a data-driven algorithm and mathematical model for anomaly alarming at directional drilling. The algorithm is based on machine learning. It compares the real-time drilling telemetry with telemetry corresponding to past accidents and analyses the level of similarity. The model performs a time-series comparison using aggregated statistics and Gradient Boosting classification. It is trained on historical data containing the drilling telemetry of $80$ wells drilled within $19$ oilfields. The model can detect an anomaly and identify its type by comparing the real-time measurements while drilling with the ones from the database of past accidents. Validation tests show that our algorithm identifies half of the anomalies with about $0.53$ false alarms per day on average. The model performance ensures sufficient time and cost savings as it enables partial prevention of failures and accidents during well construction.
Submitted 12 December, 2019; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Real-time data-driven detection of the rock type alteration during a directional drilling
Authors:
Evgenya Romanenkova,
Alexey Zaytsev,
Nikita Klyuchnikov,
Arseniy Gruzdev,
Ksenia Antipova,
Leyla Ismailova,
Evgeny Burnaev,
Artyom Semenikhin,
Vitaliy Koryabkin,
Igor Simon,
Dmitry Koroteev
Abstract:
During directional drilling, the bit may sometimes enter a nonproductive rock layer due to the gap of about 20 m between the bit and high-fidelity rock type sensors. The only way to detect lithotype changes in time is to use Measurements While Drilling (MWD) data. However, there are no general mathematical modeling approaches that both reconstruct the rock type well from MWD data and account for the specifics of the oil and gas industry. In this article, we present a data-driven procedure that utilizes MWD data for quick detection of changes in rock type. We propose an approach that combines traditional machine learning based on the solution of the rock type classification problem with change detection procedures rarely used before in the oil and gas industry. The data come from a newly developed oilfield in the north of western Siberia. The results suggest that we can detect a significant part of changes in rock type, reducing the change detection delay from $20$ to $1.8$ meters and the number of false-positive alarms from $43$ to $6$ per well.
Submitted 12 December, 2019; v1 submitted 27 March, 2019;
originally announced March 2019.
-
Interpolation error of misspecified Gaussian process regression
Authors:
A. Zaytsev,
E. Romanenkova,
D. Ermilov
Abstract:
An interpolation error is an integral of the squared error of a regression model over a domain of interest. We consider the interpolation error for the case of misspecified Gaussian process regression: the covariance function used differs from the true one. We derive the interpolation error for an infinite grid design of experiments. In particular, we show that for the Matérn 1/2 covariance function, poor estimation of parameters only slightly affects the quality of interpolation. Then we proceed to numerical experiments that consider the misspecification for the most common covariance functions, including other Matérn and squared exponential covariance functions. For them, the quality of estimates of parameters affects the interpolation error.
Submitted 26 March, 2018;
originally announced March 2018.
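The verbal definition in the abstract of the last entry can be written out as a formula. The block below is a sketch under assumed notation, not the paper's own: the symbols $f$ (true function), $\hat{f}$ (regression model), $\Omega$ (domain of interest), $\sigma^2$ (variance), and $\ell$ (length scale) are assumptions introduced here, the paper may additionally average over the Gaussian process, and its parametrization of the covariance function may differ.

```latex
% Interpolation error: integral of the squared error of the model over the domain
\varepsilon(\hat{f}) = \int_{\Omega} \bigl( f(x) - \hat{f}(x) \bigr)^2 \, dx

% Matern 1/2 (exponential) covariance function, one common parametrization
k(x, x') = \sigma^2 \exp\!\left( -\frac{\lVert x - x' \rVert}{\ell} \right)
```

In this notation, misspecification means fitting the model with values of $\sigma^2$ or $\ell$, or with a covariance family, that differ from those that generated the data.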