Search | arXiv e-print repository

arXiv:2404.02047 [pdf, other]

Universal representations for financial transactional data: embracing local, global, and external contexts

Authors: Alexandra Bazarova, Maria Kovaleva, Ilya Kuleshov, Evgenia Romanenkova, Alexander Stepikin, Alexandr Yugay, Dzhambulat Mollaev, Ivan Kireev, Andrey Savchenko, Alexey Zaytsev

Abstract: Effective processing of financial transactions is essential for banking data analysis. However, in this domain, most methods focus on specialized solutions to stand-alone problems instead of constructing universal representations suitable for many problems. We present a representation learning framework that addresses diverse business challenges. We also suggest novel generative models that account for data specifics, and a way to integrate external information into a client's representation, leveraging insights from other customers' actions. Finally, we offer a benchmark, describing representation quality globally, concerning the entire transaction history; locally, reflecting the client's current state; and dynamically, capturing representation evolution over time. Our generative approach demonstrates superior performance in local tasks, with an increase in ROC-AUC of up to 14% for the next MCC prediction task and up to 46% for downstream tasks over existing contrastive baselines. Incorporating external information improves the scores by an additional 20%.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.01259 [pdf, other]

doi 10.1103/PhysRevD.109.112018

Improved Modelling of Detector Response Effects in Phonon-based Crystal Detectors used for Dark Matter Searches

Authors: M. J. Wilson, A. Zaytsev, B. von Krosigk, I. Alkhatib, M. Buchanan, R. Chen, M. D. Diamond, E. Figueroa-Feliciano, S. A. S. Harms, Z. Hong, K. T. Kennard, N. A. Kurinsky, R. Mahapatra, N. Mirabolfathi, V. Novati, M. Platt, R. Ren, A. Sattari, B. Schmidt, Y. Wang, S. Zatschler, E. Zhang, A. Zuniga

Abstract: Various dark matter search experiments employ phonon-based crystal detectors operated at cryogenic temperatures. Some of these detectors, including certain silicon detectors used by the SuperCDMS Collaboration, are able to achieve single-charge sensitivity when a voltage bias is applied across the detector. The total amount of phonon energy measured by such a detector is proportional to the number of electron-hole pairs created by the interaction. However, crystal impurities and surface effects can cause propagating charges to either become trapped inside the crystal or create additional unpaired charges, producing non-quantized measured energy as a result. A new analytical model for describing these detector response effects in phonon-based crystal detectors is presented. This model improves upon previous versions by demonstrating how the detector response, and thus the measured energy spectrum, is expected to differ depending on the source of events. We use this model to extract detector response parameters for SuperCDMS HVeV detectors, and illustrate how this robust modelling can help statistically discriminate between sources of events in order to improve the sensitivity of dark matter search experiments.

Submitted 24 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

Comments: 19 pages, 7 figures

Journal ref: Phys. Rev. D 109, 112018 (2024)

arXiv:2402.14184 [pdf, other]

Diversity-Aware Ensembling of Language Models Based on Topological Data Analysis

Authors: Polina Proskura, Alexey Zaytsev

Abstract: Ensembles are important tools for improving the performance of machine learning models. In cases related to natural language processing, ensembles boost the performance of a method due to multiple large models available in open source. However, existing approaches mostly rely on simple averaging of predictions by ensembles with equal weights for each model, ignoring differences in the quality and conformity of models. We propose to estimate weights for ensembles of NLP models using not only knowledge of their individual performance but also their similarity to each other. By adopting distance measures based on Topological Data Analysis (TDA), we improve our ensemble. The quality improves for both text classification accuracy and relevant uncertainty estimation.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.09766 [pdf, other]

From Variability to Stability: Advancing RecSys Benchmarking Practices

Authors: Valeriy Shevchenko, Nikita Belousov, Alexey Vasilev, Vladimir Zholobov, Artyom Sosedka, Natalia Semenova, Anna Volodkevich, Andrey Savchenko, Alexey Zaytsev

Abstract: In the rapidly evolving domain of Recommender Systems (RecSys), new algorithms frequently claim state-of-the-art performance based on evaluations over a limited set of arbitrarily selected datasets. However, this approach may fail to holistically reflect their effectiveness due to the significant impact of dataset characteristics on algorithm performance. Addressing this deficiency, this paper introduces a novel benchmarking methodology to facilitate a fair and robust comparison of RecSys algorithms, thereby advancing evaluation practices. By utilizing a diverse set of 30 open datasets, including two introduced in this work, and evaluating 11 collaborative filtering algorithms across 9 metrics, we critically examine the influence of dataset characteristics on algorithm performance. We further investigate the feasibility of aggregating outcomes from multiple datasets into a unified ranking. Through rigorous experimental analysis, we validate the reliability of our methodology under the variability of datasets, offering a benchmarking strategy that balances quality and computational demands. This methodology enables a fair yet effective means of evaluating RecSys algorithms, providing valuable guidance for future research endeavors.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 8 pages with 11 figures

arXiv:2311.11057 [pdf, other]

Challenges in data-based geospatial modeling for environmental research and practice

Authors: Diana Koldasbayeva, Polina Tregubova, Mikhail Gasanov, Alexey Zaytsev, Anna Petrovskaia, Evgeny Burnaev

Abstract: With the rise of electronic data, particularly Earth observation data, data-based geospatial modelling using machine learning (ML) has gained popularity in environmental research. Accurate geospatial predictions are vital for domain research based on ecosystem monitoring and quality assessment and for policy-making and action planning, considering effective management of natural resources. The accuracy and computation speed of ML has generally proved efficient. However, many questions have yet to be addressed to obtain precise and reproducible results suitable for further use in both research and practice. A better understanding of the ML concepts applicable to geospatial problems enhances the development of data science tools providing transparent information crucial for making decisions on global challenges such as biosphere degradation and climate change. This survey reviews common nuances in geospatial modelling, such as imbalanced data, spatial autocorrelation, prediction errors, model generalisation, domain specificity, and uncertainty estimation. We provide an overview of techniques and popular programming tools to overcome or account for the challenges. We also discuss prospects for geospatial Artificial Intelligence in environmental applications.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.05317 [pdf, other]

RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures

Authors: Anastasiia Prutianova, Alexey Zaytsev, Chung-Kuei Lee, Fengyu Sun, Ivan Koryakovskiy

Abstract: Existing neural networks are memory-consuming and computationally intensive, making deploying them challenging in resource-constrained environments. However, there are various methods to improve their efficiency. Two such methods are quantization, a well-known approach for network compression, and re-parametrization, an emerging technique designed to improve model performance. Although both techniques have been studied individually, there has been limited research on their simultaneous application. To address this gap, we propose a novel approach called RepQ, which applies quantization to re-parametrized networks. Our method is based on the insight that the test stage weights of an arbitrary re-parametrized layer can be presented as a differentiable function of trainable parameters. We enable quantization-aware training by applying quantization on top of this function. RepQ generalizes well to various re-parametrized models and outperforms the baseline LSQ quantization scheme in all experiments.

Submitted 9 November, 2023; originally announced November 2023.

Comments: BMVC 2023 (Oral)

arXiv:2310.02525 [pdf, other]

Assessing the Risk of Permafrost Degradation with Physics-Informed Machine Learning

Authors: Polina Pilyugina, Timofey Chernikov, Alexey Zaytsev, Alexander Bulkin, Evgeny Burnaev, Ilya Belalov, Nazar Sotiriadi, Yury Maximov, Oleg Anisimov

Abstract: Global warming accelerates permafrost degradation, impacting the reliability of critical infrastructure used by more than five million people daily. Furthermore, permafrost thaw produces substantial methane emissions, further accelerating global warming and climate change and putting more than eight billion people at additional risk. To mitigate the upcoming risk, policymakers and stakeholders must be given an accurate prediction of the thaw development. Unfortunately, comprehensive physics-based permafrost models require location-specific fine-tuning that is challenging in practice. Models of intermediate complexity require few input parameters but have relatively low accuracy. The performance of pure data-driven models is low as well, as the observational data is sparse and limited. In this work, we designed a physics-informed machine-learning approach for permafrost thaw prediction. The method uses a heat equation to regularize a data-driven approach trained over permafrost monitoring data and climate projections. The latter leads to higher precision and better numerical stability allowing for reliable decision-making or construction and maintenance in the areas endangered by permafrost thaw with a time horizon of decades.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 30 pages, 24 figures

arXiv:2309.06527 [pdf, other]

Machine Translation Models Stand Strong in the Face of Adversarial Attacks

Authors: Pavel Burnyshev, Elizaveta Kostenok, Alexey Zaytsev

Abstract: Adversarial attacks expose vulnerabilities of deep learning models by introducing minor perturbations to the input, which lead to substantial alterations in the output. Our research focuses on the impact of such adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models. We introduce algorithms that incorporate basic text perturbation heuristics and more advanced strategies, such as the gradient-based attack, which utilizes a differentiable approximation of the inherently non-differentiable translation metric. Through our investigation, we provide evidence that machine translation models display robustness against the best-performing known adversarial attacks, as the degree of perturbation in the output is directly proportional to the perturbation in the input. However, among underdogs, our attacks outperform alternatives, providing the best relative performance. Another strong candidate is an attack based on mixing of individual characters.

Submitted 10 September, 2023; originally announced September 2023.

Journal ref: AIST-2023

arXiv:2309.06212 [pdf, other]

Long-term drought prediction using deep neural networks based on geospatial weather data

Authors: Alexander Marusov, Vsevolod Grabar, Yury Maximov, Nazar Sotiriadi, Alexander Bulkin, Alexey Zaytsev

Abstract: The problem of high-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance. Yet, it is still unsolved with reasonable accuracy due to data complexity and aridity stochasticity. We tackle drought data by introducing an end-to-end approach that adopts a spatio-temporal neural network model with accessible open monthly climate data as the input. Our systematic research employs diverse proposed models and five distinct environmental regions as a testbed to evaluate the efficacy of the Palmer Drought Severity Index (PDSI) prediction. Key aggregated findings are the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts. At the same time, the Convolutional LSTM excels in longer-term forecasting. Both models achieved high ROC AUC scores: 0.948 for one month ahead and 0.617 for twelve months ahead forecasts, becoming closer to perfect ROC-AUC by 54% and 16%, respectively, compared to classic approaches.

Submitted 1 July, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.04824 [pdf, other]

Correcting sampling biases via importance reweighting for spatial modeling

Authors: Boris Prokhorov, Diana Koldasbayeva, Alexey Zaytsev

Abstract: In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to obtain an unbiased estimate of the target error. By taking into account the difference between the desirable error and the available data, our method reweights errors at each sample point and neutralizes the shift. An importance sampling technique and kernel density estimation were used for reweighting. We validate the effectiveness of our approach using artificial data that resemble real-world spatial datasets. Our findings demonstrate advantages of the proposed approach for the estimation of the target error, offering a solution to a distribution shift problem. The overall error of predictions dropped from 7% to just 2%, and it gets smaller for larger samples.

Submitted 14 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

arXiv:2308.11406 [pdf, other]

Designing an attack-defense game: how to increase robustness of financial transaction models via a competition

Authors: Alexey Zaytsev, Alex Natekin, Evgeni Vorsin, Valerii Smirnov, Georgii Smirnov, Oleg Sidorshin, Alexander Senin, Alexander Dudin, Dmitry Berestnev

Abstract: Given the escalating risks of malicious attacks in the finance sector and the consequential severe damage, a thorough understanding of adversarial strategies and robust defense mechanisms for machine learning models is critical. The threat becomes even more severe with the increased adoption in banks of more accurate, but potentially fragile neural networks. We aim to investigate the current state and dynamics of adversarial attacks and defenses for neural network models that use sequential financial data as the input. To achieve this goal, we have designed a competition that allows realistic and detailed investigation of problems in modern financial transaction data. The participants compete directly against each other, so possible attacks and defenses are examined in close-to-real-life conditions. Our main contributions are the analysis of the competition dynamics, which answers the questions of how important it is to conceal a model from malicious users, how long it takes to break it, and what techniques one should use to make it more robust, and the introduction of additional ways to attack models or increase their robustness. Our analysis continues with a meta-study on the used approaches with their power, numerical experiments, and accompanying ablation studies. We show that the developed attacks and defenses outperform existing alternatives from the literature while being practical in terms of execution, proving the validity of the competition as a tool for uncovering vulnerabilities of machine learning models and mitigating them in various domains.

Submitted 23 August, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.11295 [pdf, other]

Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices

Authors: Elizaveta Kostenok, Daniil Cherniavskii, Alexey Zaytsev

Abstract: Determining the degree of confidence of a deep learning model in its prediction is an open problem in the field of natural language processing. Most of the classical methods for uncertainty estimation are quite weak for text classification models. We set the task of obtaining an uncertainty estimate for neural networks based on the Transformer architecture. A key feature of such models is the attention mechanism, which supports the information flow between the hidden representations of tokens in the neural network. We explore the formed relationships between internal representations using Topological Data Analysis methods and utilize them to predict the model's confidence. In this paper, we propose a method for uncertainty estimation based on the topological properties of the attention mechanism and compare it with classical methods. As a result, the proposed algorithm surpasses the existing methods in quality and opens up a new area of application of the attention mechanism, but requires the selection of topological features.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.10201 [pdf, other]

Hiding Backdoors within Event Sequence Data via Poisoning Attacks

Authors: Elizaveta Kovtun, Alina Ermilova, Dmitry Berestnev, Alexey Zaytsev

Abstract: The financial industry relies on deep learning models for making important decisions. This adoption brings new danger, as deep black-box models are known to be vulnerable to adversarial attacks. In computer vision, one can shape the output during inference by performing an adversarial attack called poisoning via introducing a backdoor into the model during training. For sequences of financial transactions of a customer, insertion of a backdoor is harder to perform, as models operate over a more complex discrete space of sequences, and systematic checks for insecurities occur. We provide a method to introduce concealed backdoors, creating vulnerabilities without altering their functionality for uncontaminated data. To achieve this, we replace a clean model with a poisoned one that is aware of the availability of a backdoor and utilizes this knowledge. Our attacks that are most difficult to uncover include either an additional supervised detection step for poisoned data, activated during the test, or well-hidden model weight modifications. The experimental study provides insights into how these effects vary across different datasets, architectures, and model components. Alternative methods and baselines, such as distillation-type regularization, are also explored but found to be less efficient. Conducted on three open transaction datasets and architectures, including LSTM, CNN, and Transformer, our findings not only illuminate the vulnerabilities in contemporary models but also can drive the construction of more robust systems.

Submitted 20 August, 2023; originally announced August 2023.

arXiv:2308.07944 [pdf, other]

Portfolio Selection via Topological Data Analysis

Authors: Petr Sokerin, Kristian Kuznetsov, Elizaveta Makhneva, Alexey Zaytsev

Abstract: Portfolio management is an essential part of investment decision-making. However, traditional methods often fail to deliver reasonable performance. This problem stems from the inability of these methods to account for the unique characteristics of multivariate time series data from stock markets. We present a two-stage method for constructing an investment portfolio of common stocks. The method involves the generation of time series representations followed by their subsequent clustering. Our approach utilizes features based on Topological Data Analysis (TDA) for the generation of representations, allowing us to elucidate the topological structure within the data. Experimental results show that our proposed system outperforms other methods. This superior performance is consistent over different time frames, suggesting the viability of TDA as a powerful tool for portfolio selection.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2303.02196 [pdf, other]

doi 10.1103/PhysRevLett.131.091801

First measurement of the nuclear-recoil ionization yield in silicon at 100 eV

Authors: M. F. Albakry, I. Alkhatib, D. Alonso, D. W. P. Amaral, P. An, T. Aralis, T. Aramaki, I. J. Arnquist, I. Ataee Langroudy, E. Azadbakht, S. Banik, P. S. Barbeau, C. Bathurst, R. Bhattacharyya, P. L. Brink, R. Bunker, B. Cabrera, R. Calkins, R. A. Cameron, C. Cartaro, D. G. Cerdeño, Y.-Y. Chang, M. Chaudhuri, R. Chen, N. Chott, et al. (115 additional authors not shown)

Abstract: We measured the nuclear-recoil ionization yield in silicon with a cryogenic phonon-sensitive gram-scale detector. Neutrons from a mono-energetic beam scatter off of the silicon nuclei at angles corresponding to energy depositions from 4 keV down to 100 eV, the lowest energy probed so far. The results show no sign of an ionization production threshold above 100 eV. These results call for further investigation of the ionization yield theory and a comprehensive determination of the detector response function at energies below the keV scale.

Submitted 3 March, 2023; originally announced March 2023.

Journal ref: Physical Review Letters 131.9 (2023): 091801

arXiv:2303.00280 [pdf, other]

Label Attention Network for sequential multi-label classification: you were looking at a wrong self-attention

Authors: Elizaveta Kovtun, Galina Boeva, Artem Zabolotnyi, Evgeny Burnaev, Martin Spindler, Alexey Zaytsev

Abstract: Most of the available user information can be represented as a sequence of timestamped events. Each event is assigned a set of categorical labels whose future structure is of great interest. For instance, our goal is to predict a group of items in the next customer's purchase or tomorrow's client transactions. This is a multi-label classification problem for sequential data. Modern approaches focus on transformer architecture for sequential data introducing self-attention for the elements in a sequence. In that case, we take into account events' time interactions but lose information on label inter-dependencies. Motivated by this shortcoming, we propose leveraging a self-attention mechanism over labels preceding the predicted step. As our approach is a Label-Attention NETwork, we call it LANET. Experimental evidence suggests that LANET outperforms the established models' performance and greatly captures interconnections between labels. For example, the micro-AUC of our approach is 0.9536 compared to 0.7501 for a vanilla transformer. We provide an implementation of LANET to facilitate its wider usage.

Submitted 4 April, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

arXiv:2302.09115 [pdf, other]

doi 10.1103/PhysRevD.107.112013

A Search for Low-mass Dark Matter via Bremsstrahlung Radiation and the Migdal Effect in SuperCDMS

Authors: M. F. Albakry, I. Alkhatib, D. Alonso, D. W. P. Amaral, T. Aralis, T. Aramaki, I. J. Arnquist, I. Ataee Langroudy, E. Azadbakht, S. Banik, C. Bathurst, R. Bhattacharyya, P. L. Brink, R. Bunker, B. Cabrera, R. Calkins, R. A. Cameron, C. Cartaro, D. G. Cerdeño, Y.-Y. Chang, M. Chaudhuri, R. Chen, N. Chott, J. Cooley, H. Coombes, et al. (108 additional authors not shown)

Abstract: We present a new analysis of previously published SuperCDMS data using a profile likelihood framework to search for sub-GeV dark matter (DM) particles through two inelastic scattering channels: bremsstrahlung radiation and the Migdal effect. By considering these possible inelastic scattering channels, experimental sensitivity can be extended to DM masses that are undetectable through the DM-nucleon elastic scattering channel, given the energy threshold of current experiments. We exclude DM masses down to $220~\textrm{MeV}/c^2$ at $2.7 \times 10^{-30}~\textrm{cm}^2$ via the bremsstrahlung channel. The Migdal channel search provides overall considerably more stringent limits and excludes DM masses down to $30~\textrm{MeV}/c^2$ at $5.0 \times 10^{-30}~\textrm{cm}^2$.

Submitted 17 February, 2023; originally announced February 2023.

Comments: Submitted to PRD

Report number: 112013

Journal ref: Phys. Rev. D 107, 2023

arXiv:2302.06247 [pdf, other]

Continuous-time convolutions model of event sequences

Authors: Vladislav Zhuzhel, Vsevolod Grabar, Galina Boeva, Artem Zabolotnyi, Alexander Stepikin, Vladimir Zholobov, Maria Ivanova, Mikhail Orlov, Ivan Kireev, Evgeny Burnaev, Rodrigo Rivera-Castro, Alexey Zaytsev

Abstract: Massive samples of event sequences data occur in various domains, including e-commerce, healthcare, and finance. There are two main challenges regarding inference of such data: computational and methodological. The amount of available data and the length of event sequences per client are typically large, thus it requires long-term modelling. Moreover, this data is often sparse and non-uniform, making classic approaches for time series processing inapplicable. Existing solutions include recurrent and transformer architectures in such cases. To allow continuous time, the authors introduce specific parametric intensity functions defined at each moment on top of existing models. Due to the parametric nature, these intensities represent only a limited class of event sequences. We propose the COTIC method based on a continuous convolution neural network suitable for non-uniform occurrence of events in time. In COTIC, dilations and multi-layer architecture efficiently handle dependencies between events. Furthermore, the model provides general intensity dynamics in continuous time, including self-excitement encountered in practice. The COTIC model outperforms existing approaches on the majority of the considered datasets, producing embeddings for an event sequence that can be used to solve downstream tasks, e.g. predicting next event type and return time. The code of the proposed method can be found in the GitHub repository (https://github.com/VladislavZh/COTIC).

Submitted 13 February, 2023; originally announced February 2023.

Comments: 9 pages, 3 figures

arXiv:2302.02834 [pdf, other]

Uncertainty estimation for time series forecasting via Gaussian process regression surrogates

Authors: Leonid Erlygin, Vladimir Zholobov, Valeriia Baklanova, Evgeny Sokolovskiy, Alexey Zaytsev

Abstract: Machine learning models are widely used to solve real-world problems in science and industry. To build robust models, we should quantify the uncertainty of the model's predictions on new data. This study proposes a new method for uncertainty estimation based on the surrogate Gaussian process model. Our method can equip any base model with an accurate uncertainty estimate produced by a separate surrogate. Compared to other approaches, the estimate remains computationally effective with training only one additional model and doesn't rely on data-specific assumptions. The only requirement is the availability of the base model as a black box, which is typical. Experiments for challenging time-series forecasting data show that surrogate model-based methods provide more accurate confidence intervals than bootstrap-based methods in both medium and small-data regimes and different families of base models, including linear regression, ARIMA, and gradient boosting.

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2212.14246 [pdf, other]

Robust representations of oil wells' intervals via sparse attention mechanism

Authors: Alina Ermilova, Nikita Baramiia, Valerii Kornilov, Sergey Petrakov, Alexey Zaytsev

Abstract: Transformer-based neural network architectures achieve state-of-the-art results in different domains, from natural language processing (NLP) to computer vision (CV). The key idea of Transformers, the attention mechanism, has already led to significant breakthroughs in many areas. The attention mechanism has been applied to time series data as well. However, due to the quadratic complexity of the attention calculation regarding input sequence length, the application of Transformers is limited by high resource demands. Moreover, their modifications for industrial time series need to be robust to missing or noised values, which complicates the expansion of the horizon of their application. To cope with these issues, we introduce the class of efficient Transformers named Regularized Transformers (Reguformers). We implement the regularization technique inspired by the dropout ideas to improve robustness and reduce computational expenses. The focus in our experiments is on oil&gas data, namely, well logs, a prominent example of multivariate time series. The goal is to solve the problems of similarity and representation learning for them. To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells. The experiments show that all variations of Reguformers outperform the previously developed RNNs, classical Transformer model, and robust modifications of it like Informer and Performer in terms of well-intervals' classification and the quality of the obtained well-intervals' representations.
Moreover, the robustness to missing and incorrect data in our models exceeds that of others by a significant margin. The best result that the Reguformer achieves on well-interval similarity task is the mean PR AUC score equal to 0.983, which is comparable to the classical Transformer and outperforms the previous models.

Submitted 6 November, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

arXiv:2210.00773 [pdf, ps, other]

doi 10.1103/PhysRevA.107.032801

Complex-scaled ab initio QED approach to autoionizing states

Authors: V. A. Zaytsev, A. V. Malyshev, V. M. Shabaev

Abstract: An ab initio method based on a complex-scaling approach and aimed at a rigorous QED description of autoionizing states is worked out. The autoionizing-state binding energies are treated nonperturbatively in $αZ$ and include all the many-electron QED contributions up to the second order. The higher-order electron correlation, nuclear recoil, and nuclear polarization effects are taken into account as well. The developed formalism is demonstrated on the $LL$ resonances in heliumlike argon and uranium. The most accurate theoretical predictions for the binding energies are obtained.

Submitted 3 October, 2022; originally announced October 2022.

Comments: 6 pages, 3 figures

arXiv:2209.14750 [pdf, other]

doi 10.1109/LGRS.2023.3277214

Non-contrastive representation learning for intervals from well logs

Authors: Alexander Marusov, Alexey Zaytsev

Abstract: The representation learning problem in the oil & gas industry aims to construct a model that provides a representation based on logging data for a well interval. Previous attempts are mainly supervised and focus on similarity task, which estimates closeness between intervals. We desire to build informative representations without using supervised (labelled) data. One of the possible approaches is self-supervised learning (SSL). In contrast to the supervised paradigm, this one requires little or no labels for the data. Nowadays, most SSL approaches are either contrastive or non-contrastive. Contrastive methods bring representations of similar (positive) objects closer and push different (negative) ones apart. Due to possible wrong marking of positive and negative pairs, these methods can provide an inferior performance. Non-contrastive methods don't rely on such labelling and are widespread in computer vision. They learn using only pairs of similar objects that are easier to identify in logging data. We are the first to introduce non-contrastive SSL for well-logging data. In particular, we exploit Bootstrap Your Own Latent (BYOL) and Barlow Twins methods that avoid using negative pairs and focus only on matching positive pairs. The crucial part of these methods is an augmentation strategy. Our augmentation strategies and adaptation of BYOL and Barlow Twins together allow us to achieve superior quality on clusterization and mostly the best performance on different classification tasks.
Our results prove the usefulness of the proposed non-contrastive self-supervised approaches for representation learning and interval similarity in particular.

Submitted 10 November, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: IEEE Geoscience and Remote Sensing Letters (2023)

arXiv:2209.12444 [pdf, other]

Self-supervised similarity models based on well-logging data

Authors: Sergey Egorov, Narek Gevorgyan, Alexey Zaytsev

Abstract: Adopting data-based approaches leads to model improvement in numerous Oil&Gas logging data processing problems. These improvements become even more sound due to new capabilities provided by deep learning. However, usage of deep learning is limited to areas where researchers possess large amounts of high-quality data. We present an approach that provides universal data representations suitable for solutions to different problems for different oil fields with little additional data. Our approach relies on the self-supervised methodology for sequential logging data for intervals from well, so it also doesn't require labelled data from the start. For validation purposes of the received representations, we consider classification and clusterization problems. We as well consider the transfer learning scenario. We found out that using the variational autoencoder leads to the most reliable and accurate models. We also found that a researcher only needs a tiny separate data set for the target oil field to solve a specific problem on top of universal representations.

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.01880 [pdf, other]

ScaleFace: Uncertainty-aware Deep Metric Learning

Authors: Roman Kail, Kirill Fedyanin, Nikita Muravev, Alexey Zaytsev, Maxim Panov

Abstract: The performance of modern deep learning-based systems dramatically depends on the quality of input objects. For example, face recognition quality would be lower for blurry or corrupted inputs. However, it is hard to predict the influence of input quality on the resulting accuracy in more complex scenarios. We propose an approach for deep metric learning that allows direct estimation of the uncertainty with almost no additional computational cost. The developed ScaleFace algorithm uses trainable scale values that modify similarities in the space of embeddings. These input-dependent scale values represent a measure of confidence in the recognition result, thus allowing uncertainty estimation. We provide comprehensive experiments on face recognition tasks that show the superior performance of ScaleFace compared to other uncertainty-aware face recognition approaches. We also extend the results to the task of text-to-image retrieval showing that the proposed approach beats the competitors by a significant margin.

Submitted 12 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

arXiv:2209.00463 [pdf, ps, other]

doi 10.1103/PhysRevA.106.012806

Model-QED operator for superheavy elements

Authors: A. V. Malyshev, D. A. Glazov, V. M. Shabaev, I. I. Tupitsyn, V. A. Yerokhin, V. A. Zaytsev

Abstract: The model-QED-operator approach [Phys. Rev. A 88, 012513 (2013)] to calculations of the radiative corrections to binding and transition energies in atomic systems is extended to the range of nuclear charges $110 \leqslant Z \leqslant 170$. The self-energy part of the model operator is represented by a nonlocal potential based on diagonal and off-diagonal matrix elements of the ab initio self-energy operator with the Dirac-Coulomb wave functions. The vacuum-polarization part consists of the Uehling contribution which is readily computed for an arbitrary nuclear-charge distribution and the Wichmann-Kroll contribution represented in terms of matrix elements similarly to the self-energy part. Performance of the method is studied by comparing the model-QED-operator predictions with the results of ab initio calculations. The model-QED operator can be conveniently incorporated in any numerical approach based on the Dirac-Coulomb-Breit Hamiltonian to account for the QED effects in a wide variety of superheavy elements.

Submitted 1 September, 2022; originally announced September 2022.

Comments: 14 pages, 2 figures, 15 tables

Journal ref: Phys. Rev. A 106, 012806 (2022)

arXiv:2208.14839 [pdf, other]

QuantNAS for super resolution: searching for efficient quantization-friendly architectures against quantization noise

Authors: Egor Shvetsov, Dmitry Osin, Alexey Zaytsev, Ivan Koryakovskiy, Valentin Buchnev, Ilya Trofimov, Evgeny Burnaev

Abstract: There is a constant need for high-performing and computationally efficient neural network models for image super-resolution: computationally efficient models can be used via low-capacity devices and reduce carbon footprints. One way to obtain such models is to compress models, e.g. quantization. Another way is a neural architecture search that automatically discovers new, more efficient solutions. We propose a novel quantization-aware procedure, QuantNAS, that combines the pros of these two approaches. To make QuantNAS work, the procedure looks for quantization-friendly super-resolution models. The approach utilizes entropy regularization, quantization noise, and the Adaptive Deviation for Quantization (ADQ) module to enhance the search procedure. The entropy regularization technique prioritizes a single operation within each block of the search space. Adding quantization noise to parameters and activations approximates model degradation after quantization, resulting in more quantization-friendly architectures. ADQ helps to alleviate problems caused by Batch Norm blocks in super-resolution models. Our experimental results show that the proposed approximations are better for the search procedure than direct model quantization. QuantNAS discovers architectures with better PSNR/BitOps trade-off than uniform or mixed precision quantization of fixed architectures. We showcase the effectiveness of our method through its application to two search spaces inspired by the state-of-the-art SR models and RFDN.
Thus, anyone can design a proper search space based on an existing architecture and apply our method to obtain better quality and efficiency. The proposed procedure is 30\% faster than direct weight quantization and is more stable.

Submitted 10 January, 2024; v1 submitted 31 August, 2022; originally announced August 2022.

arXiv:2208.14833 [pdf, other]

Predicting spatial distribution of Palmer Drought Severity Index

Authors: V. Grabar, A. Lukashevich, A. Zaytsev

Abstract: The probability of a drought for a particular region is crucial when making decisions related to agriculture. Forecasting this probability is critical for management and challenging at the same time. The prediction model should consider multiple factors with complex relationships across the region of interest and neighbouring regions. We approach this problem by presenting an end-to-end solution based on a spatio-temporal neural network. The model predicts the Palmer Drought Severity Index (PDSI) for subregions of interest. Predictions by climate models provide an additional source of knowledge of the model leading to more accurate drought predictions. Our model has better accuracy than baseline Gradient boosting solutions, as the $R^2$ score for it is $0.90$ compared to $0.85$ for Gradient boosting. Specific attention is on the range of applicability of the model. We examine various regions across the globe to validate them under different conditions. We complement the results with an analysis of how future climate changes for different scenarios affect the PDSI and how our model can help to make better decisions and more sustainable economics.

Submitted 1 September, 2022; v1 submitted 31 August, 2022; originally announced August 2022.

arXiv:2207.08255 [pdf, other]

doi 10.1002/qua.27232

Calculation of the moscovium ground-state energy by quantum algorithms

Authors: V. A. Zaytsev, M. E. Groshev, I. A. Maltsev, A. V. Durova, V. M. Shabaev

Abstract: We investigate the possibility to calculate the ground-state energy of the atomic systems on a quantum computer. For this purpose we evaluate the lowest binding energy of the moscovium atom with the use of the iterative phase estimation and variational quantum eigensolver. The calculations by the variational quantum eigensolver are performed with a disentangled unitary coupled cluster ansatz and with various types of hardware-efficient ansatze. The optimization is performed with the use of the Adam and Quantum Natural Gradients procedures. The scalability of the ansatze and optimizers is tested by increasing the size of the basis set and the number of active electrons. The number of gates required for the iterative phase estimation and variational quantum eigensolver is also estimated.

Submitted 24 November, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: 29 pages, 5 figures

arXiv:2206.13491 [pdf, other]

Effective training-time stacking for ensembling of deep neural networks

Authors: Polina Proscura, Alexey Zaytsev

Abstract: Ensembling is a popular and effective method for improving machine learning (ML) models. It proves its value not only in classical ML but also for deep learning. Ensembles enhance the quality and trustworthiness of ML solutions, and allow uncertainty estimation. However, they come at a price: training ensembles of deep learning models eats a huge amount of computational resources. A snapshot ensembling collects models in the ensemble along a single training path. As it runs training only one time, the computational time is similar to the training of one model. However, the quality of models along the training path is different: typically, later models are better if no overfitting occurs. So, the models are of varying utility. Our method improves snapshot ensembling by selecting and weighting ensemble members along the training path. It relies on training-time likelihoods without looking at validation sample errors that standard stacking methods do. Experimental evidence for Fashion MNIST, CIFAR-10, and CIFAR-100 datasets demonstrates the superior quality of the proposed weighted ensembles compared to vanilla ensembling of deep learning models.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.13116 [pdf, other]

Transfer learning for ensembles: reducing computation time and keeping the diversity

Authors: Ilya Shashkov, Nikita Balabin, Evgeny Burnaev, Alexey Zaytsev

Abstract: Transferring a deep neural network trained on one problem to another requires only a small amount of data and little additional computation time. The same behaviour holds for ensembles of deep learning models typically superior to a single model. However, a transfer of deep neural networks ensemble demands relatively high computational expenses. The probability of overfitting also increases. Our approach for the transfer learning of ensembles consists of two steps: (a) shifting weights of encoders of all models in the ensemble by a single shift vector and (b) doing a tiny fine-tuning for each individual model afterwards. This strategy leads to a speed-up of the training process and gives an opportunity to add models to an ensemble with significantly reduced training time using the shift vector. We compare different strategies by computation time, the accuracy of an ensemble, uncertainty estimation and disagreement and conclude that our approach gives competitive results using the same computation complexity in comparison with the traditional approach. Also, our method keeps the ensemble's models' diversity higher.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.10691 [pdf, other]

Towards OOD Detection in Graph Classification from Uncertainty Estimation Perspective

Authors: Gleb Bazhenov, Sergei Ivanov, Maxim Panov, Alexey Zaytsev, Evgeny Burnaev

Abstract: The problem of out-of-distribution detection for graph classification is far from being solved. The existing models tend to be overconfident about OOD examples or completely ignore the detection task. In this work, we consider this problem from the uncertainty estimation perspective and perform the comparison of several recently proposed methods. In our experiment, we find that there is no universal approach for OOD detection, and it is important to consider both graph representations and predictive categorical distribution.

Submitted 21 June, 2022; originally announced June 2022.

Comments: ICML 2022 PODS Workshop

arXiv:2206.00110 [pdf, other]

doi 10.1103/PhysRevA.106.033119

Scattering of a twisted electron wavepacket by a finite laser pulse

Authors: I. A. Aleksandrov, D. A. Tumakov, A. Kudlis, V. A. Zaytsev, N. N. Rosanov

Abstract: The behavior of a twisted electron colliding with a linearly polarized laser pulse is investigated within relativistic quantum mechanics. In order to better fit the real experimental conditions, we introduce a Gaussian spatial profile for the initial electron state as well as an envelope function for the laser pulse, so that both interacting objects have a finite size along the laser propagation direction. For this setup we analyze the dynamics of various observable quantities regarding the electron state: the probability density, angular momentum, and mean values of the spatial coordinates. It is shown that the motion of a twisted wavepacket can be accurately described by averaging over classical trajectories with various directions of the transverse momentum component. On the other hand, full quantum simulations demonstrate that the ring structure of the wavepacket in the transverse plane can be significantly distorted leading to large uncertainties in the total angular momentum of the electron. This effect remains after the interaction once the laser pulse has a nonzero electric-field area.

Submitted 31 May, 2022; originally announced June 2022.

Comments: 12 pages, 7 figures

arXiv:2205.11683 [pdf, other]

Effective Field Theory Analysis of CDMSlite Run 2 Data

Authors: SuperCDMS Collaboration, M. F. Albakry, I. Alkhatib, D. W. P. Amaral, T. Aralis, T. Aramaki, I. J. Arnquist, I. Ataee Langroudy, E. Azadbakht, S. Banik, C. Bathurst, D. A. Bauer, L. V. S. Bezerra, R. Bhattacharyya, P. L. Brink, R. Bunker, B. Cabrera, R. Calkins, R. A. Cameron, C. Cartaro, D. G. Cerdeño, Y. -Y. Chang, M. Chaudhuri, R. Chen, N. Chott, et al. (105 additional authors not shown)

Abstract: CDMSlite Run 2 was a search for weakly interacting massive particles (WIMPs) with a cryogenic 600 g Ge detector operated in a high-voltage mode to optimize sensitivity to WIMPs of relatively low mass from 2 - 20 GeV/$c^2$. In this article, we present an effective field theory (EFT) analysis of the CDMSlite Run 2 data using an extended energy range and a comprehensive treatment of the expected background. A binned likelihood Bayesian analysis was performed on the recoil energy data, taking into account the parameters of the EFT interactions and optimizing the data selection with respect to the dominant background components. Energy regions within 5$σ$ of known activation peaks were removed from the analysis. The Bayesian evidences resulting from the different operator hypotheses show that the CDMSlite Run 2 data are consistent with the background-only models and do not allow for a signal interpretation assuming any additional EFT interaction. Consequently, upper limits on the WIMP mass and coupling-coefficient amplitudes and phases are presented for each EFT operator. These limits improve previous CDMSlite Run 2 bounds for WIMP masses above 5 GeV/$c^2$.

Submitted 23 May, 2022; originally announced May 2022.

Comments: 16 pages, 8 figures

arXiv:2204.08175 [pdf, other]

Usage of specific attention improves change point detection

Authors: Anna Dmitrienko, Evgenia Romanenkova, Alexey Zaytsev

Abstract: The change point is a moment of an abrupt alteration in the data distribution. Current methods for change point detection are based on recurrent neural methods suitable for sequential data. However, recent works show that transformers based on attention mechanisms perform better than standard recurrent models for many tasks. The most benefit is noticeable in the case of longer sequences. In this paper, we investigate different attentions for the change point detection task and propose a specific form of attention related to the task at hand. We show that using a special form of attention outperforms state-of-the-art results.

Submitted 18 April, 2022; originally announced April 2022.

arXiv:2204.08038 [pdf, other]

doi 10.1103/PhysRevD.105.112006

Investigating the sources of low-energy events in a SuperCDMS-HVeV detector

Authors: SuperCDMS Collaboration, M. F. Albakry, I. Alkhatib, D. W. P. Amaral, T. Aralis, T. Aramaki, I. J. Arnquist, I. Ataee Langroudy, E. Azadbakht, S. Banik, C. Bathurst, D. A. Bauer, R. Bhattacharyya, P. L. Brink, R. Bunker, B. Cabrera, R. Calkins, R. A. Cameron, C. Cartaro, D. G. Cerdeño, Y. -Y. Chang, M. Chaudhuri, R. Chen, N. Chott, J. Cooley, et al. (104 additional authors not shown)

Abstract: Recent experiments searching for sub-GeV/$c^2$ dark matter have observed event excesses close to their respective energy thresholds. Although specific to the individual technologies, the measured excess event rates have been consistently reported at or below event energies of a few-hundred eV, or with charges of a few electron-hole pairs. In the present work, we operated a 1-gram silicon SuperCDMS-HVeV detector at three voltages across the crystal (0 V, 60 V and 100 V). The 0 V data show an excess of events in the tens of eV region. Despite this event excess, we demonstrate the ability to set a competitive exclusion limit on the spin-independent dark matter-nucleon elastic scattering cross section for dark matter masses of $\mathcal{O}(100)$ MeV/$c^2$, enabled by operation of the detector at 0 V potential and achievement of a very low $\mathcal{O}(10)$ eV threshold for nuclear recoils. Comparing the data acquired at 0 V, 60 V and 100 V potentials across the crystal, we investigated possible sources of the unexpected events observed at low energy. The data indicate that the dominant contribution to the excess is consistent with a hypothesized luminescence from the printed circuit boards used in the detector holder.

Submitted 11 October, 2022; v1 submitted 17 April, 2022; originally announced April 2022.

arXiv:2204.07403 [pdf, other]

Deep learning model solves change point detection for multiple change types

Authors: Alexander Stepikin, Evgenia Romanenkova, Alexey Zaytsev

Abstract: Change point detection aims to catch an abrupt disorder in data distribution. Common approaches assume that there are only two fixed distributions for data: one before and another after a change point. Real-world data are richer than this assumption. There can be multiple different distributions before and after a change. We propose an approach that works in the multiple-distributions scenario. Our approach learns representations for semi-structured data suitable for change point detection, while a common classifier-based approach fails. Moreover, our model is more robust when predicting change points. The datasets used for benchmarking are sequences of images with and without change points in them.

Submitted 15 April, 2022; originally announced April 2022.

arXiv:2203.08463 [pdf, other]

A Strategy for Low-Mass Dark Matter Searches with Cryogenic Detectors in the SuperCDMS SNOLAB Facility

Authors: SuperCDMS Collaboration, M. F. Albakry, I. Alkhatib, D. W. P. Amaral, T. Aralis, T. Aramaki, I. J. Arnquist, I. Ataee Langroudy, E. Azadbakht, S. Banik, C. Bathurst, D. A. Bauer, R. Bhattacharyya, P. L. Brink, R. Bunker, B. Cabrera, R. Calkins, R. A. Cameron, C. Cartaro, D. G. Cerdeno, Y. -Y. Chang, M. Chaudhuri, R. Chen, N. Chott, J. Cooley, et al. (103 additional authors not shown)

Abstract: The SuperCDMS Collaboration is currently building SuperCDMS SNOLAB, a dark matter search focused on nucleon-coupled dark matter in the 1-5 GeV/c$^2$ mass range. Looking to the future, the Collaboration has developed a set of experience-based upgrade scenarios, as well as novel directions, to extend the search for dark matter using the SuperCDMS technology in the SNOLAB facility. The experience-based scenarios are forecasted to probe many square decades of unexplored dark matter parameter space below 5 GeV/c$^2$, covering over 6 decades in mass: 1-100 eV/c$^2$ for dark photons and axion-like particles, 1-100 MeV/c$^2$ for dark-photon-coupled light dark matter, and 0.05-5 GeV/c$^2$ for nucleon-coupled dark matter. They will reach the neutrino fog in the 0.5-5 GeV/c$^2$ mass range and test a variety of benchmark models and sharp targets. The novel directions involve greater departures from current SuperCDMS technology but promise even greater reach in the long run, and their development must begin now for them to be available in a timely fashion. The experience-based upgrade scenarios rely mainly on dramatic improvements in detector performance based on demonstrated scaling laws and reasonable extrapolations of current performance. Importantly, these improvements in detector performance obviate significant reductions in background levels beyond current expectations for the SuperCDMS SNOLAB experiment. Given that the dominant limiting backgrounds for SuperCDMS SNOLAB are cosmogenically created radioisotopes in the detectors, likely amenable only to isotopic purification and an underground detector life-cycle from before crystal growth to detector testing, the potential cost and time savings are enormous and the necessary improvements much easier to prototype.

Submitted 1 April, 2023; v1 submitted 16 March, 2022; originally announced March 2022.

Comments: contribution to Snowmass 2021; v2 updated (assorted corrections and improvements to forecasts) October 2022; v3 updated (corrected SuperCDMS SNOLAB sensitivity curves in upgrade forecast plots in body of text) April 2023

arXiv:2203.06754 [pdf, ps, other]

doi 10.1103/PhysRevA.105.062806

Two-photon Annihilation of Positrons with K-shell Electrons of H-like ions

Authors: Z. A. Mandrykina, V. A. Zaytsev, V. A. Yerokhin, V. M. Shabaev

Abstract: The two-photon annihilation of a positron with an electron bound in the 1s state of a H-like ion is calculated within the fully relativistic QED framework. The interaction with the nucleus is treated nonperturbatively, thus allowing the calculations to be carried out for the annihilation with strongly-bound inner shells of heavy ions. Infrared divergences, appearing when one of the emitted photons approaches the low-frequency limit, are accurately eliminated from final expressions. The total cross sections of the two-photon and one-photon annihilation processes are compared for a wide range of collision energies and nuclear charge numbers. It is demonstrated that the two-photon annihilation channel dominates over the one-photon channel for the low and medium-Z ions, whereas for the high-Z ions the situation reverses.

Submitted 13 March, 2022; originally announced March 2022.

Comments: 18 pages, 5 figures

arXiv:2203.02594

A Search for Low-mass Dark Matter via Bremsstrahlung Radiation and the Migdal Effect in SuperCDMS

Authors: SuperCDMS Collaboration, Musaab Al-Bakry, Imran Alkhatib, Dorian Praia do Amaral, Taylor Aralis, Tsuguo Aramaki, Isaac Arnquist, Iman Ataee Langroudy, Elham Azadbakht, Samir Banik, Corey Bathurst, Dan Bauer, Lucas Bezerra, Rik Bhattacharyya, Paul Brink, Ray Bunker, Blas Cabrera, Robert Calkins, Robert Cameron, Concetta Cartaro, David Cerdeno, Yen-Yung Chang, Mouli Chaudhuri, Ran Chen, Nicholas Chott, et al. (106 additional authors not shown)

Abstract: In this paper, we present a re-analysis of SuperCDMS data using a profile likelihood approach to search for sub-GeV dark matter particles (DM) through two inelastic scattering channels: bremsstrahlung radiation and the Migdal effect. By considering possible inelastic scattering channels, experimental sensitivity can be extended to DM masses that would otherwise be undetectable through the DM-nucleon elastic scattering channel, given the energy threshold of current experiments. We exclude DM masses down to $220~\textrm{MeV}/c^2$ at $2.7 \times 10^{-30}~\textrm{cm}^2$ via the bremsstrahlung channel. The Migdal channel search excludes DM masses down to $30~\textrm{MeV}/c^2$ at $5.0 \times 10^{-30}~\textrm{cm}^2$.

Submitted 19 May, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: This paper is being withdrawn due to an error in data selection during the analysis. Although incorrect, the limits are roughly representative of the sensitivity. The new corrected version of the result will be uploaded once ready

arXiv:2202.12297 [pdf, other]

Embedded Ensembles: Infinite Width Limit and Operating Regimes

Authors: Maksim Velikanov, Roman Kail, Ivan Anokhin, Roman Vashurin, Maxim Panov, Alexey Zaytsev, Dmitry Yarotsky

Abstract: A memory efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); its particular examples are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper we perform a systematic theoretical and empirical analysis of embedded ensembles with different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes - independent and collective - depending on the architecture and initialization strategy of ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical prediction with a wide range of experiments with finite networks, and further study empirically various effects such as transition between the two regimes, scaling of ensemble performance with the network width and number of models, and dependence of performance on a number of architecture and hyperparameter choices.

Submitted 24 February, 2022; originally announced February 2022.

arXiv:2202.07043 [pdf, other]

doi 10.1103/PhysRevD.105.122002

Ionization yield measurement in a germanium CDMSlite detector using photo-neutron sources

Authors: SuperCDMS Collaboration, M. F. Albakry, I. Alkhatib, D. W. P. Amaral, T. Aralis, T. Aramaki, I. J. Arnquist, I. Ataee Langroudy, E. Azadbakht, S. Banik, C. Bathurst, D. A. Bauer, L. V. S. Bezerra, R. Bhattacharyya, M. A. Bowles, P. L. Brink, R. Bunker, B. Cabrera, R. Calkins, R. A. Cameron, C. Cartaro, D. G. Cerdeño, Y. -Y. Chang, M. Chaudhuri, R. Chen, et al. (104 additional authors not shown)

Abstract: Two photo-neutron sources, $^{88}$Y$^{9}$Be and $^{124}$Sb$^{9}$Be, have been used to investigate the ionization yield of nuclear recoils in the CDMSlite germanium detectors by the SuperCDMS collaboration. This work evaluates the yield for nuclear recoil energies between 1 keV and 7 keV at a temperature of $\sim$ 50 mK. We use a Geant4 simulation to model the neutron spectrum assuming a charge yield model that is a generalization of the standard Lindhard model and consists of two energy dependent parameters. We perform a likelihood analysis using the simulated neutron spectrum, modeled background, and experimental data to obtain the best fit values of the yield model. The ionization yield between recoil energies of 1 keV and 7 keV is shown to be significantly lower than predicted by the standard Lindhard model for germanium. There is a general lack of agreement among different experiments using a variety of techniques studying the low-energy range of the nuclear recoil yield, which is most critical for interpretation of direct dark matter searches. This suggests complexity in the physical process that many direct detection experiments use to model their primary signal detection mechanism and highlights the need for further studies to clarify underlying systematic effects that have not been well understood up to this point.

Submitted 27 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Journal ref: Phys. Rev. D 105, 122002 (2022)

arXiv:2202.05583 [pdf, other]

Similarity learning for wells based on logging data

Authors: Evgenia Romanenkova, Alina Rogulina, Anuar Shakirov, Nikolay Stulov, Alexey Zaytsev, Leyla Ismailova, Dmitry Kovalev, Klemens Katterbauer, Abdallah AlShehri

Abstract: One of the first steps during the investigation of geological objects is the interwell correlation. It provides information on the structure of the objects under study, as it comprises the framework for constructing geological models and assessing hydrocarbon reserves. Today, the detailed interwell correlation relies on manual analysis of well-logging data. Thus, it is time-consuming and of a subjective nature. The essence of the interwell correlation constitutes an assessment of the similarities between geological profiles. There were many attempts to automate the process of interwell correlation by means of rule-based approaches, classic machine learning approaches, and deep learning approaches in the past. However, most approaches are of limited usage and inherent subjectivity of experts. We propose a novel framework to solve the geological profile similarity estimation based on a deep learning model. Our similarity model takes well-logging data as input and provides the similarity of wells as output. The developed framework enables (1) extracting patterns and essential characteristics of geological profiles within the wells and (2) model training following the unsupervised paradigm without the need for manual analysis and interpretation of well-logging data. For model testing, we used two open datasets originating in New Zealand and Norway. Our data-based similarity models provide high performance: the accuracy of our model is $0.926$ compared to $0.787$ for baselines based on the popular gradient boosting approach. With them, an oil & gas practitioner can improve interwell correlation quality and reduce operation time.

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2202.05097 [pdf, other]

doi 10.21468/SciPostPhysProc.9.001

EXCESS workshop: Descriptions of rising low-energy spectra

Authors: P. Adari, A. Aguilar-Arevalo, D. Amidei, G. Angloher, E. Armengaud, C. Augier, L. Balogh, S. Banik, D. Baxter, C. Beaufort, G. Beaulieu, V. Belov, Y. Ben Gal, G. Benato, A. Benoît, A. Bento, L. Bergé, A. Bertolini, R. Bhattacharyya, J. Billard, I. M. Bloch, A. Botti, R. Breier, G. Bres, J-. L. Bret, et al. (281 additional authors not shown)

Abstract: Many low-threshold experiments observe sharply rising event rates of yet unknown origins below a few hundred eV, and larger than expected from known backgrounds. Due to the significant impact of this excess on the dark matter or neutrino sensitivity of these experiments, a collective effort has been started to share the knowledge about the individual observations. For this, the EXCESS Workshop was initiated. In its first iteration in June 2021, ten rare event search collaborations contributed to this initiative via talks and discussions. The contributing collaborations were CONNIE, CRESST, DAMIC, EDELWEISS, MINER, NEWS-G, NUCLEUS, RICOCHET, SENSEI and SuperCDMS. They presented data about their observed energy spectra and known backgrounds together with details about the respective measurements. In this paper, we summarize the presented information and give a comprehensive overview of the similarities and differences between the distinct measurements. The provided data is furthermore publicly available on the workshop's data repository together with a plotting tool for visualization.

Submitted 4 March, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: 44 pages, 20 figures; Editors: A. Fuss, M. Kaznacheeva, F. Reindl, F. Wagner; updated copyright statements and funding information

Journal ref: SciPost Phys. Proc. 9, 001 (2022)

arXiv:2201.01657 [pdf, other]

Calculations of Delbrück scattering to all orders in $\alpha Z$

Authors: J. Sommerfeldt, V. A. Yerokhin, R. A. Müller, V. A. Zaytsev, A. V. Volotka, A. Surzhykov

Abstract: We present a theoretical method to calculate Delbrück scattering amplitudes. Our formalism is based on the exact analytical Dirac-Coulomb Green's function and, therefore, accounts for the interaction of the virtual electron-positron pair with the nucleus to all orders, including the Coulomb corrections. The numerical convergence of our calculations is accelerated by solving the radial integrals that are involved analytically in the asymptotic region. Numerical results for the collision of photons with energies 102.2 keV and 255.5 keV with bare neon and lead nuclei are compared with the predictions of the lowest-order Born approximation. We find that our method can produce accurate results within a reasonable computation time and that the Coulomb corrections enhance the absolute value of the Delbrück amplitude by a few percent for the studied photon energies.

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2112.03021 [pdf, ps, other]

doi 10.1103/PhysRevA.105.052803

Electronic structure effects in the electron bremsstrahlung from heavy ions

Authors: M. E. Groshev, V. A. Zaytsev, V. A. Yerokhin, P. -M. Hillenbrand, Yu. A. Litvinov, V. M. Shabaev

Abstract: A fully relativistic approach is presented for the calculation of the bremsstrahlung emitted by an electron scattered off an ionic target. The ionic target is described as a combination of an effective Coulomb potential and a finite-range potential induced by the electronic cloud of the ion. The approach allows us to investigate the influence of the electronic structure of the target on the properties of the emitted radiation. We calculate the double differential cross-section and Stokes parameters of the bremsstrahlung of an electron scattered off uranium ions in different charge states, ranging from bare to neutral uranium. Results on the high-energy endpoint of the electron bremsstrahlung from Li-like uranium ions ${\rm U}^{89+}$ are compared to the recent experimental data. For this process, it is found that taking into account the electronic structure of the target results in modification of the cross-section on the level of 14%, which can, in principle, be seen in present-day experiments.

Submitted 6 December, 2021; originally announced December 2021.

Comments: 20 pages, 6 figures

arXiv:2110.12000 [pdf, other]

Bank transactions embeddings help to uncover current macroeconomics

Authors: Maria Begicheva, Alexey Zaytsev

Abstract: Macroeconomic indexes are of high importance for banks: many risk-control decisions utilize these indexes. A typical workflow for evaluating these indexes is costly and protracted, with a lag between the actual date and available index being a couple of months. Banks predict such indexes now using autoregressive models to make decisions in a rapidly changing environment. However, autoregressive models fail in complex scenarios related to appearances of crises. We propose to use clients' financial transactions data from a large Russian bank to get such indexes. Financial transactions are long, and the number of clients is huge, so we develop an efficient approach that allows fast and accurate estimation of macroeconomic indexes based on a stream of transactions consisting of millions of transactions. The approach uses a neural networks paradigm and a smart sampling scheme. The results show that our neural network approach outperforms the baseline method on hand-crafted features based on transactions. Calculated embeddings show the correlation between the client's transaction activity and bank macroeconomic indexes over time.

Submitted 29 December, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

Journal ref: ICMLA 2021

arXiv:2107.11275 [pdf, other]

A Differentiable Language Model Adversarial Attack on Text Classifiers

Authors: Ivan Fursov, Alexey Zaytsev, Pavel Burnyshev, Ekaterina Dmitrieva, Nikita Klyuchnikov, Andrey Kravchenko, Ekaterina Artemova, Evgeny Burnaev

Abstract: Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve robustness of these models is an exploration of an adversarial attack scenario: check if a small perturbation of an input can fool a model. Due to the discrete nature of textual data, gradient-based adversarial methods, widely used in computer vision, are not applicable per se. The standard strategy to overcome this issue is to develop token-level transformations, which do not take the whole sentence into account. In this paper, we propose a new black-box sentence-level attack. Our method fine-tunes a pre-trained language model to generate adversarial examples. A proposed differentiable loss function depends on a substitute classifier score and an approximate edit distance computed via a deep learning model. We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation. Moreover, due to the usage of the fine-tuned language model, the generated adversarial examples are hard to detect, thus current models are not robust. Hence, it is difficult to defend from the proposed attack, which is not the case for other attacks.

Submitted 23 July, 2021; originally announced July 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2006.11078

arXiv:2106.08361 [pdf, other]

Adversarial Attacks on Deep Models for Financial Transaction Records

Authors: Ivan Fursov, Matvey Morozov, Nina Kaploukhaya, Elizaveta Kovtun, Rodrigo Rivera-Castro, Gleb Gusev, Dmitry Babaev, Ivan Kireev, Alexey Zaytsev, Evgeny Burnaev

Abstract: Machine learning models using transaction records as inputs are popular among financial institutions. The most efficient models use deep-learning architectures similar to those in the NLP community, posing a challenge due to their tremendous number of parameters and limited robustness. In particular, deep-learning models are vulnerable to adversarial attacks: a little change in the input harms the model's output. In this work, we examine adversarial attacks on transaction records data and defences from these attacks. The transaction records data have a different structure than the canonical NLP or time series data, as neighbouring records are less connected than words in sentences, and each record consists of both discrete merchant code and continuous transaction amount. We consider a black-box attack scenario, where the attacker doesn't know the true decision model, and pay special attention to adding transaction tokens to the end of a sequence. These limitations provide a more realistic scenario, previously unexplored in the NLP world. The proposed adversarial attacks and the respective defences demonstrate remarkable performance using relevant datasets from the financial industry. Our results show that a couple of generated transactions are sufficient to fool a deep-learning model. Further, we improve model robustness via adversarial training or separate adversarial examples detection. This work shows that embedding protection from adversarial attacks improves model robustness, allowing a wider adoption of deep models for transaction records in banking and finance.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.02602 [pdf, other]

doi 10.1145/3503161.3548182

InDiD: Instant Disorder Detection via Representation Learning

Authors: Evgenia Romanenkova, Alexander Stepikin, Matvey Morozov, Alexey Zaytsev

Abstract: For sequential data, a change point is a moment of abrupt regime switch in data streams. Such changes appear in different scenarios, including simpler data from sensors and more challenging video surveillance data. We need to detect disorders as fast as possible. Classic approaches for change point detection (CPD) might underperform for semi-structured sequential data because they cannot process its structure without a proper representation. We propose a principled loss function that balances change detection delay and time to a false alarm. It approximates classic rigorous solutions but is differentiable and allows representation learning for deep models. We consider synthetic sequences, real-world data sensors and videos with change points. We carefully labelled available data with change point moments for video data and released it for the first time. Experiments suggest that complex data require meaningful representations tailored for the specificity of the CPD task, and our approach provides them, outperforming the considered baselines. For example, for explosion detection in video, the F1 score for our method is $0.53$ compared to baseline scores of $0.31$ and $0.35$.

Submitted 22 April, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2104.01440 [pdf, other]

COHORTNEY: Non-Parametric Clustering of Event Sequences

Authors: Vladislav Zhuzhel, Rodrigo Rivera-Castro, Nina Kaploukhaya, Liliya Mironova, Alexey Zaytsev, Evgeny Burnaev

Abstract: Cohort analysis is a pervasive activity in web analytics. One divides users into groups according to specific criteria and tracks their behavior over time. Despite its extensive use, academic circles do not discuss cohort analysis to evaluate user behavior online. This work introduces an unsupervised non-parametric approach to group Internet users based on their activities. In comparison, canonical methods in marketing and engineering-based techniques underperform. COHORTNEY is the first machine learning-based cohort analysis algorithm with a robust theoretical explanation.

Submitted 12 June, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: 18 pages

Showing 1–50 of 104 results for author: Zaytsev, A