-
MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences
Authors:
Viktor Moskvoretskii,
Dmitry Osin,
Egor Shvetsov,
Igor Udovichenko,
Maxim Zhelnin,
Andrey Dukhovny,
Anna Zhimerikina,
Evgeny Burnaev
Abstract:
This study explores the application of self-supervised learning techniques for event sequences. It is a key modality in various applications such as banking, e-commerce, and healthcare. However, there is limited research on self-supervised learning for event sequences, and methods from other domains like images, texts, and speech may not easily transfer. To determine the most suitable approach, we…
▽ More
This study explores the application of self-supervised learning techniques for event sequences. It is a key modality in various applications such as banking, e-commerce, and healthcare. However, there is limited research on self-supervised learning for event sequences, and methods from other domains like images, texts, and speech may not easily transfer. To determine the most suitable approach, we conduct a detailed comparative analysis of previously identified best-performing methods. We find that neither the contrastive nor generative method is superior. Our assessment includes classifying event sequences, predicting the next event, and evaluating embedding quality. These results further highlight the potential benefits of combining both methods. Given the lack of research on hybrid models in this domain, we initially adapt the baseline model from another domain. However, upon observing its underperformance, we develop a novel method called the Multimodal-Learning Event Model (MLEM). MLEM treats contrastive learning and generative modeling as distinct yet complementary modalities, aligning their embeddings. The results of our study demonstrate that combining contrastive and generative approaches into one procedure with MLEM achieves superior performance across multiple metrics.
△ Less
Submitted 3 July, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
SeqNAS: Neural Architecture Search for Event Sequence Classification
Authors:
Igor Udovichenko,
Egor Shvetsov,
Denis Divitsky,
Dmitry Osin,
Ilya Trofimov,
Anatoly Glushenko,
Ivan Sukharev,
Dmitry Berestenev,
Evgeny Burnaev
Abstract:
Neural Architecture Search (NAS) methods are widely used in various industries to obtain high quality taskspecific solutions with minimal human intervention. Event Sequences find widespread use in various industrial applications including churn prediction customer segmentation fraud detection and fault diagnosis among others. Such data consist of categorical and real-valued components with irregul…
▽ More
Neural Architecture Search (NAS) methods are widely used in various industries to obtain high quality taskspecific solutions with minimal human intervention. Event Sequences find widespread use in various industrial applications including churn prediction customer segmentation fraud detection and fault diagnosis among others. Such data consist of categorical and real-valued components with irregular timestamps. Despite the usefulness of NAS methods previous approaches only have been applied to other domains images texts or time series. Our work addresses this limitation by introducing a novel NAS algorithm SeqNAS specifically designed for event sequence classification. We develop a simple yet expressive search space that leverages commonly used building blocks for event sequence classification including multihead self attention convolutions and recurrent cells. To perform the search we adopt sequential Bayesian Optimization and utilize previously trained models as an ensemble of teachers to augment knowledge distillation. As a result of our work we demonstrate that our method surpasses state of the art NAS methods and popular architectures suitable for sequence classification and holds great potential for various industrial applications.
△ Less
Submitted 6 January, 2024;
originally announced January 2024.
-
QuantNAS for super resolution: searching for efficient quantization-friendly architectures against quantization noise
Authors:
Egor Shvetsov,
Dmitry Osin,
Alexey Zaytsev,
Ivan Koryakovskiy,
Valentin Buchnev,
Ilya Trofimov,
Evgeny Burnaev
Abstract:
There is a constant need for high-performing and computationally efficient neural network models for image super-resolution: computationally efficient models can be used via low-capacity devices and reduce carbon footprints. One way to obtain such models is to compress models, e.g. quantization. Another way is a neural architecture search that automatically discovers new, more efficient solutions.…
▽ More
There is a constant need for high-performing and computationally efficient neural network models for image super-resolution: computationally efficient models can be used via low-capacity devices and reduce carbon footprints. One way to obtain such models is to compress models, e.g. quantization. Another way is a neural architecture search that automatically discovers new, more efficient solutions. We propose a novel quantization-aware procedure, the QuantNAS that combines pros of these two approaches. To make QuantNAS work, the procedure looks for quantization-friendly super-resolution models. The approach utilizes entropy regularization, quantization noise, and Adaptive Deviation for Quantization (ADQ) module to enhance the search procedure. The entropy regularization technique prioritizes a single operation within each block of the search space. Adding quantization noise to parameters and activations approximates model degradation after quantization, resulting in a more quantization-friendly architectures. ADQ helps to alleviate problems caused by Batch Norm blocks in super-resolution models. Our experimental results show that the proposed approximations are better for search procedure than direct model quantization. QuantNAS discovers architectures with better PSNR/BitOps trade-off than uniform or mixed precision quantization of fixed architectures. We showcase the effectiveness of our method through its application to two search spaces inspired by the state-of-the-art SR models and RFDN. Thus, anyone can design a proper search space based on an existing architecture and apply our method to obtain better quality and efficiency.
The proposed procedure is 30\% faster than direct weight quantization and is more stable.
△ Less
Submitted 10 January, 2024; v1 submitted 31 August, 2022;
originally announced August 2022.