Search | arXiv e-print repository

arXiv:2406.14347 [pdf, other]

$\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

Authors: Kuzma Khrabrov, Anton Ber, Artem Tsypin, Konstantin Ushenin, Egor Rumiantsev, Alexander Telepov, Dmitry Protasov, Ilya Shenbin, Anton Alekseev, Mikhail Shirokikh, Sergey Nikolenko, Elena Tutubalina, Artur Kadurin

Abstract: Methods of computational quantum chemistry provide accurate approximations of molecular properties crucial for computer-aided drug discovery and other areas of chemical science. However, high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents a new dataset and benchmark called $\nabla^2$DFT that is based on nablaDFT. It contains twice as many molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level ($ω$B97X-D/def2-SVP) for each conformation. Moreover, $\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs in molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extendable framework for training NNPs and implement 10 models within it.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2309.17076 [pdf, other]

Benefits of mirror weight symmetry for 3D mesh segmentation in biomedical applications

Authors: Vladislav Dordiuk, Maksim Dzhigil, Konstantin Ushenin

Abstract: 3D mesh segmentation is an important task with many biomedical applications. The human body has bilateral symmetry and some variation in organ positions. This allows us to expect a positive effect from rotation- and inversion-invariant layers in convolutional neural networks that perform biomedical segmentation. In this study, we show the impact of weight symmetry in neural networks that perform 3D mesh segmentation. We analyze the problem of 3D mesh segmentation for pathological vessel structures (aneurysms) and conventional anatomical structures (endocardium and epicardium of ventricles). Local geometrical features are encoded as samples from the signed distance function, and the neural network performs prediction for each mesh node. We show that weight symmetry yields 1 to 3% additional accuracy and allows the number of trainable parameters to be reduced by up to 8 times without performance loss, provided the neural network has at least three convolutional layers. This also works for very small training sets.

Submitted 6 November, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: was sent to IEEE conference

MSC Class: 65D18; 68U10; ACM Class: I.4.6; J.3

arXiv:2308.13328 [pdf, other]

Compressor-Based Classification for Atrial Fibrillation Detection

Authors: Nikita Markov, Konstantin Ushenin, Yakov Bozhko, Olga Solovyova

Abstract: Atrial fibrillation (AF) is one of the most common arrhythmias with challenging public health implications. Therefore, automatic detection of AF episodes on ECG is one of the essential tasks in biomedical engineering. In this paper, we applied the recently introduced method of compressor-based text classification with the gzip algorithm to AF detection (binary classification between heart rhythms). We investigated the normalized compression distance applied to RR-interval and $Δ$RR-interval sequences ($Δ$RR-interval is the difference between subsequent RR-intervals). Here, the configuration of the k-nearest neighbour classifier, an optimal window length, and the choice of data types for compression were analyzed. We achieved good classification results while learning on the full MIT-BIH Atrial Fibrillation database, close to the best specialized AF detection algorithms (avg. sensitivity = 97.1%, avg. specificity = 91.7%, best sensitivity of 99.8%, best specificity of 97.6% with fivefold cross-validation). In addition, we evaluated the classification performance under the few-shot learning setting. Our results suggest that gzip compression-based classification, originally proposed for texts, is suitable for biomedical data and quantized continuous stochastic sequences in general.

Submitted 2 October, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: This paper is sent for review at the IEEE conference, 2023

MSC Class: 92C55; 68T10; 68T99; ACM Class: J.3; G.3
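
The compressor-based approach named in the abstract above can be illustrated with a minimal sketch. The normalized compression distance formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), is the standard one; everything else here is an assumption made for illustration and is not taken from the paper: the function names, the 25 ms quantization step, the text serialization of the ΔRR sequence, and the choice k = 1.

```python
import gzip
from collections import Counter


def compressed_len(data: bytes) -> int:
    """Length in bytes of the gzip-compressed input, used as C(x)."""
    return len(gzip.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + b" " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def encode_delta_rr(rr_ms, step_ms=25):
    """Serialize quantized differences of successive RR intervals as text.

    The quantization step and the text encoding are illustrative
    assumptions, not the encoding used in the paper.
    """
    deltas = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return " ".join(str(round(d / step_ms)) for d in deltas).encode("ascii")


def knn_predict(query: bytes, train, k=1):
    """Majority label among the k training items closest to query by NCD.

    train is a list of (encoded_sequence, label) pairs.
    """
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In use, each ECG window would be encoded with `encode_delta_rr` and classified with `knn_predict` against labelled windows; the window length and k would need tuning, which the abstract says the authors analyzed.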

arXiv:2211.03122 [pdf, other]

doi 10.1109/SIBIRCON56155.2022.10016940

Computational anatomy atlas using multilayer perceptron with Lipschitz regularization

Authors: Konstantin Ushenin, Maksim Dzhigil, Vladislav Dordiuk

Abstract: A computational anatomy atlas is a set of internal organ geometries. It is based on data of real patients and complemented with virtual cases generated by a numerical approach. Atlases are in demand in computational physiology, especially in cardiological and neurophysiological applications. Usually, atlas generation uses explicit object representation, such as voxel models or surface meshes. In this paper, we propose a method of atlas generation using an implicit representation of 3D objects. Our approach has two key stages. The first stage converts voxel models of segmented organs to implicit form using the usual multilayer perceptron. This stage smooths the model and reduces memory consumption. The second stage uses a multilayer perceptron with Lipschitz regularization. This neural network provides a smooth transition between implicitly defined 3D geometries. Our work shows examples of models of the left and right human ventricles. All code and data for this work are open.

Submitted 6 November, 2022; originally announced November 2022.

Comments: This paper was sent to the SIBIRCON 2022 conference

arXiv:2207.08165 [pdf, other]

doi 10.1109/CSGB56354.2022.9865298

Statistical model for describing heart rate variability in normal rhythm and atrial fibrillation

Authors: Nikita Markov, Ilya Kotov, Konstantin Ushenin, Yakov Bozhko

Abstract: Heart rate variability (HRV) indices describe properties of interbeat intervals in the electrocardiogram (ECG). Usually HRV is measured exclusively in normal sinus rhythm (NSR), excluding any form of paroxysmal rhythm. Atrial fibrillation (AF) is the most widespread cardiac arrhythmia in the human population. Usually such abnormal rhythm is not analyzed and is assumed to be chaotic and unpredictable. Nonetheless, ranges of HRV indices differ between patients with AF, yet the physiological characteristics that influence them are poorly understood. In this study, we propose a statistical model that describes the relationship between HRV indices in NSR and AF. The model is based on the Mahalanobis distance, the k-nearest neighbour approach and a multivariate normal distribution framework. Verification of the method was performed using 10 min intervals of NSR and AF that were extracted from long-term Holter ECGs. For validation we used the Bhattacharyya distance and the Kolmogorov-Smirnov 2-sample test in a k-fold procedure. The model is able to predict at least 7 HRV indices with high precision.

Submitted 17 July, 2022; originally announced July 2022.

Comments: Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine 2022 (CSGB 2022)

MSC Class: 62P10

arXiv:2207.08162 [pdf, other]

doi 10.1109/CSGB56354.2022.9865330

Natural language processing for clusterization of genes according to their functions

Authors: Vladislav Dordiuk, Ekaterina Demicheva, Fernando Polanco Espino, Konstantin Ushenin

Abstract: There are hundreds of methods for analysis of data obtained in mRNA-sequencing. Most of them focus on a small number of genes. In this study, we propose an approach that reduces the analysis of several thousand genes to the analysis of several clusters. The list of genes is enriched with information from open databases. Then, the descriptions are encoded as vectors using a pretrained language model (BERT) and some text processing approaches. The encoded gene functions pass through dimensionality reduction and clusterization. Aiming to find the most efficient pipeline, we analyzed 180 pipeline variants with different methods in the major pipeline steps. The performance was evaluated with clusterization indexes and expert review of the results.

Submitted 17 July, 2022; originally announced July 2022.

Comments: Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine 2022 (CSGB 2022)

MSC Class: 68T50; 92-08

arXiv:2111.03873 [pdf, other]

doi 10.1002/zamm.202100217

On uniqueness theorems for the inverse problem of Electrocardiography in the Sobolev spaces

Authors: Vitaly Kalinin, Alexander Shlapunov, Konstantin Ushenin

Abstract: We consider a mathematical model related to reconstruction of cardiac electrical activity from ECG measurements on the body surface. An application of recent developments in solving boundary value problems for elliptic and parabolic equations in Sobolev type spaces allows us to obtain uniqueness theorems for the model. The obtained results can be used as a sound basis for creating numerical methods for non-invasive mapping of the heart.

Submitted 28 September, 2022; v1 submitted 6 November, 2021; originally announced November 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2106.04125

MSC Class: Primary 35J56; Secondary 35J57; 35K40

arXiv:2104.06193 [pdf, other]

doi 10.1109/USBEREIT51232.2021.9455004

Anomaly Detection in Image Datasets Using Convolutional Neural Networks, Center Loss, and Mahalanobis Distance

Authors: Garnik Vareldzhan, Kirill Yurkov, Konstantin Ushenin

Abstract: User activities generate a significant number of poor-quality or irrelevant images and data vectors that cannot be processed in the main data processing pipeline or included in the training dataset. Such samples can be found by manual expert analysis or with anomaly detection algorithms. There are several formal definitions of anomalous samples. For neural networks, an anomaly is usually defined as an out-of-distribution sample. This work proposes methods for supervised and semi-supervised detection of out-of-distribution samples in image datasets. Our approach extends a typical neural network that solves the image classification problem. Thus, after extension, one neural network can solve the image classification and anomaly detection problems simultaneously. The proposed methods are based on the center loss and its effect on the deep feature distribution in the last hidden layer of the neural network. This paper provides an analysis of the proposed methods for LeNet and EfficientNet-B0 on the MNIST and ImageNet-30 datasets.

Submitted 13 April, 2021; originally announced April 2021.
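
The Mahalanobis-distance scoring mentioned in the title of the entry above can be sketched generically as follows. This is not the authors' method: it assumes that feature vectors from the last hidden layer are already available as a NumPy array, it does not include center-loss training, and the function names and the regularization constant `eps` are hypothetical choices for illustration.

```python
import numpy as np


def fit_class_gaussians(features, labels, eps=1e-6):
    """Estimate a mean and inverse covariance per class.

    features: array of shape (n_samples, n_dims); labels: shape (n_samples,).
    eps adds a small ridge to keep the covariance invertible (an assumption).
    """
    stats = {}
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
        stats[int(c)] = (mean, np.linalg.inv(cov))
    return stats


def min_mahalanobis(stats, f):
    """Smallest Mahalanobis distance from feature vector f to any class."""
    f = np.asarray(f, dtype=float)
    dists = []
    for mean, inv_cov in stats.values():
        d = f - mean
        dists.append(float(np.sqrt(d @ inv_cov @ d)))
    return min(dists)
```

A sample would be flagged as out-of-distribution when its score exceeds a threshold chosen on held-out data; how that threshold is set is left open here.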

arXiv:1912.04672 [pdf, other]

doi 10.1109/USBEREIT48449.2020.9117753

Effects of lead position, cardiac rhythm variation and drug-induced QT prolongation on performance of machine learning methods for ECG processing

Authors: Marat Bogdanov, Salim Baigildin, Aygul Fabarisova, Konstantin Ushenin, Olga Solovyova

Abstract: Machine learning shows great performance in various problems of electrocardiography (ECG) signal analysis. However, collecting a dataset for biomedical engineering is a very difficult task. Any dataset for ECG processing contains from 100 to 10,000 times fewer cases than datasets for image or text analysis. This issue is especially important because of physiological phenomena that can significantly change the morphology of heartbeats in ECG signals. In this preliminary study, we analyze the effects of lead choice from the standard ECG recordings, variation of ECG during 24-hours, and the effects of QT-prolongation agents on the performance of machine learning methods for ECG processing. We choose the problem of subject identification for analysis, because this problem may be solved for almost any available dataset of ECG data. In a discussion, we compare our findings with observations from other works that use machine learning for ECG processing with different problem statements. Our results show the importance of training dataset enrichment with ECG signals acquired in specific physiological conditions for obtaining good performance of ECG processing for real applications.

Submitted 16 February, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

arXiv:1911.09731 [pdf, other]

doi 10.1109/USBEREIT48449.2020.9117627

Phase mapping for cardiac unipolar electrograms with neural network instead of phase transformation

Authors: Konstantin Ushenin, Tatyana Nesterova, Dmitry Shmarko, Vladimir Sholokhov

Abstract: Phase mapping is an approach to processing signals of electrograms recorded from the surface of cardiac tissue. The main concept of phase mapping is the application of the phase transformation with the aim of obtaining signals with useful properties. In our study, we propose to use a simple sawtooth signal instead of a phase signal for processing electrogram data and building phase maps. We denote the transformation that can provide this signal as a phase-like transformation (PLT). PLT is defined via a convolutional neural network that is trained on a dataset from computer models of cardiac tissue electrophysiology. The proposed approaches were validated on data from the detailed personalized model of the human torso electrophysiology. This paper includes visualization of the phase map based on PLT and shows the robustness of the proposed approaches in the analysis of the complex non-stationary periodic activity of the excitable cardiac tissue.

Submitted 13 April, 2021; v1 submitted 21 November, 2019; originally announced November 2019.

MSC Class: 92B25; 65M99

arXiv:1909.06840 [pdf, other]

doi 10.1109/SIBIRCON48586.2019.8958121

Comparison of UNet, ENet, and BoxENet for Segmentation of Mast Cells in Scans of Histological Slices

Authors: Alexander Karimov, Artem Razumov, Ruslana Manbatchurina, Ksenia Simonova, Irina Donets, Anastasia Vlasova, Yulia Khramtsova, Konstantin Ushenin

Abstract: Deep neural networks show high accuracy in the problem of semantic and instance segmentation of biomedical data. However, this approach is computationally expensive. The computational cost may be reduced with network simplification after training or by choosing the proper architecture, which provides segmentation with less accuracy but does it much faster. In the present study, we analyzed the accuracy and performance of UNet and ENet architectures for the problem of semantic image segmentation. In addition, we investigated the ENet architecture by replacing some convolution layers with box-convolution layers. The analysis was performed on an original dataset consisting of histology slices with mast cells. These cells provide a region for segmentation with different types of borders, which vary from clearly visible to ragged. ENet was less accurate than UNet by only about 1-2%, but ENet was 8-15 times faster than UNet.

Submitted 22 November, 2019; v1 submitted 15 September, 2019; originally announced September 2019.

Comments: 4 pages, 5 figures, 1 table

Showing 1–11 of 11 results for author: Ushenin, K