Conditionally valid Probabilistic Conformal Prediction

Authors: Vincent Plassier, Alexander Fishkov, Maxim Panov, Eric Moulines

Abstract: We develop a new method for creating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution $P_{Y \mid X}$. Most existing methods, such as conformalized quantile regression and probabilistic conformal prediction, only offer marginal coverage guarantees. Our approach extends these methods to achieve conditional coverage, which is essential for many practical applications. While exact conditional guarantees are impossible without assumptions on the data distribution, we provide non-asymptotic bounds that explicitly depend on the quality of the available estimate of the conditional distribution. Our confidence sets are highly adaptive to the local structure of the data, making them particularly useful in highly heteroskedastic settings. We demonstrate the effectiveness of our approach through extensive simulations, showing that it outperforms existing methods in terms of conditional coverage and improves the reliability of statistical inference in a wide range of applications.
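For reference, the marginal baseline named above can be sketched as standard split conformalized quantile regression (CQR). This is not the conditional method of the paper: CQR only guarantees coverage on average over $X$. The sketch assumes a quantile regressor has already produced lower and upper quantile predictions on a held-out calibration set; the function names and toy numbers are illustrative.

```python
import math


def cqr_scores(lo_preds, hi_preds, ys):
    """Split-CQR conformity scores: signed distance of y outside the band [lo, hi]."""
    return [max(lo - y, y - hi) for lo, hi, y in zip(lo_preds, hi_preds, ys)]


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the interval is unbounded
    return sorted(scores)[k - 1]


def cqr_interval(lo, hi, q):
    """Expand (q > 0) or shrink (q < 0) the predicted quantile band by q."""
    return lo - q, hi + q


# Hypothetical calibration data: predicted lower/upper quantiles and observed targets.
lo_cal = [0.0, 1.0, 2.0, 3.0, 4.0]
hi_cal = [2.0, 3.0, 4.0, 5.0, 6.0]
y_cal = [1.0, 3.5, 1.5, 4.0, 7.0]

q = conformal_quantile(cqr_scores(lo_cal, hi_cal, y_cal), alpha=0.2)
print(cqr_interval(1.0, 3.0, q))  # (0.0, 4.0) for a new point with predicted band [1, 3]
```

The same correction q is applied at every x, which is why the guarantee is only marginal.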

Submitted 1 July, 2024; originally announced July 2024.

Comments: 23 pages

arXiv:2406.15627

Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph

Authors: Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Akim Tsvigun, Daniil Vasilev, Rui Xing, Abdelrahman Boda Sadallah, Lyudmila Rvanova, Sergey Petrakov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov

Abstract: Uncertainty quantification (UQ) is increasingly recognized as a critical component of applications that rely on machine learning (ML). The rapid proliferation of large language models (LLMs) has stimulated researchers to seek efficient and effective approaches to UQ in text generation tasks since, in addition to their emerging capabilities, these models have introduced new challenges for building safe applications. As with other ML models, LLMs are prone to making incorrect predictions, "hallucinating" by fabricating claims, or simply generating low-quality output for a given input. UQ is a key element in dealing with these challenges. However, research to date on UQ methods for LLMs has been fragmented, with disparate evaluation methods. In this work, we tackle this issue by introducing a novel benchmark that implements a collection of state-of-the-art UQ baselines and provides an environment for controllable and consistent evaluation of novel techniques by researchers in various text generation tasks. Our benchmark also supports the assessment of confidence normalization methods in terms of their ability to provide interpretable scores. Using our benchmark, we conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks and shed light on the most promising approaches.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev contributed equally

arXiv:2403.11696

Generalization error of spectral algorithms

Authors: Maksim Velikanov, Maxim Panov, Dmitry Yarotsky

Abstract: The asymptotically precise estimation of the generalization of kernel methods has recently received attention due to the parallels between neural networks and their associated kernels. However, prior works derive such estimates for training by kernel ridge regression (KRR), whereas neural networks are typically trained with gradient descent (GD). In the present work, we consider the training of kernels with a family of spectral algorithms specified by a profile h(λ), which includes KRR and GD as special cases. We then derive the generalization error as a functional of the learning profile h(λ) for two data models: a high-dimensional Gaussian model and a low-dimensional translation-invariant model. Under power-law assumptions on the spectrum of the kernel and the target, we use our framework to (i) give full loss asymptotics for both noisy and noiseless observations; (ii) show that the loss localizes on certain spectral scales, giving a new perspective on the KRR saturation phenomenon; and (iii) conjecture, and demonstrate for the considered data models, the universality of the loss w.r.t. non-spectral details of the problem, but only in the case of noisy observations.
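To make the role of the profile concrete: under the usual convention that a spectral algorithm multiplies the target's component along a kernel eigendirection with eigenvalue λ by h(λ), KRR with ridge parameter η has h(λ) = λ/(λ + η), and t steps of GD with learning rate α from zero initialization give h(λ) = 1 - (1 - αλ)^t. The sketch below checks the GD formula numerically in a single eigendirection. The normalization of λ (for example, eigenvalues of the kernel matrix divided by the sample size) and the parameter values are assumptions, not taken from the paper.

```python
def h_krr(lam, eta):
    """KRR profile with ridge parameter eta."""
    return lam / (lam + eta)


def h_gd(lam, lr, steps):
    """Profile of `steps` GD iterations with learning rate lr, started from zero."""
    return 1.0 - (1.0 - lr * lam) ** steps


def gd_along_eigendirection(lam, y_comp, lr, steps):
    """Run GD on the squared loss restricted to one eigendirection of the kernel.

    In that direction the fitted value f obeys f <- f - lr * lam * (f - y_comp).
    """
    f = 0.0
    for _ in range(steps):
        f -= lr * lam * (f - y_comp)
    return f


for lam in (0.01, 0.1, 1.0):
    fitted = gd_along_eigendirection(lam, y_comp=1.0, lr=0.5, steps=20)
    print(lam, round(fitted, 4), round(h_gd(lam, 0.5, 20), 4), round(h_krr(lam, 0.1), 4))
```

Both profiles are close to 1 for large λ and close to 0 for small λ; roughly, 1/(αt) for GD plays the role that η plays for KRR.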

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.04696

Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

Authors: Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov

Abstract: Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output. Such hallucinations can be dangerous, as occasional factual inaccuracies in the generated text might be obscured by the rest of the output being generally factually correct, making it extremely hard for users to spot them. Current services that leverage LLMs usually do not provide any means for detecting unreliable generations. Here, we aim to bridge this gap. In particular, we propose a novel fact-checking and hallucination detection pipeline based on token-level uncertainty quantification. Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output. Moreover, we present a novel token-level uncertainty quantification method that removes the impact of uncertainty about what claim to generate at the current step and what surface form to use. Our method, Claim Conditioned Probability (CCP), measures only the uncertainty of a particular claim value expressed by the model. Experiments on the task of biography generation demonstrate strong improvements for CCP compared to the baselines for seven LLMs and four languages. Human evaluation reveals that the fact-checking pipeline based on uncertainty quantification is competitive with a fact-checking tool that leverages external knowledge.
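Claim-level scores of this kind aggregate token-level uncertainty over the tokens that express a claim. The sketch below shows a simple baseline aggregation, one minus the product of the claim's token probabilities, and flags claims above a threshold. It is not CCP, which additionally discounts uncertainty about which claim to generate and about its surface form; the token-to-claim alignment, the probabilities, and the threshold are hypothetical inputs.

```python
import math


def claim_uncertainty(token_probs, claim_token_ids):
    """Baseline score: 1 minus the product of the probabilities of a claim's tokens."""
    log_p = sum(math.log(token_probs[i]) for i in claim_token_ids)
    return 1.0 - math.exp(log_p)


def flag_claims(token_probs, claims, threshold):
    """Return the claims whose uncertainty exceeds the threshold, in input order."""
    return [name for name, ids in claims.items()
            if claim_uncertainty(token_probs, ids) > threshold]


# Hypothetical generation: per-token probabilities and a token-to-claim alignment.
token_probs = [0.99, 0.95, 0.40, 0.98]
claims = {"is a physicist": [0, 1], "was born in 1950": [2]}
print(flag_claims(token_probs, claims, threshold=0.5))  # ['was born in 1950']
```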

Submitted 6 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted to ACL-2024 (Findings). Ekaterina Fadeeva, Aleksandr Rubashevskii, and Artem Shelmanov contributed equally

arXiv:2402.10727

Predictive Uncertainty Quantification via Risk Decompositions for Strictly Proper Scoring Rules

Authors: Nikita Kotelevskii, Maxim Panov

Abstract: Uncertainty quantification in predictive modeling often relies on ad hoc methods, as there is no universally accepted formal framework for it. This paper introduces a theoretical approach to understanding uncertainty through statistical risks, distinguishing between aleatoric (data-related) and epistemic (model-related) uncertainties. We explain how to split pointwise risk into Bayes risk and excess risk. In particular, we show that excess risk, related to epistemic uncertainty, aligns with Bregman divergences. To turn the considered risk measures into actual uncertainty estimates, we suggest using the Bayesian approach by approximating the risks with the help of posterior distributions. We tested our method on image datasets, evaluating its performance in detecting out-of-distribution and misclassified data using the AUROC metric. Our results confirm the effectiveness of the considered approach and offer practical guidance for estimating uncertainty in real-world applications.
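For two standard strictly proper scores, the split of pointwise risk can be checked directly. With true conditional distribution $p$ and prediction $q$, the log score gives $\mathbb{E}_{y\sim p}[-\log q_y] = H(p) + \mathrm{KL}(p\,\|\,q)$, and the Brier score gives $\mathbb{E}_{y\sim p}\|e_y - q\|^2 = (1 - \|p\|^2) + \|p - q\|^2$; in both cases the excess risk is the Bregman divergence of the corresponding convex generator (negative entropy and squared norm). The sketch assumes $p$ is known, which it is not in practice; the paper instead approximates the risks with a posterior distribution. The distributions used are hypothetical.

```python
import math


def entropy(p):
    """Bayes risk of the log score: the Shannon entropy H(p)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def kl(p, q):
    """Excess risk of the log score: KL(p || q), the Bregman divergence of negative entropy."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def expected_log_score(p, q):
    """Pointwise risk E_{y~p}[-log q_y] of predicting q when y ~ p."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)


def expected_brier(p, q):
    """Pointwise risk E_{y~p} ||e_y - q||^2, with e_y the one-hot vector of class y."""
    K = len(p)
    return sum(p[y] * sum(((1.0 if k == y else 0.0) - q[k]) ** 2 for k in range(K))
               for y in range(K))


def brier_bayes_risk(p):
    """Bayes risk of the Brier score: 1 - ||p||^2."""
    return 1.0 - sum(pk * pk for pk in p)


def brier_excess_risk(p, q):
    """Excess risk of the Brier score: ||p - q||^2, the Bregman divergence of ||.||^2."""
    return sum((pk - qk) ** 2 for pk, qk in zip(p, q))


p = [0.7, 0.2, 0.1]  # hypothetical true distribution of y given x
q = [0.5, 0.3, 0.2]  # hypothetical model prediction
print(expected_log_score(p, q), entropy(p) + kl(p, q))
print(expected_brier(p, q), brier_bayes_risk(p) + brier_excess_risk(p, q))
```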

Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2312.15799

Efficient Conformal Prediction under Data Heterogeneity

Authors: Vincent Plassier, Nikita Kotelevskii, Aleksandr Rubashevskii, Fedor Noskov, Maksim Velikanov, Alexander Fishkov, Samuel Horvath, Martin Takac, Eric Moulines, Maxim Panov

Abstract: Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification, which is crucial for ensuring the reliability of predictions. However, common CP methods heavily rely on data exchangeability, a condition often violated in practice. Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples. This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions. We illustrate the general theory with applications to the challenging setting of federated learning under data heterogeneity between agents. Our method allows constructing provably valid personalized prediction sets for agents in a fully federated way. The effectiveness of the proposed method is demonstrated in a series of experiments on real-world datasets.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 28 pages

arXiv:2312.11230

Dirichlet-based Uncertainty Quantification for Personalized Federated Learning with Improved Posterior Networks

Authors: Nikita Kotelevskii, Samuel Horváth, Karthik Nandakumar, Martin Takáč, Maxim Panov

Abstract: In modern federated learning, one of the main challenges is to account for the inherent heterogeneity and diverse nature of data distributions across clients. This problem is often addressed by personalizing the models towards the data distribution of a particular client. However, a personalized model might be unreliable when applied to data that are not typical for this client. In such cases, it may perform worse on these data than the non-personalized global model trained in a federated way on the data from all the clients. This paper presents a new approach to federated learning that, for a particular input point, selects whichever of the global and personalized models would perform better. This is achieved through careful modeling of predictive uncertainties, which helps to detect local and global in- and out-of-distribution data and to use this information to select the model that is confident in its prediction. A comprehensive experimental evaluation on popular real-world image datasets shows the superior performance of the model in the presence of out-of-distribution data, while it performs on par with state-of-the-art personalized federated learning algorithms in the standard scenarios.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.07383

LM-Polygraph: Uncertainty Estimation for Language Models

Authors: Ekaterina Fadeeva, Roman Vashurin, Akim Tsvigun, Artem Vazhentsev, Sergey Petrakov, Kirill Fedyanin, Daniil Vasilev, Elizaveta Goncharova, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov

Abstract: Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields. However, a significant challenge arises as these models often "hallucinate", i.e., fabricate facts without providing users an apparent means to discern the veracity of their statements. Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of LLMs. However, to date, research on UE methods for LLMs has been focused primarily on theoretical rather than engineering contributions. In this work, we tackle this issue by introducing LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. Additionally, it introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores, empowering end-users to discern unreliable responses. LM-Polygraph is compatible with the most recent LLMs, including BLOOMz, LLaMA-2, ChatGPT, and GPT-4, and is designed to support future releases of similarly-styled LMs.

Submitted 13 November, 2023; originally announced November 2023.

Comments: Accepted at EMNLP-2023

arXiv:2310.12587

Accurate FTIR determination of boron concentration in CVD homoepitaxial diamond layers

Authors: Mikhail Panov, Vasily Zubkov, Anna Solomnikova, Igor Klepikov

Abstract: The intensive development of technology for fabricating semiconducting CVD diamond layers poses the important task of developing a precise and non-destructive method for estimating the boron content in thin epitaxial layers. For bulk and uniformly doped diamond samples, infrared optical spectroscopy successfully performs this role. Here we propose an accurate method to determine the boron concentration in CVD homoepitaxial diamond layers from FTIR spectra. The method is a natural extension of the existing technique for bulk samples. The key feature of the new technique is accurate accounting for radiation passing through a multilayered structure with different thicknesses of absorbing media for specific absorption mechanisms. For this situation, an expression for the effective optical density is obtained. We have demonstrated the benefit of the method on a set of samples with CVD homoepitaxial layers grown on various HPHT substrates with and without nitrogen impurity. The measured FTIR spectra were subdivided into sections corresponding to the specific absorption mechanisms, and the correct amplitudes of the boron absorption peaks were derived. The data obtained from FTIR spectra are thoroughly compared with the charge carrier concentration derived from electrical capacitance-voltage measurements.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 18 pages, 7 figures, 1 table

MSC Class: 78-11

arXiv:2309.16412

Selective Nonparametric Regression via Testing

Authors: Fedor Noskov, Alexander Fishkov, Maxim Panov

Abstract: Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing a hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. The theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
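As a point of comparison, the kind of rule the abstract contrasts with thresholds only the estimated variance. The sketch below is such a plug-in baseline, not the paper's test: it estimates the mean and then the conditional variance with Nadaraya-Watson smoothing and abstains where the variance estimate exceeds a threshold, without accounting for the uncertainty of that estimate. The Gaussian kernel, bandwidth, threshold, and data are assumptions.

```python
import math


def nadaraya_watson(x0, xs, values, bandwidth):
    """Kernel-weighted average of `values` around x0 (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def predict_or_abstain(x0, xs, ys, bandwidth, var_threshold):
    """Return a prediction at x0, or None (abstain) if the local variance estimate is too large."""
    # Step 1: estimate the regression function at the design points.
    fitted = [nadaraya_watson(x, xs, ys, bandwidth) for x in xs]
    # Step 2: smooth the squared residuals to estimate the conditional variance at x0.
    sq_resid = [(y - f) ** 2 for y, f in zip(ys, fitted)]
    var_hat = nadaraya_watson(x0, xs, sq_resid, bandwidth)
    # Step 3: plug-in decision that ignores the uncertainty of var_hat itself.
    if var_hat > var_threshold:
        return None
    return nadaraya_watson(x0, xs, ys, bandwidth)


# Hypothetical data: small noise for x < 1, large noise for x >= 1.
xs = [i / 50 for i in range(100)]
ys = [x + (0.01 if i % 2 else -0.01) if x < 1 else x + (1.0 if i % 2 else -1.0)
      for i, x in enumerate(xs)]
print(predict_or_abstain(0.3, xs, ys, bandwidth=0.1, var_threshold=0.1))
print(predict_or_abstain(1.7, xs, ys, bandwidth=0.1, var_threshold=0.1))
```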

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2307.14530

Optimal Estimation in Mixed-Membership Stochastic Block Models

Authors: Fedor Noskov, Maxim Panov

Abstract: Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers have studied the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider the Mixed-Membership Stochastic Block Model (MMSB) first proposed by Airoldi et al. (2008). MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.
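A common way to write the model in the estimation literature is $\mathbb{E}[A] = \Theta B \Theta^\top$ off the diagonal, where row $i$ of $\Theta$ is node $i$'s membership vector on the simplex and $B$ holds the community-to-community connection probabilities, i.e., the "relations between communities" to be reconstructed. In Airoldi et al.'s original formulation memberships are drawn per interaction, but the marginal edge probability between $i$ and $j$ is still $\theta_i^\top B \theta_j$. The sketch below samples an undirected graph from this parametrization; the example $\Theta$ and $B$ are hypothetical, and the paper's exact assumptions may differ.

```python
import random


def edge_probabilities(theta, B):
    """P[i][j] = theta_i^T B theta_j, i.e. P = Theta B Theta^T."""
    n, K = len(theta), len(B)
    return [[sum(theta[i][a] * B[a][b] * theta[j][b] for a in range(K) for b in range(K))
             for j in range(n)] for i in range(n)]


def sample_adjacency(P, rng):
    """Undirected graph without self-loops: A[i][j] = A[j][i] ~ Bernoulli(P[i][j]) for i < j."""
    n = len(P)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = int(rng.random() < P[i][j])
    return A


# Hypothetical example: two communities; node 2 belongs half to each.
theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
B = [[0.9, 0.1], [0.1, 0.6]]  # the community-to-community matrix an estimator would recover
P = edge_probabilities(theta, B)
A = sample_adjacency(P, random.Random(0))
print(P[0][2], P[2][2])
```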

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2306.05131

Conformal Prediction for Federated Uncertainty Quantification Under Label Shift

Authors: Vincent Plassier, Mehdi Makni, Aleksandr Rubashevskii, Eric Moulines, Maxim Panov

Abstract: Federated Learning (FL) is a machine learning framework where many clients collaboratively train models while keeping the training data decentralized. Despite recent advances in FL, uncertainty quantification (UQ) remains only partially addressed. Among UQ methods, conformal prediction (CP) approaches provide distribution-free guarantees under minimal assumptions. We develop a new federated conformal prediction method based on quantile regression that takes privacy constraints into account. This method takes advantage of importance weighting to effectively address the label shift between agents and provides theoretical guarantees for both valid coverage of the prediction sets and differential privacy. Extensive experimental studies demonstrate that this method outperforms current competitors.
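Importance weighting enters split conformal prediction through a weighted quantile of the calibration scores. The sketch below is the standard weighted conformal quantile, in which each calibration score receives weight w_i (under label shift, a ratio of test to calibration label frequencies) and the test point's weight is placed at +∞. It is a centralized illustration only: it omits the federated aggregation and differential privacy of the paper, and because under label shift the test weight depends on the unknown label, a full method would evaluate it for each candidate label. The weights shown are hypothetical.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha)-quantile of sum_i p_i * delta(s_i) + p_test * delta(+inf).

    p_i = w_i / (sum(w) + w_test); with equal weights this reduces to the usual
    ceil((n + 1)(1 - alpha))-th smallest calibration score.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1 - alpha:
            return score
    return math.inf  # the remaining mass sits at +inf: the set is unbounded


scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(weighted_conformal_quantile(scores, [1.0] * 5, 1.0, alpha=0.2))
# Hypothetical label-shift weights that upweight the label of the first calibration point.
print(weighted_conformal_quantile(scores, [3.0, 0.2, 0.2, 0.2, 0.2], 1.0, alpha=0.3))
```

Upweighting calibration points whose labels are more frequent at test time moves the threshold toward their scores.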

Submitted 24 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: ICML 2023

arXiv:2301.05490

Scalable Batch Acquisition for Deep Bayesian Active Learning

Authors: Aleksandr Rubashevskii, Daria Kotova, Maxim Panov

Abstract: In deep active learning, it is especially important to choose multiple examples to label at each step in order to work efficiently, particularly on large datasets. At the same time, existing solutions to this problem in the Bayesian setup, such as BatchBALD, have significant limitations in selecting a large number of examples, associated with the exponential complexity of computing mutual information for joint random variables. We therefore present the Large BatchBALD algorithm, which gives a well-grounded approximation to the BatchBALD method that aims to achieve comparable quality while being more computationally efficient. We provide a complexity analysis of the algorithm, showing a reduction in computation time, especially for large batches. Furthermore, we present an extensive set of experimental results on image and text data, both on toy datasets and larger ones such as CIFAR-100.
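For context, the single-point BALD score is the mutual information between the label and the model parameters, estimated from posterior samples as the entropy of the averaged prediction minus the average entropy. Taking the top-b individual scores ignores redundancy within a batch, which is what BatchBALD's joint mutual information addresses at exponential cost. The sketch shows only single-point BALD and the naive top-b rule; it is not Large BatchBALD, whose approximation the abstract does not spell out. The MC samples are hypothetical (for example, MC-dropout forward passes).

```python
import math


def entropy(p):
    """Shannon entropy of a categorical distribution (natural log)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def bald_score(mc_probs):
    """Mutual information between label and weights, from T posterior samples.

    mc_probs: one predictive distribution per posterior sample, all for the same input.
    """
    T, K = len(mc_probs), len(mc_probs[0])
    mean = [sum(p[k] for p in mc_probs) / T for k in range(K)]
    return entropy(mean) - sum(entropy(p) for p in mc_probs) / T


def top_b_bald(pool, b):
    """Naive batch selection: indices of the b pool points with the largest BALD scores."""
    ranked = sorted(range(len(pool)), key=lambda i: bald_score(pool[i]), reverse=True)
    return ranked[:b]


agree = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]     # samples agree: purely aleatoric
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # samples disagree: epistemic
print(bald_score(agree), bald_score(disagree))
print(top_b_bald([agree, disagree], b=1))  # [1]
```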

Submitted 16 February, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

Comments: Accepted to SIAM International Conference on Data Mining 2023

arXiv:2301.03252

Active Learning for Abstractive Text Summarization

Authors: Akim Tsvigun, Ivan Lysenko, Danila Sedashov, Ivan Lazichny, Eldar Damirov, Vladimir Karlov, Artemy Belousov, Leonid Sanochkin, Maxim Panov, Alexander Panchenko, Mikhail Burtsev, Artem Shelmanov

Abstract: Construction of human-curated annotated datasets for abstractive text summarization (ATS) is very time-consuming and expensive because creating each instance requires a human annotator to read a long document and compose a shorter summary that would preserve the key information relayed by the original document. Active Learning (AL) is a technique developed to reduce the amount of annotation required to achieve a certain level of machine learning model performance. In information extraction and text classification, AL can reduce the annotation effort several-fold. Despite its potential for aiding expensive annotation, to our knowledge, there were no effective AL query strategies for ATS. This stems from the fact that many AL strategies rely on uncertainty estimation, while, as we show in our work, uncertain instances are usually noisy, and selecting them can degrade the model performance compared to passive annotation. We address this problem by proposing the first effective query strategy for AL in ATS based on diversity principles. We show that, given a certain annotation budget, using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores. Additionally, we analyze the effect of self-learning and show that it can further increase the performance of the model.
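The abstract does not detail its diversity-based strategy, so as a generic illustration of diversity-based querying, the sketch below uses k-center greedy selection: each query is the unlabeled document farthest from everything already labeled or selected. The 2-D embeddings and Euclidean distance are assumptions; in practice documents would be embedded with a text encoder, and the paper's actual strategy may differ.

```python
import math


def k_center_greedy(pool, labeled, budget):
    """Pick up to `budget` pool indices, each farthest from all labeled and already picked points."""
    nearest = [min((math.dist(x, l) for l in labeled), default=math.inf) for x in pool]
    selected = []
    for _ in range(min(budget, len(pool))):
        candidates = [j for j in range(len(pool)) if j not in selected]
        i = max(candidates, key=lambda j: nearest[j])
        selected.append(i)
        # Distances shrink as points are picked, so near-duplicates of i are deprioritized.
        nearest = [min(d, math.dist(x, pool[i])) for d, x in zip(nearest, pool)]
    return selected


# Hypothetical 2-D embeddings of unlabeled documents and of one labeled document.
pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
labeled = [(0.0, 0.0)]
print(k_center_greedy(pool, labeled, budget=2))  # [4, 2]
```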

Submitted 9 January, 2023; originally announced January 2023.

Comments: Accepted at EMNLP-2022 Findings

arXiv:2301.00524

Learning Confident Classifiers in the Presence of Label Noise

Authors: Asma Ahmed Hashmi, Aigerim Zhumabayeva, Nikita Kotelevskii, Artem Agafonov, Mohammad Yaqub, Maxim Panov, Martin Takáč

Abstract: The success of Deep Neural Network (DNN) models significantly depends on the quality of the provided annotations. In medical image segmentation, for example, it is common to have multiple expert annotations for each data point to minimize subjective annotation bias. The goal of estimation is then to filter out the label noise and recover the ground-truth masks, which are not explicitly given. This paper proposes a probabilistic model for noisy observations that allows us to build confident classification and segmentation models. To accomplish this, we explicitly model label noise and introduce a new information-based regularization that pushes the network to recover the ground-truth labels. In addition, for the segmentation task we adjust the loss function by prioritizing learning in high-confidence regions where all the annotators agree on labeling. We evaluate the proposed method on a series of classification tasks, such as noisy versions of the MNIST, CIFAR-10, and Fashion-MNIST datasets, as well as CIFAR-10N, a real-world dataset with noisy human annotations. Additionally, for the segmentation task, we consider several medical imaging datasets, such as LIDC and RIGA, that reflect real-world inter-variability among multiple annotators. Our experiments show that our algorithm outperforms state-of-the-art solutions for the considered classification and segmentation problems.

Submitted 9 December, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

arXiv:2209.01880

ScaleFace: Uncertainty-aware Deep Metric Learning

Authors: Roman Kail, Kirill Fedyanin, Nikita Muravev, Alexey Zaytsev, Maxim Panov

Abstract: The performance of modern deep learning-based systems depends dramatically on the quality of the input objects. For example, face recognition quality would be lower for blurry or corrupted inputs. However, it is hard to predict the influence of input quality on the resulting accuracy in more complex scenarios. We propose an approach for deep metric learning that allows direct estimation of the uncertainty at almost no additional computational cost. The developed ScaleFace algorithm uses trainable scale values that modify similarities in the space of embeddings. These input-dependent scale values represent a measure of confidence in the recognition result, thus allowing uncertainty estimation. We provide comprehensive experiments on face recognition tasks that show the superior performance of ScaleFace compared to other uncertainty-aware face recognition approaches. We also extend the results to the task of text-to-image retrieval, showing that the proposed approach beats the competitors by a significant margin.

Submitted 12 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

arXiv:2206.10691

Towards OOD Detection in Graph Classification from Uncertainty Estimation Perspective

Authors: Gleb Bazhenov, Sergei Ivanov, Maxim Panov, Alexey Zaytsev, Evgeny Burnaev

Abstract: The problem of out-of-distribution (OOD) detection for graph classification is far from being solved. The existing models tend to be overconfident about OOD examples or completely ignore the detection task. In this work, we consider this problem from the uncertainty estimation perspective and compare several recently proposed methods. In our experiments, we find that there is no universal approach for OOD detection, and that it is important to consider both graph representations and the predictive categorical distribution.

Submitted 21 June, 2022; originally announced June 2022.

Comments: ICML 2022 PODS Workshop

arXiv:2205.03194

Scalable computation of prediction intervals for neural networks via matrix sketching

Authors: Alexander Fishkov, Maxim Panov

Abstract: Accounting for the uncertainty in the predictions of modern neural networks is a challenging and important task in many domains. Existing algorithms for uncertainty estimation either require modifying the model architecture and training procedure (e.g., Bayesian neural networks) or dramatically increase the computational cost of predictions, as do approaches based on ensembling. This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals. The method is based on the classical delta method in statistics but achieves computational efficiency by using matrix sketching to approximate the Jacobian matrix. The resulting algorithm is competitive with state-of-the-art approaches for constructing prediction intervals on various regression datasets from the UCI repository.
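The classical delta method linearizes the model around the fitted parameters: with $g = \nabla_\theta f(x_0; \hat\theta)$ and $J$ the Jacobian of the training predictions, $\mathrm{Var}[f(x_0;\hat\theta)] \approx \sigma^2 g^\top (J^\top J)^{-1} g$, and adding $\sigma^2$ gives a prediction interval for a new observation. For a neural network $J$ is enormous, which is where the paper's matrix sketching comes in; the sketch below omits sketching entirely and uses a two-parameter line, for which $J^\top J$ is a 2×2 matrix and the linearization is exact. The normal quantile $z = 1.96$ and the data are assumptions.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of the two-parameter model f(x; a, b) = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def delta_method_interval(x0, xs, ys, z=1.96):
    """Approximate prediction interval with variance sigma^2 * (1 + g^T (J^T J)^{-1} g)."""
    a, b = fit_line(xs, ys)
    n = len(xs)
    # J has rows (1, x_i), so J^T J = [[n, sum x], [sum x, sum x^2]]; invert the 2 x 2 matrix.
    s0, s1, s2 = n, sum(xs), sum(x * x for x in xs)
    det = s0 * s2 - s1 * s1
    inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    sigma2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    g = (1.0, x0)  # gradient of f(x0; a, b) with respect to (a, b)
    quad = sum(g[i] * inv[i][j] * g[j] for i in range(2) for j in range(2))
    half_width = z * math.sqrt(sigma2 * (1.0 + quad))
    pred = a + b * x0
    return pred - half_width, pred + half_width


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.1]  # hypothetical observations
print(delta_method_interval(2.0, xs, ys))   # near the data: narrower
print(delta_method_interval(10.0, xs, ys))  # extrapolation: wider
```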

Submitted 6 May, 2022; originally announced May 2022.
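A minimal sketch of the delta-method computation described above, under stated assumptions: the sketch is a dense Gaussian random projection S (the abstract does not say which sketching scheme the paper uses), and a ridge term lam is added for numerical stability. J stands for the Jacobian of the training-set predictions with respect to the p parameters and g for the gradient of the prediction at a test point; the function name and all parameters are hypothetical. Taking the SVD of the small sketch B = SJ gives g^T (B^T B + lam I)^{-1} g without ever forming a p x p matrix.

```python
import numpy as np

def sketched_delta_halfwidth(J, g, sigma2, k=50, lam=1e-3, z=1.96, seed=0):
    """Half-width of an approximate delta-method prediction interval (hypothetical helper).

    J: (n, p) Jacobian of training predictions w.r.t. the p parameters.
    g: (p,) gradient of the prediction at the test point.
    sigma2: estimated noise variance; lam: ridge term; k: number of sketch rows.
    """
    rng = np.random.default_rng(seed)
    n, _ = J.shape
    # Assumed Gaussian sketch with E[S^T S] = I, so B^T B approximates J^T J.
    S = rng.normal(scale=1.0 / np.sqrt(k), size=(k, n))
    B = S @ J  # (k, p)
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    proj = Vt @ g
    # g^T (B^T B + lam I)^{-1} g: inside the row space of B the eigenvalues are
    # s^2 + lam; on its orthogonal complement only the ridge term lam remains.
    quad = np.sum(proj ** 2 / (s ** 2 + lam)) + (g @ g - proj @ proj) / lam
    # Delta-method predictive variance: sigma2 * (1 + g^T (J^T J + lam I)^{-1} g).
    return z * np.sqrt(sigma2 * (1.0 + quad))
```

For a linear model the Jacobian is simply the design matrix, which makes it easy to compare the sketched half-width against the exact one computed from X^T X.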

arXiv:2202.12297

Embedded Ensembles: Infinite Width Limit and Operating Regimes

Authors: Maksim Velikanov, Roman Kail, Ivan Anokhin, Roman Vashurin, Maxim Panov, Alexey Zaytsev, Dmitry Yarotsky

Abstract: A memory-efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); particular examples are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper, we perform a systematic theoretical and empirical analysis of embedded ensembles with different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide-network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes, independent and collective, depending on the architecture and initialization strategy of the ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical prediction with a wide range of experiments with finite networks, and further study empirically various effects such as the transition between the two regimes, the scaling of ensemble performance with the network width and number of models, and the dependence of performance on a number of architecture and hyperparameter choices.

Submitted 24 February, 2022; originally announced February 2022.
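BatchEnsembles, one of the embedded-ensembling examples named above, are commonly formulated with a single shared weight matrix W and a rank-one multiplicative perturbation per member, so member i effectively uses W * outer(r_i, s_i). The numpy sketch below shows only that weight-sharing parameterization for one dense layer; the function name and shapes are hypothetical, and the paper's NTK-based analysis is not reproduced.

```python
import numpy as np

def batch_ensemble_forward(x, W, R, S):
    """Forward pass of a BatchEnsemble-style dense layer (hypothetical helper).

    x: (batch, d_in) inputs; W: (d_in, d_out) weight shared by all members;
    R: (m, d_in) and S: (m, d_out) per-member rank-one factors.
    Returns (m, batch, d_out): one output per ensemble member.
    """
    # Member i uses W * outer(R[i], S[i]) without materialising m weight copies:
    # ((x * r_i) @ W) * s_i gives exactly x @ (W * outer(r_i, s_i)).
    scaled_inputs = x[None, :, :] * R[:, None, :]  # (m, batch, d_in)
    return (scaled_inputs @ W) * S[:, None, :]     # (m, batch, d_out)
```

Only W scales with the layer size; each extra member costs d_in + d_out numbers, which is what makes this kind of ensemble memory-efficient.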

arXiv:2202.03101

Nonparametric Uncertainty Quantification for Single Deterministic Neural Network

Authors: Nikita Kotelevskii, Aleksandr Artemenkov, Kirill Fedyanin, Fedor Noskov, Alexander Fishkov, Artem Shelmanov, Artem Vazhentsev, Aleksandr Petiushko, Maxim Panov

Abstract: This paper proposes a fast and scalable method for uncertainty quantification of machine learning models' predictions. First, we show a principled way to measure the uncertainty of predictions for a classifier based on Nadaraya-Watson's nonparametric estimate of the conditional label distribution. Importantly, the proposed approach allows one to explicitly disentangle aleatoric and epistemic uncertainties. The resulting method works directly in the feature space. However, one can apply it to any neural network by considering an embedding of the data induced by the network. We demonstrate the strong performance of the method in uncertainty estimation tasks on text classification problems and a variety of real-world image datasets, such as MNIST, SVHN, CIFAR-100 and several versions of ImageNet.

Submitted 27 October, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: NeurIPS 2022 paper
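The method above builds on the Nadaraya-Watson estimate of the conditional label distribution p(y | x). A minimal pure-Python sketch of that estimate with a Gaussian kernel follows. The bandwidth and the two signals it exposes (entropy of the estimate, and the total kernel mass near x, which is small far from the training data) are illustrative assumptions, not the paper's aleatoric/epistemic decomposition; the function names are hypothetical.

```python
import math

def nw_label_distribution(x, X_train, y_train, n_classes, bandwidth=1.0):
    """Nadaraya-Watson estimate of p(y | x) with a Gaussian kernel (hypothetical helper).

    Returns (probs, total_weight); total_weight is the summed kernel mass at x.
    """
    weights = [0.0] * n_classes
    for xi, yi in zip(X_train, y_train):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights[yi] += math.exp(-sq_dist / (2 * bandwidth ** 2))
    total = sum(weights)
    if total == 0.0:  # no kernel mass at all: fall back to the uniform distribution
        return [1.0 / n_classes] * n_classes, 0.0
    return [w / total for w in weights], total

def entropy(probs):
    """Shannon entropy of a discrete distribution (illustrative uncertainty proxy)."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A point deep inside one class gets a near one-hot estimate with large kernel mass, while a point far from all training data gets very little kernel mass, however confident the normalised probabilities look.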

arXiv:2108.00089

Tensor-Train Density Estimation

Authors: Georgii S. Novikov, Maxim E. Panov, Ivan V. Oseledets

Abstract: Estimating a probability density function from samples is one of the central problems in statistics and machine learning. Modern neural network-based models can learn high-dimensional distributions but have problems with hyperparameter selection and are often prone to instabilities during training and inference. We propose a new efficient tensor train-based model for density estimation (TTDE). Such a density parametrization allows exact sampling and the calculation of cumulative and marginal density functions and of the partition function. It also has very intuitive hyperparameters. We develop an efficient non-adversarial training procedure for TTDE based on Riemannian optimization. Experimental results demonstrate the competitive performance of the proposed method in density estimation and sampling tasks, while TTDE significantly outperforms competitors in training speed.

Submitted 25 February, 2022; v1 submitted 30 July, 2021; originally announced August 2021.

Comments: Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)

ACM Class: G.3
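Why marginals and the partition function are tractable in a tensor-train model: summing one mode out only touches a single core, so the sums can be carried left to right without building the full tensor. The numpy sketch below assumes a discrete grid and elementwise-nonnegative cores, so that the normalised tensor is itself a probability table; this is an illustrative simplification, not TTDE's actual parametrization or training, and the function names are hypothetical.

```python
import numpy as np

def tt_full_tensor(cores):
    """Contract TT cores of shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([full.ndim - 1], [0]))
    return full[0, ..., 0]  # drop the two boundary rank-1 axes

def tt_partition_function(cores):
    """Sum of all tensor entries, one core at a time (linear in the number of cores)."""
    acc = np.ones(1)
    for core in cores:
        acc = acc @ core.sum(axis=1)  # sum out this mode, keep the rank index
    return acc.item()

def tt_marginal(cores, mode):
    """Unnormalised marginal over one mode, with every other mode summed out."""
    left = np.ones(1)
    for core in cores[:mode]:
        left = left @ core.sum(axis=1)
    right = np.ones(1)
    for core in reversed(cores[mode + 1:]):
        right = core.sum(axis=1) @ right
    return np.einsum("a,anb,b->n", left, cores[mode], right)
```

Dividing the full tensor (or a marginal) by the partition function gives normalised probabilities; only the full-tensor helper costs memory exponential in the number of modes, and it is included just to check the other two.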

arXiv:2107.03684

Assigning Topics to Documents by Successive Projections

Authors: Olga Klopp, Maxim Panov, Suzanne Sigalla, Alexandre Tsybakov

Abstract: Topic models provide a useful tool to organize and understand the structure of large corpora of text documents, in particular, to discover hidden thematic structure. Clustering documents from big unstructured corpora into topics is an important task in various areas, such as image analysis, e-commerce, social networks, and population genetics. A common approach to topic modeling is to associate each topic with a probability distribution on the dictionary of words and to consider each document as a mixture of topics. Since the number of topics is typically substantially smaller than the size of the corpus and of the dictionary, the methods of topic modeling can lead to a dramatic dimension reduction. In this paper, we study the problem of estimating the topic distribution for each document in the given corpus, that is, we focus on the clustering aspect of the problem. We introduce an algorithm that we call Successive Projection Overlapping Clustering (SPOC), inspired by the Successive Projection Algorithm for separable matrix factorization. This algorithm is simple to implement and computationally fast. We establish theoretical guarantees on the performance of the SPOC algorithm, in particular, nearly matching minimax upper and lower bounds on its estimation risk. We also propose a new method that estimates the number of topics. We complement our theoretical results with a numerical study on synthetic and semi-synthetic data to analyze the performance of this new algorithm in practice.
One of the conclusions is that the error of the algorithm grows at most logarithmically with the size of the dictionary, in contrast to what one observes for Latent Dirichlet Allocation.

Submitted 8 July, 2021; originally announced July 2021.
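SPOC is described above as inspired by the Successive Projection Algorithm (SPA) for separable matrix factorization. A minimal numpy sketch of plain SPA follows (not of SPOC itself, whose additional steps are not given in the abstract): when every row of M is a convex combination of k linearly independent "vertex" rows that also appear in M, repeatedly taking the row with the largest norm and projecting it out recovers those vertex rows. The function name is hypothetical.

```python
import numpy as np

def successive_projection(M, k):
    """Successive Projection Algorithm: indices of k rows of M that are
    estimated to be the vertices of the simplex containing all rows."""
    R = np.array(M, dtype=float)
    chosen = []
    for _ in range(k):
        # The squared norm is strictly convex, so over a simplex it peaks at a vertex.
        j = int(np.argmax(np.sum(R ** 2, axis=1)))
        chosen.append(j)
        u = R[j] / np.linalg.norm(R[j])
        # Project every row onto the orthogonal complement of the chosen vertex.
        R = R - np.outer(R @ u, u)
    return chosen
```

In the example used for checking, 20 mixed rows are followed by the 3 pure vertex rows, so SPA should return indices 20, 21 and 22 in some order.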

arXiv:2106.15921

Monte Carlo Variational Auto-Encoders

Authors: Achille Thin, Nikita Kotelevskii, Arnaud Doucet, Alain Durmus, Eric Moulines, Maxim Panov

Abstract: Variational auto-encoders (VAE) are popular deep latent variable models which are trained by maximizing an Evidence Lower Bound (ELBO). To obtain a tighter ELBO and hence better variational approximations, it has been proposed to use importance sampling to get a lower-variance estimate of the evidence. However, importance sampling is known to perform poorly in high dimensions. While it has been suggested many times in the literature to use more sophisticated algorithms such as Annealed Importance Sampling (AIS) and its Sequential Importance Sampling (SIS) extensions, the potential benefits brought by these advanced techniques have never been realized for VAE: the AIS estimate cannot be easily differentiated, while SIS requires the specification of carefully chosen backward Markov kernels. In this paper, we address both issues and demonstrate the performance of the resulting Monte Carlo VAEs on a variety of applications.

Submitted 30 June, 2021; originally announced June 2021.

arXiv:2012.15550

Nonreversible MCMC from conditional invertible transforms: a complete recipe with convergence guarantees

Authors: Achille Thin, Nikita Kotelevskii, Christophe Andrieu, Alain Durmus, Eric Moulines, Maxim Panov

Abstract: Markov Chain Monte Carlo (MCMC) is a class of algorithms for sampling from complex and high-dimensional probability distributions. The Metropolis-Hastings (MH) algorithm, the workhorse of MCMC, provides a simple recipe to construct reversible Markov kernels. Reversibility is a tractable property that implies a less tractable but essential property here, invariance. Reversibility is, however, not necessarily desirable when considering performance. This has prompted recent interest in designing kernels that break this property. At the same time, an active stream of research has focused on the design of novel versions of the MH kernel, some nonreversible, relying on the use of complex invertible deterministic transforms. While standard implementations of the MH kernel are well understood, the aforementioned developments have not received the same systematic treatment to ensure their validity. This paper fills the gap by developing general tools to ensure that a class of nonreversible Markov kernels, possibly relying on complex transforms, has the desired invariance property and leads to convergent algorithms. This leads to a set of simple and practically verifiable conditions.

Submitted 29 March, 2021; v1 submitted 31 December, 2020; originally announced December 2020.
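For reference, the standard reversible recipe the abstract starts from: random-walk Metropolis-Hastings with a symmetric Gaussian proposal, so a move from x to y is accepted with probability min(1, pi(y) / pi(x)). This sketch covers only that reversible baseline; the nonreversible kernels built from invertible transforms studied in the paper are not reproduced, and the function name and step size are illustrative.

```python
import math
import random

def random_walk_metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D unnormalised log density.

    The Gaussian proposal is symmetric, so the proposal densities cancel and a
    move x -> y is accepted with probability min(1, pi(y) / pi(x)). This kernel
    is reversible with respect to pi and therefore leaves pi invariant.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)
        log_py = log_target(y)
        # exp(min(0, .)) equals min(1, pi(y) / pi(x)) and cannot overflow.
        if rng.random() < math.exp(min(0.0, log_py - log_px)):
            x, log_px = y, log_py
        samples.append(x)
    return samples
```

Only differences of log densities are used, so the normalising constant of pi is never needed; targeting a standard normal, the chain's sample mean and variance should approach 0 and 1.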

arXiv:2009.14588

EWS-GCN: Edge Weight-Shared Graph Convolutional Network for Transactional Banking Data

Authors: Ivan Sukharev, Valentina Shumovskaia, Kirill Fedyanin, Maxim Panov, Dmitry Berestnev

Abstract: In this paper, we discuss how modern deep learning approaches can be applied to the credit scoring of bank clients. We show that information about connections between clients based on money transfers between them allows us to significantly improve the quality of credit scoring compared to approaches that use only information about the target client. As a final solution, we develop a new graph neural network model, EWS-GCN, that combines ideas of graph convolutional and recurrent neural networks via an attention mechanism. The resulting model allows for robust training and efficient processing of large-scale data. We also demonstrate that our model outperforms state-of-the-art graph neural networks, achieving excellent results.

Submitted 30 September, 2020; originally announced September 2020.

arXiv:2003.03274

Dropout Strikes Back: Improved Uncertainty Estimation via Diversity Sampling

Authors: Kirill Fedyanin, Evgenii Tsymbalov, Maxim Panov

Abstract: Uncertainty estimation for machine learning models is of high importance in many scenarios, such as constructing confidence intervals for model predictions and detecting out-of-distribution or adversarially generated points. In this work, we show that modifying the sampling distributions for dropout layers in neural networks improves the quality of uncertainty estimation. Our approach consists of two steps: computing data-driven correlations between neurons and generating samples that include maximally diverse neurons. In a series of experiments on simulated and real-world data, we demonstrate that diversification via sampling based on determinantal point processes achieves state-of-the-art results in uncertainty estimation for regression and classification tasks. An important feature of our approach is that it does not require any modification to the models or training procedures, allowing straightforward application to any deep learning model with dropout layers.

Submitted 4 May, 2022; v1 submitted 6 March, 2020; originally announced March 2020.
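To illustrate why a determinantal point process favours diverse neurons: for a kernel L built from neuron correlations, near-duplicate neurons make det(L[S, S]) small. The paper samples from a DPP; the numpy sketch below swaps in greedy MAP selection (deterministic, not a sample) as a simpler stand-in, and the function name and the use of a plain correlation matrix as L are assumptions for illustration.

```python
import numpy as np

def greedy_dpp_subset(L, k):
    """Greedily choose k indices that approximately maximise det(L[S, S]).

    L: (n, n) positive semidefinite kernel over neurons, e.g. a correlation
    matrix of their activations (assumed here). Highly correlated neurons
    shrink the determinant, so the greedy choice favours diverse neurons.
    """
    chosen = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(L.shape[0]):
            if i in chosen:
                continue
            idx = chosen + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:  # every remaining item would make the submatrix singular
            break
        chosen.append(best)
    return chosen
```

A dropout mask then keeps the chosen neurons and zeroes the rest; with neurons 0 and 1 almost perfectly correlated, the greedy rule keeps only one of them.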

arXiv:2002.12253

MetFlow: A New Efficient Method for Bridging the Gap between Markov Chain Monte Carlo and Variational Inference

Authors: Achille Thin, Nikita Kotelevskii, Jean-Stanislas Denain, Leo Grinsztajn, Alain Durmus, Maxim Panov, Eric Moulines

Abstract: In this contribution, we propose a new computationally efficient method to combine Variational Inference (VI) with Markov Chain Monte Carlo (MCMC). This approach can be used with generic MCMC kernels, but is especially well suited to MetFlow, a novel family of MCMC algorithms we introduce, in which proposals are obtained using Normalizing Flows. The marginal distribution produced by such MCMC algorithms is a mixture of flow-based distributions, thus drastically increasing the expressivity of the variational family. Unlike previous methods following this direction, our approach is amenable to the reparametrization trick and does not rely on computationally expensive reverse kernels. Extensive numerical experiments show clear computational and performance improvements over state-of-the-art methods.

Submitted 27 February, 2020; originally announced February 2020.

arXiv:2001.11411

NCVis: Noise Contrastive Approach for Scalable Visualization

Authors: Aleksandr Artemenkov, Maxim Panov

Abstract: Modern methods for data visualization via dimensionality reduction, such as t-SNE, usually have performance issues that prohibit their application to large amounts of high-dimensional data. In this work, we propose NCVis, a high-performance dimensionality reduction method built on the sound statistical basis of noise contrastive estimation. We show that NCVis outperforms state-of-the-art techniques in terms of speed while preserving the representation quality of other methods. In particular, the proposed approach successfully processes a large dataset of more than 1 million news headlines in several minutes and presents the underlying structure in a human-readable way. Moreover, it provides results consistent with classical methods like t-SNE on simpler datasets such as images of handwritten digits. We believe that broader usage of such software can significantly simplify large-scale data analysis and lower the entry barrier to this area.

Submitted 30 January, 2020; originally announced January 2020.

arXiv:2001.08427

Linking Bank Clients using Graph Neural Networks Powered by Rich Transactional Data

Authors: Valentina Shumovskaia, Kirill Fedyanin, Ivan Sukharev, Dmitry Berestnev, Maxim Panov

Abstract: Financial institutions obtain enormous amounts of data about user transactions and money transfers, which can be viewed as a large graph that changes dynamically over time. In this work, we focus on the task of predicting new interactions in the network of bank clients and treat it as a link prediction problem. We propose a new graph neural network model, which uses not only the topological structure of the network but also the rich time-series data available for the graph nodes and edges. We evaluate the developed method using data provided by a large European bank over several years. The proposed model outperforms existing approaches, including other neural network models, by a significant margin in ROC AUC on the link prediction problem, and also improves the quality of credit scoring.

Submitted 23 January, 2020; originally announced January 2020.

arXiv:1910.06028

Accuracy of Gaussian approximation in nonparametric Bernstein -- von Mises Theorem

Authors: Vladimir Spokoiny, Maxim Panov

Abstract: The prominent Bernstein -- von Mises (BvM) result claims that the posterior distribution after centering by the efficient estimator and standardizing by the square root of the total Fisher information is nearly standard normal. In particular, the prior completely washes out from the asymptotic posterior distribution. This fact is fundamental and justifies the Bayes approach from the frequentist viewpoint. In the nonparametric setup the situation changes dramatically and the impact of the prior becomes essential even for the contraction of the posterior; see [vdV2008], [Bo2011], [CaNi2013,CaNi2014] for different models like Gaussian regression or the i.i.d. model in different weak topologies. This paper offers another non-asymptotic approach to studying the behavior of the posterior for a special but rather popular and useful class of statistical models and for Gaussian priors. First, we derive tight finite-sample bounds on posterior contraction in terms of the so-called effective dimension of the parameter space. Our main results describe the accuracy of Gaussian approximation of the posterior. In particular, we show that restricting to the class of all centrally symmetric credible sets around the pMLE yields a Gaussian approximation up to order $n^{-1}$. We also show that the posterior distribution mimics well the distribution of the penalized maximum likelihood estimator (pMLE) and reduce the question of reliability of credible sets to consistency of the pMLE-based confidence sets.
The obtained results are specified for nonparametric log-density estimation and generalized regression.

Submitted 1 June, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

MSC Class: 62F15; 62F25

arXiv:1904.06151

Geometry-Aware Maximum Likelihood Estimation of Intrinsic Dimension

Authors: Marina Gomtsyan, Nikita Mokrov, Maxim Panov, Yury Yanovich

Abstract: Existing approaches to intrinsic dimension estimation are usually not reliable when the data are nonlinearly embedded in a high-dimensional space. In this work, we show that explicitly accounting for the geometric properties of the unknown support leads to a polynomial correction to the standard maximum likelihood estimate of intrinsic dimension for flat manifolds. The proposed algorithm (GeoMLE) realizes the correction by regression of standard MLEs based on distances to nearest neighbors for different sizes of neighborhoods. Moreover, the proposed approach also efficiently handles the case of nonuniform sampling of the manifold. We perform numerous experiments on different synthetic and real-world datasets. The results show that our algorithm achieves state-of-the-art performance, while also being computationally efficient and robust to noise in the data.

Submitted 12 April, 2019; originally announced April 2019.
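GeoMLE corrects a standard maximum likelihood estimate of intrinsic dimension computed from nearest-neighbor distances. A pure-Python sketch of such a base estimator follows, assuming it is the Levina-Bickel MLE averaged over all points (an assumption about which standard MLE is meant). GeoMLE's regression over neighborhood sizes and its polynomial correction are not reproduced; the function name is hypothetical and the neighbor search is brute force, O(n^2).

```python
import math
import random

def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel maximum likelihood estimate of intrinsic dimension.

    For each point, with T_1 <= ... <= T_k its distances to the k nearest
    neighbors, the local estimate is
        m_k(x) = [ (1 / (k - 1)) * sum_{j < k} log(T_k / T_j) ]^{-1},
    and the local estimates are averaged over all points.
    """
    estimates = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        t_k = dists[-1]
        s = sum(math.log(t_k / t_j) for t_j in dists[:-1])
        estimates.append((k - 1) / s)
    return sum(estimates) / len(estimates)
```

For points spread uniformly over a flat 2-D square linearly embedded in five dimensions, the estimate should be close to 2; small k makes it noisy and somewhat biased, which is one reason to look at several neighborhood sizes.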

arXiv:1902.10350

doi 10.24963/ijcai.2019/499

Deeper Connections between Neural Networks and Gaussian Processes Speed-up Active Learning

Authors: Evgenii Tsymbalov, Sergei Makarychev, Alexander Shapeev, Maxim Panov

Abstract: Active learning methods for neural networks are usually based on greedy criteria, which ultimately give a single new design point for the evaluation. Such an approach requires either some heuristics to sample a batch of design points at one active learning iteration, or retraining the neural network after adding each data point, which is computationally inefficient. Moreover, uncertainty estimates for neural networks are sometimes overconfident for points lying far from the training sample. In this work, we propose to approximate Bayesian neural networks (BNN) by Gaussian processes, which allows us to update the uncertainty estimates of predictions efficiently without retraining the neural network, while avoiding overconfident uncertainty predictions for out-of-sample points. In a series of experiments on real-world data, including large-scale problems of chemical and physical modeling, we show the superiority of the proposed approach over state-of-the-art methods.

Submitted 27 February, 2019; originally announced February 2019.

Journal ref: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), 2019
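One hypothetical reading of the batch-selection idea above, as a sketch: treat the empirical covariance of stochastic network predictions over the candidate pool (for example, MC dropout passes) as a GP prior covariance, then build a batch greedily by taking the point with the largest posterior variance and conditioning on it with a rank-one update, so no retraining happens inside the loop. The covariance source, the noise-free conditioning and the function name are assumptions, not details confirmed by the abstract.

```python
import numpy as np

def greedy_batch_by_variance(pred_samples, batch_size, jitter=1e-9):
    """Greedy batch selection with GP-style variance updates (hypothetical helper).

    pred_samples: (T, N) array of T stochastic forward passes over N candidates.
    """
    K = np.cov(pred_samples, rowvar=False)  # (N, N) empirical covariance as GP prior
    chosen = []
    for _ in range(batch_size):
        i = int(np.argmax(np.diag(K)))  # most uncertain candidate under current posterior
        chosen.append(i)
        k_i = K[:, i].copy()
        # Noise-free conditioning on point i: Schur-complement (rank-one) update.
        K = K - np.outer(k_i, k_i) / (k_i[i] + jitter)
    return chosen
```

After a point is chosen its posterior variance drops to (almost) zero and the variance of correlated candidates shrinks too, which discourages picking redundant points within one batch.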

arXiv:1810.03032

doi 10.1109/ICDMW.2018.00152

Constructing Graph Node Embeddings via Discrimination of Similarity Distributions

Authors: Stanislav Tsepa, Maxim Panov

Abstract: The problem of unsupervised learning of node embeddings in graphs is one of the important directions in modern network science. In this work, we propose a novel framework that aims to find embeddings by discriminating distributions of similarities (DDoS) between nodes in the graph. The general idea is implemented by maximizing the earth mover distance between distributions of decoded similarities of similar and dissimilar nodes. The resulting algorithm generates embeddings that achieve state-of-the-art performance in the problem of link prediction in real-world graphs.

Submitted 6 October, 2018; originally announced October 2018.

Journal ref: In 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 1050-1053
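DDoS maximizes the earth mover distance between the distribution of decoded similarities for similar node pairs and that for dissimilar pairs. In one dimension, for two samples of equal size, this distance reduces to the mean absolute difference between their sorted values. The sketch below only evaluates that quantity for two batches of similarity scores; it is not the paper's training objective or its differentiable implementation, and the function name is hypothetical.

```python
def emd_1d(a, b):
    """Earth mover (Wasserstein-1) distance between two equal-size 1-D samples:
    the mean absolute difference of their sorted values."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

For example, shifting every score by 1.0 gives a distance of exactly 1.0, while two batches containing the same values in different orders are at distance 0.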

arXiv:1806.09856

doi 10.1007/978-3-030-11027-7_24

Dropout-based Active Learning for Regression

Authors: Evgenii Tsymbalov, Maxim Panov, Alexander Shapeev

Abstract: Active learning is relevant and challenging for high-dimensional regression models when the annotation of samples is expensive. Yet most existing sampling methods cannot be applied to large-scale problems because they consume too much time for data processing. In this paper, we propose a fast active learning algorithm for regression, tailored for neural network models. It is based on uncertainty estimation from the stochastic dropout output of the network. Experiments on both synthetic and real-world datasets show comparable or better performance (depending on the accuracy metric) compared with the baselines. This approach can be generalized to other deep learning architectures. It can be used to systematically improve a machine-learning model, as it offers a computationally efficient way of sampling additional data.

Submitted 5 July, 2018; v1 submitted 26 June, 2018; originally announced June 2018.

Comments: Report on AIST 2018; will be published in Springer LNCS series (Analysis of Images, Social Networks and Texts - 7th International Conference, AIST 2018)

Journal ref: Analysis of Images, Social Networks and Texts - 7th International Conference, AIST 2018, Lecture Notes in Computer Science book series (LNCS), volume 11179, pp. 247-258
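A minimal numpy sketch of uncertainty from stochastic dropout output: dropout is kept active at prediction time, each pool point gets T stochastic predictions, and the points with the largest predictive variance are selected for annotation. The tiny one-hidden-layer network, the dropout rate and the largest-variance acquisition rule are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np

def mc_dropout_predictions(X, W1, b1, W2, b2, p=0.5, T=100, seed=0):
    """T stochastic forward passes of a one-hidden-layer ReLU regression network
    with (inverted) dropout on the hidden layer kept active. Returns (T, N)."""
    rng = np.random.default_rng(seed)
    H = np.maximum(X @ W1 + b1, 0.0)  # (N, hidden) activations before dropout
    preds = []
    for _ in range(T):
        mask = rng.random(H.shape) > p  # drop each hidden unit with probability p
        preds.append(((H * mask) / (1.0 - p)) @ W2 + b2)
    return np.stack(preds)

def select_most_uncertain(pred_samples, k):
    """Indices of the k pool points with the largest predictive variance."""
    var = pred_samples.var(axis=0)
    return np.argsort(var)[::-1][:k]
```

The network is never retrained inside this loop: only forward passes are needed, which keeps the selection step cheap compared with model training.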

arXiv:1804.10653

Sparse Group Inductive Matrix Completion

Authors: Ivan Nazarov, Boris Shirokikh, Maria Burkina, Gennady Fedonin, Maxim Panov

Abstract: We consider the problem of matrix completion with side information (inductive matrix completion). In real-world applications, many side-channel features are typically non-informative, making feature selection an important part of the problem. We incorporate feature selection into inductive matrix completion by proposing a matrix factorization framework with group-lasso regularization on side feature parameter matrices. We demonstrate that the theoretical sample complexity for the proposed method is much lower than that of its competitors in sparse problems, and propose an efficient optimization algorithm for the resulting low-rank matrix completion problem with sparsifying regularizers. Experiments on synthetic and real-world datasets show that the proposed approach outperforms other methods.

Submitted 6 October, 2018; v1 submitted 27 April, 2018; originally announced April 2018.
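With group-lasso regularization, a whole group of parameters is driven to zero at once, which is what performs feature selection. Below, the group is assumed to be the row of a side-feature parameter matrix belonging to one feature. The proximal operator shown is the standard block soft-thresholding step a proximal-gradient solver would apply; whether the paper's optimization algorithm uses this exact step is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np

def group_lasso_prox(W, lam):
    """Proximal operator of lam * sum_j ||W[j, :]||_2, treating rows as groups.

    Each row is shrunk toward zero by lam in Euclidean norm, and set exactly to
    zero when its norm is at most lam, which removes that side feature.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale
```

For example, with lam = 1 a row (3, 4) of norm 5 is scaled by 0.8 to (2.4, 3.2), while a row (0.3, 0.4) of norm 0.5 is zeroed out entirely.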

arXiv:1710.05213

doi 10.1007/978-3-319-72150-7_102

Simultaneous Matrix Diagonalization for Structural Brain Networks Classification

Authors: Nikita Mokrov, Maxim Panov, Boris A. Gutman, Joshua I. Faskowitz, Neda Jahanshad, Paul M. Thompson

Abstract: This paper considers the problem of brain disease classification based on connectome data. A connectome is a network representation of a human brain. The typical connectome classification problem is very challenging because of the small sample size and high dimensionality of the data. We propose to use simultaneous approximate diagonalization of adjacency matrices in order to compute their eigenstructures in a more stable way. The obtained approximate eigenvalues are further used as features for classification. The proposed approach is demonstrated to be efficient for detection of Alzheimer's disease, outperforming simple baselines and competing with state-of-the-art approaches to brain disease classification.

Submitted 14 October, 2017; originally announced October 2017.

Journal ref: Complex Networks & Their Applications VI. COMPLEX NETWORKS 2017. Studies in Computational Intelligence, vol 689

arXiv:1707.01350

doi 10.1007/978-3-319-72150-7_5

Consistent Estimation of Mixed Memberships with Successive Projections

Authors: Maxim Panov, Konstantin Slavnov, Roman Ushakov

Abstract: This paper considers the parameter estimation problem in the Mixed Membership Stochastic Block Model (MMSB), a fairly general random graph model allowing for overlapping community structure. We present a new algorithm, successive projection overlapping clustering (SPOC), which combines the ideas of spectral clustering and the geometric approach for separable non-negative matrix factorization. The proposed algorithm is provably consistent under MMSB with general conditions on the parameters of the model. SPOC is also shown to perform well experimentally in comparison to other algorithms.

Submitted 14 October, 2017; v1 submitted 5 July, 2017; originally announced July 2017.

Journal ref: Complex Networks & Their Applications VI. COMPLEX NETWORKS 2017. Studies in Computational Intelligence, vol 689
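The combination of spectral clustering with a separable-NMF geometric step can be sketched as follows. The sketch assumes the expected adjacency matrix has the MMSB form P = Θ B Θ^T with K communities and a full-rank B. The top-K eigenvectors then place every node inside a simplex whose vertices are the "pure" nodes. The successive projection algorithm (SPA) locates those vertices, and memberships are recovered by solving a linear system. This is a simplified illustration, not the authors' SPOC implementation. The function names are hypothetical, and the final clip-and-renormalize step is an ad hoc way of mapping noisy estimates back onto the probability simplex.

```python
import numpy as np

def successive_projections(rows: np.ndarray, k: int) -> list[int]:
    """Return indices of k rows that approximately sit at the vertices of the simplex."""
    residual = rows.copy()
    chosen = []
    for _ in range(k):
        # Pick the row with the largest Euclidean norm in the current residual.
        idx = int(np.argmax(np.linalg.norm(residual, axis=1)))
        chosen.append(idx)
        # Project all rows onto the orthogonal complement of the chosen direction.
        u = residual[idx] / np.linalg.norm(residual[idx])
        residual = residual - np.outer(residual @ u, u)
    return chosen

def spa_memberships(adjacency: np.ndarray, k: int) -> np.ndarray:
    """Estimate an (n_nodes, k) membership matrix from a symmetric adjacency matrix."""
    eigvals, eigvecs = np.linalg.eigh(adjacency)
    # Spectral embedding: keep the k eigenpairs of largest magnitude.
    top = np.argsort(np.abs(eigvals))[-k:]
    embedding = eigvecs[:, top]                   # shape (n_nodes, k)
    pure = successive_projections(embedding, k)   # indices of estimated pure nodes
    vertices = embedding[pure]                    # shape (k, k)
    # Each row satisfies embedding[i] ~= theta[i] @ vertices; solve for theta.
    theta = np.linalg.solve(vertices.T, embedding.T).T
    # Ad hoc projection onto the simplex: clip negatives, then renormalize rows.
    theta = np.clip(theta, 0.0, None)
    return theta / theta.sum(axis=1, keepdims=True)
```

On a noiseless P with one pure node per community, the estimate matches Θ up to a permutation of the community labels. With an observed binary graph, the same steps apply, but the output is only approximate.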

arXiv:1609.01088 [pdf, other]

GTApprox: surrogate modeling for industrial design

Authors: Mikhail Belyaev, Evgeny Burnaev, Ermek Kapushev, Maxim Panov, Pavel Prikhodko, Dmitry Vetrov, Dmitry Yarotsky

Abstract: We describe GTApprox, a new tool for medium-scale surrogate modeling in industrial design. Compared to existing software, GTApprox brings several innovations: a few novel approximation algorithms, several advanced methods of automated model selection, and novel options in the form of hints. We demonstrate the efficiency of GTApprox on a large collection of test problems. In addition, we describe several applications of GTApprox to real engineering problems.

Submitted 5 September, 2016; originally announced September 2016.

Comments: 31 pages, 11 figures

arXiv:1310.7796 [pdf, other]

doi 10.1214/14-BA926

Finite Sample Bernstein -- von Mises Theorem for Semiparametric Problems

Authors: Maxim Panov, Vladimir Spokoiny

Abstract: The classical parametric and semiparametric Bernstein -- von Mises (BvM) results are reconsidered in a non-classical setup allowing finite samples and model misspecification. In the case of a finite-dimensional nuisance parameter, we obtain an upper bound on the error of the Gaussian approximation of the posterior distribution for the target parameter. This bound is explicit in the dimensions of the nuisance and target parameters. This helps to identify the so-called critical dimension $ p $ of the full parameter for which the BvM result is applicable. In the important i.i.d. case, we show that the condition "$ p^{3} / n $ is small" is sufficient for the BvM result to be valid under general assumptions on the model. We also provide an example of a model with a phase transition effect: the statement of the BvM theorem fails when the dimension $ p $ approaches $ n^{1/3} $. The results are extended to the case of infinite-dimensional parameters with the nuisance parameter from a Sobolev class. In particular, we show near-normality of the posterior if the smoothness parameter $s$ exceeds 3/2.

Submitted 15 June, 2014; v1 submitted 29 October, 2013; originally announced October 2013.

Journal ref: Bayesian Analysis, 10(3), 665-710, 2015
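For intuition about the Gaussian approximation that a BvM result quantifies, the sketch below treats the simplest classical case rather than the paper's setting. It uses a one-dimensional Bernoulli model with a uniform prior, where the posterior is Beta(1 + s, 1 + n - s). It compares this posterior with a Gaussian centred at the maximum-likelihood estimate s/n with variance p̂(1 - p̂)/n, which is the inverse Fisher information. It then computes their total variation distance by a Riemann sum on a grid. The function name and grid size are illustrative choices. The example does not reproduce the paper's finite-sample bounds or its p³/n condition. It only shows the approximation error shrinking as n grows in fixed dimension.

```python
import math
import numpy as np

def bvm_tv_distance(successes: int, n: int, grid_size: int = 200_001) -> float:
    """Approximate TV distance between a Beta posterior and its Gaussian approximation.

    Model (illustrative assumption): Bernoulli(theta) data with a uniform prior,
    so the posterior is Beta(1 + successes, 1 + n - successes). Needs 0 < successes < n.
    """
    a, b = 1 + successes, 1 + n - successes
    theta = np.linspace(0.0, 1.0, grid_size)[1:-1]   # open interval avoids log(0)
    step = theta[1] - theta[0]
    # Beta log-density via lgamma, for numerical stability at large n.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    beta_pdf = np.exp(log_norm + (a - 1) * np.log(theta) + (b - 1) * np.log1p(-theta))
    # Gaussian centred at the MLE (the posterior mode here) with inverse-Fisher variance.
    mle = successes / n
    var = mle * (1 - mle) / n
    gauss_pdf = np.exp(-((theta - mle) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    # TV distance = half the L1 distance between the densities (Riemann sum on [0, 1];
    # Gaussian mass outside [0, 1] is ignored, which is negligible for moderate n).
    return 0.5 * float(np.sum(np.abs(beta_pdf - gauss_pdf)) * step)
```

Calling it with the same success fraction at n = 100 and n = 10000 should give a markedly smaller distance for the larger sample.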
