-
FI-CBL: A Probabilistic Method for Concept-Based Learning with Expert Rules
Authors:
Lev V. Utkin,
Andrei V. Konstantinov,
Stanislav R. Kirpichenko
Abstract:
A method for solving concept-based learning (CBL) problem is proposed. The main idea behind the method is to divide each concept-annotated image into patches, to transform the patches into embeddings by using an autoencoder, and to cluster the embeddings assuming that each cluster will mainly contain embeddings of patches with certain concepts. To find concepts of a new image, the method implement…
▽ More
A method for solving concept-based learning (CBL) problem is proposed. The main idea behind the method is to divide each concept-annotated image into patches, to transform the patches into embeddings by using an autoencoder, and to cluster the embeddings assuming that each cluster will mainly contain embeddings of patches with certain concepts. To find concepts of a new image, the method implements the frequentist inference by computing prior and posterior probabilities of concepts based on rates of patches from images with certain values of the concepts. Therefore, the proposed method is called the Frequentist Inference CBL (FI-CBL). FI-CBL allows us to incorporate the expert rules in the form of logic functions into the inference procedure. An idea behind the incorporation is to update prior and conditional probabilities of concepts to satisfy the rules. The method is transparent because it has an explicit sequence of probabilistic calculations and a clear frequency interpretation. Numerical experiments show that FI-CBL outperforms the concept bottleneck model in cases when the number of training data is small. The code of proposed algorithms is publicly available.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Incorporating Expert Rules into Neural Networks in the Framework of Concept-Based Learning
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
A problem of incorporating the expert rules into machine learning models for extending the concept-based learning is formulated in the paper. It is proposed how to combine logical rules and neural networks predicting the concept probabilities. The first idea behind the combination is to form constraints for a joint probability distribution over all combinations of concept values to satisfy the exp…
▽ More
A problem of incorporating the expert rules into machine learning models for extending the concept-based learning is formulated in the paper. It is proposed how to combine logical rules and neural networks predicting the concept probabilities. The first idea behind the combination is to form constraints for a joint probability distribution over all combinations of concept values to satisfy the expert rules. The second idea is to represent a feasible set of probability distributions in the form of a convex polytope and to use its vertices or faces. We provide several approaches for solving the stated problem and for training neural networks which guarantee that the output probabilities of concepts would not violate the expert rules. The solution of the problem can be viewed as a way for combining the inductive and deductive learning. Expert rules are used in a broader sense when any logical function that connects concepts and class labels or just concepts with each other can be regarded as a rule. This feature significantly expands the class of the proposed results. Numerical examples illustrate the approaches. The code of proposed algorithms is publicly available.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
Generating Survival Interpretable Trajectories and Data
Authors:
Andrei V. Konstantinov,
Stanislav R. Kirpichenko,
Lev V. Utkin
Abstract:
A new model for generating survival trajectories and data based on applying an autoencoder of a specific structure is proposed. It solves three tasks. First, it provides predictions in the form of the expected event time and the survival function for a new generated feature vector on the basis of the Beran estimator. Second, the model generates additional data based on a given training set that wo…
▽ More
A new model for generating survival trajectories and data based on applying an autoencoder of a specific structure is proposed. It solves three tasks. First, it provides predictions in the form of the expected event time and the survival function for a new generated feature vector on the basis of the Beran estimator. Second, the model generates additional data based on a given training set that would supplement the original dataset. Third, the most important, it generates a prototype time-dependent trajectory for an object, which characterizes how features of the object could be changed to achieve a different time to an event. The trajectory can be viewed as a type of the counterfactual explanation. The proposed model is robust during training and inference due to a specific weighting scheme incorporating into the variational autoencoder. The model also determines the censored indicators of new generated data by solving a classification task. The paper demonstrates the efficiency and properties of the proposed model using numerical experiments on synthetic and real datasets. The code of the algorithm implementing the proposed model is publicly available.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
Dual feature-based and example-based explanation methods
Authors:
Andrei V. Konstantinov,
Boris V. Kozlov,
Stanislav R. Kirpichenko,
Lev V. Utkin
Abstract:
A new approach to the local and global explanation is proposed. It is based on selecting a convex hull constructed for the finite number of points around an explained instance. The convex hull allows us to consider a dual representation of instances in the form of convex combinations of extreme points of a produced polytope. Instead of perturbing new instances in the Euclidean feature space, vecto…
▽ More
A new approach to the local and global explanation is proposed. It is based on selecting a convex hull constructed for the finite number of points around an explained instance. The convex hull allows us to consider a dual representation of instances in the form of convex combinations of extreme points of a produced polytope. Instead of perturbing new instances in the Euclidean feature space, vectors of convex combination coefficients are uniformly generated from the unit simplex, and they form a new dual dataset. A dual linear surrogate model is trained on the dual dataset. The explanation feature importance values are computed by means of simple matrix calculations. The approach can be regarded as a modification of the well-known model LIME. The dual representation inherently allows us to get the example-based explanation. The neural additive model is also considered as a tool for implementing the example-based explanation approach. Many numerical experiments with real datasets are performed for studying the approach. The code of proposed algorithms is available.
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
SurvBeNIM: The Beran-Based Neural Importance Model for Explaining the Survival Models
Authors:
Lev V. Utkin,
Danila Y. Eremenko,
Andrei V. Konstantinov
Abstract:
A new method called the Survival Beran-based Neural Importance Model (SurvBeNIM) is proposed. It aims to explain predictions of machine learning survival models, which are in the form of survival or cumulative hazard functions. The main idea behind SurvBeNIM is to extend the Beran estimator by incorporating the importance functions into its kernels and by implementing these importance functions as…
▽ More
A new method called the Survival Beran-based Neural Importance Model (SurvBeNIM) is proposed. It aims to explain predictions of machine learning survival models, which are in the form of survival or cumulative hazard functions. The main idea behind SurvBeNIM is to extend the Beran estimator by incorporating the importance functions into its kernels and by implementing these importance functions as a set of neural networks which are jointly trained in an end-to-end manner. Two strategies of using and training the whole neural network implementing SurvBeNIM are proposed. The first one explains a single instance, and the neural network is trained for each explained instance. According to the second strategy, the neural network only learns once on all instances from the dataset and on all generated instances. Then the neural network is used to explain any instance in a dataset domain. Various numerical experiments compare the method with different existing explanation methods. A code implementing the proposed method is publicly available.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
SurvBeX: An explanation method of the machine learning survival models based on the Beran estimator
Authors:
Lev V. Utkin,
Danila Y. Eremenko,
Andrei V. Konstantinov
Abstract:
An explanation method called SurvBeX is proposed to interpret predictions of the machine learning survival black-box models. The main idea behind the method is to use the modified Beran estimator as the surrogate explanation model. Coefficients, incorporated into Beran estimator, can be regarded as values of the feature impacts on the black-box model prediction. Following the well-known LIME metho…
▽ More
An explanation method called SurvBeX is proposed to interpret predictions of the machine learning survival black-box models. The main idea behind the method is to use the modified Beran estimator as the surrogate explanation model. Coefficients, incorporated into Beran estimator, can be regarded as values of the feature impacts on the black-box model prediction. Following the well-known LIME method, many points are generated in a local area around an example of interest. For every generated example, the survival function of the black-box model is computed, and the survival function of the surrogate model (the Beran estimator) is constructed as a function of the explanation coefficients. In order to find the explanation coefficients, it is proposed to minimize the mean distance between the survival functions of the black-box model and the Beran estimator produced by the generated examples. Many numerical experiments with synthetic and real survival data demonstrate the SurvBeX efficiency and compare the method with the well-known method SurvLIME. The method is also compared with the method SurvSHAP. The code implementing SurvBeX is available at: https://github.com/DanilaEremenko/SurvBeX
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
A New Computationally Simple Approach for Implementing Neural Networks with Output Hard Constraints
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
A new computationally simple method of imposing hard convex constraints on the neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to be inside the feasible set defined by a set of constraints. The map** is implemented by the additional neural network layer with constraints for output. The p…
▽ More
A new computationally simple method of imposing hard convex constraints on the neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to be inside the feasible set defined by a set of constraints. The map** is implemented by the additional neural network layer with constraints for output. The proposed method is simply extended to the case when constraints are imposed not only on the output vectors, but also on joint constraints depending on inputs. The projection approach to imposing constraints on outputs can simply be implemented in the framework of the proposed method. It is shown how to incorporate different types of constraints into the proposed method, including linear and quadratic constraints, equality constraints, and dynamic constraints, constraints in the form of boundaries. An important feature of the method is its computational simplicity. Complexities of the forward pass of the proposed neural network layer by linear and quadratic constraints are O(n*m) and O(n^2*m), respectively, where n is the number of variables, m is the number of constraints. Numerical experiments illustrate the method by solving optimization and classification problems. The code implementing the method is publicly available.
△ Less
Submitted 19 July, 2023;
originally announced July 2023.
-
Neural Attention Forests: Transformer-Based Forest Improvement
Authors:
Andrei V. Konstantinov,
Lev V. Utkin,
Alexey A. Lukashin,
Vladimir A. Muliukha
Abstract:
A new approach called NAF (the Neural Attention Forest) for solving regression and classification tasks under tabular training data is proposed. The main idea behind the proposed NAF model is to introduce the attention mechanism into the random forest by assigning attention weights calculated by neural networks of a specific form to data in leaves of decision trees and to the random forest itself…
▽ More
A new approach called NAF (the Neural Attention Forest) for solving regression and classification tasks under tabular training data is proposed. The main idea behind the proposed NAF model is to introduce the attention mechanism into the random forest by assigning attention weights calculated by neural networks of a specific form to data in leaves of decision trees and to the random forest itself in the framework of the Nadaraya-Watson kernel regression. In contrast to the available models like the attention-based random forest, the attention weights and the Nadaraya-Watson regression are represented in the form of neural networks whose weights can be regarded as trainable parameters. The first part of neural networks with shared weights is trained for all trees and computes attention weights of data in leaves. The second part aggregates outputs of the tree networks and aims to minimize the difference between the random forest prediction and the truth target value from a training set. The neural network is trained in an end-to-end manner. The combination of the random forest and neural networks implementing the attention mechanism forms a transformer for enhancing the forest predictions. Numerical experiments with real datasets illustrate the proposed method. The code implementing the approach is publicly available.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
Interpretable Ensembles of Hyper-Rectangles as Base Models
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
A new extremely simple ensemble-based model with the uniformly generated axis-parallel hyper-rectangles as base models (HRBM) is proposed. Two types of HRBMs are studied: closed rectangles and corners. The main idea behind HRBM is to consider and count training examples inside and outside each rectangle. It is proposed to incorporate HRBMs into the gradient boosting machine (GBM). Despite simplici…
▽ More
A new extremely simple ensemble-based model with the uniformly generated axis-parallel hyper-rectangles as base models (HRBM) is proposed. Two types of HRBMs are studied: closed rectangles and corners. The main idea behind HRBM is to consider and count training examples inside and outside each rectangle. It is proposed to incorporate HRBMs into the gradient boosting machine (GBM). Despite simplicity of HRBMs, it turns out that these simple base models allow us to construct effective ensemble-based models and avoid overfitting. A simple method for calculating optimal regularization parameters of the ensemble-based model, which can be modified in the explicit way at each iteration of GBM, is considered. Moreover, a new regularization called the "step height penalty" is studied in addition to the standard L1 and L2 regularizations. An extremely simple approach to the proposed ensemble-based model prediction interpretation by using the well-known method SHAP is proposed. It is shown that GBM with HRBM can be regarded as a model extending a set of interpretable models for explaining black-box models. Numerical experiments with real datasets illustrate the proposed GBM with HRBMs for regression and classification problems. Experiments also illustrate computational efficiency of the proposed SHAP modifications. The code of proposed algorithms implementing GBM with HRBM is publicly available.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Multiple Instance Learning with Trainable Decision Tree Ensembles
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
A new random forest based model for solving the Multiple Instance Learning (MIL) problem under small tabular data, called Soft Tree Ensemble MIL (STE-MIL), is proposed. A new type of soft decision trees is considered, which is similar to the well-known soft oblique trees, but with a smaller number of trainable parameters. In order to train the trees, it is proposed to convert them into neural netw…
▽ More
A new random forest based model for solving the Multiple Instance Learning (MIL) problem under small tabular data, called Soft Tree Ensemble MIL (STE-MIL), is proposed. A new type of soft decision trees is considered, which is similar to the well-known soft oblique trees, but with a smaller number of trainable parameters. In order to train the trees, it is proposed to convert them into neural networks of a specific form, which approximate the tree functions. It is also proposed to aggregate the instance and bag embeddings (output vectors) by using the attention mechanism. The whole STE-MIL model, including soft decision trees, neural networks, the attention mechanism and a classifier, is trained in an end-to-end manner. Numerical experiments with tabular datasets illustrate STE-MIL. The corresponding code implementing the model is publicly available.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
BENK: The Beran Estimator with Neural Kernels for Estimating the Heterogeneous Treatment Effect
Authors:
Stanislav R. Kirpichenko,
Lev V. Utkin,
Andrei V. Konstantinov
Abstract:
A method for estimating the conditional average treatment effect under condition of censored time-to-event data called BENK (the Beran Estimator with Neural Kernels) is proposed. The main idea behind the method is to apply the Beran estimator for estimating the survival functions of controls and treatments. Instead of typical kernel functions in the Beran estimator, it is proposed to implement ker…
▽ More
A method for estimating the conditional average treatment effect under condition of censored time-to-event data called BENK (the Beran Estimator with Neural Kernels) is proposed. The main idea behind the method is to apply the Beran estimator for estimating the survival functions of controls and treatments. Instead of typical kernel functions in the Beran estimator, it is proposed to implement kernels in the form of neural networks of a specific form called the neural kernels. The conditional average treatment effect is estimated by using the survival functions as outcomes of the control and treatment neural networks which consists of a set of neural kernels with shared parameters. The neural kernels are more flexible and can accurately model a complex location structure of feature vectors. Various numerical simulation experiments illustrate BENK and compare it with the well-known T-learner, S-learner and X-learner for several types of the control and treatment outcome functions based on the Cox models, the random survival forest and the Nadaraya-Watson regression with Gaussian kernels. The code of proposed algorithms implementing BENK is available in https://github.com/Stasychbr/BENK.
△ Less
Submitted 19 November, 2022;
originally announced November 2022.
-
LARF: Two-level Attention-based Random Forests with a Mixture of Contamination Models
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
New models of the attention-based random forests called LARF (Leaf Attention-based Random Forest) are proposed. The first idea behind the models is to introduce a two-level attention, where one of the levels is the "leaf" attention and the attention mechanism is applied to every leaf of trees. The second level is the tree attention depending on the "leaf" attention. The second idea is to replace t…
▽ More
New models of the attention-based random forests called LARF (Leaf Attention-based Random Forest) are proposed. The first idea behind the models is to introduce a two-level attention, where one of the levels is the "leaf" attention and the attention mechanism is applied to every leaf of trees. The second level is the tree attention depending on the "leaf" attention. The second idea is to replace the softmax operation in the attention with the weighted sum of the softmax operations with different parameters. It is implemented by applying a mixture of the Huber's contamination models and can be regarded as an analog of the multi-head attention with "heads" defined by selecting a value of the softmax parameter. Attention parameters are simply trained by solving the quadratic optimization problem. To simplify the tuning process of the models, it is proposed to make the tuning contamination parameters to be training and to compute them by solving the quadratic optimization problem. Many numerical experiments with real datasets are performed for studying LARFs. The code of proposed algorithms can be found in https://github.com/andruekonst/leaf-attention-forest.
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
Improved Anomaly Detection by Using the Attention-Based Isolation Forest
Authors:
Lev V. Utkin,
Andrey Y. Ageev,
Andrei V. Konstantinov
Abstract:
A new modification of Isolation Forest called Attention-Based Isolation Forest (ABIForest) for solving the anomaly detection problem is proposed. It incorporates the attention mechanism in the form of the Nadaraya-Watson regression into the Isolation Forest for improving solution of the anomaly detection problem. The main idea underlying the modification is to assign attention weights to each path…
▽ More
A new modification of Isolation Forest called Attention-Based Isolation Forest (ABIForest) for solving the anomaly detection problem is proposed. It incorporates the attention mechanism in the form of the Nadaraya-Watson regression into the Isolation Forest for improving solution of the anomaly detection problem. The main idea underlying the modification is to assign attention weights to each path of trees with learnable parameters depending on instances and trees themselves. The Huber's contamination model is proposed to be used for defining the attention weights and their parameters. As a result, the attention weights are linearly depend on the learnable attention parameters which are trained by solving the standard linear or quadratic optimization problem. ABIForest can be viewed as the first modification of Isolation Forest, which incorporates the attention mechanism in a simple way without applying gradient-based algorithms. Numerical experiments with synthetic and real datasets illustrate outperforming results of ABIForest. The code of proposed algorithms is available.
△ Less
Submitted 5 October, 2022;
originally announced October 2022.
-
Heterogeneous Treatment Effect with Trained Kernels of the Nadaraya-Watson Regression
Authors:
Andrei V. Konstantinov,
Stanislav R. Kirpichenko,
Lev V. Utkin
Abstract:
A new method for estimating the conditional average treatment effect is proposed in the paper. It is called TNW-CATE (the Trainable Nadaraya-Watson regression for CATE) and based on the assumption that the number of controls is rather large whereas the number of treatments is small. TNW-CATE uses the Nadaraya-Watson regression for predicting outcomes of patients from the control and treatment grou…
▽ More
A new method for estimating the conditional average treatment effect is proposed in the paper. It is called TNW-CATE (the Trainable Nadaraya-Watson regression for CATE) and based on the assumption that the number of controls is rather large whereas the number of treatments is small. TNW-CATE uses the Nadaraya-Watson regression for predicting outcomes of patients from the control and treatment groups. The main idea behind TNW-CATE is to train kernels of the Nadaraya-Watson regression by using a weight sharing neural network of a specific form. The network is trained on controls, and it replaces standard kernels with a set of neural subnetworks with shared parameters such that every subnetwork implements the trainable kernel, but the whole network implements the Nadaraya-Watson estimator. The network memorizes how the feature vectors are located in the feature space. The proposed approach is similar to the transfer learning when domains of source and target data are similar, but tasks are different. Various numerical simulation experiments illustrate TNW-CATE and compare it with the well-known T-learner, S-learner and X-learner for several types of the control and treatment outcome functions. The code of proposed algorithms implementing TNW-CATE is available in https://github.com/Stasychbr/TNW-CATE.
△ Less
Submitted 19 July, 2022;
originally announced July 2022.
-
Attention and Self-Attention in Random Forests
Authors:
Lev V. Utkin,
Andrei V. Konstantinov
Abstract:
New models of random forests jointly using the attention and self-attention mechanisms are proposed for solving the regression problem. The models can be regarded as extensions of the attention-based random forest whose idea stems from applying a combination of the Nadaraya-Watson kernel regression and the Huber's contamination model to random forests. The self-attention aims to capture dependenci…
▽ More
New models of random forests jointly using the attention and self-attention mechanisms are proposed for solving the regression problem. The models can be regarded as extensions of the attention-based random forest whose idea stems from applying a combination of the Nadaraya-Watson kernel regression and the Huber's contamination model to random forests. The self-attention aims to capture dependencies of the tree predictions and to remove noise or anomalous predictions in the random forest. The self-attention module is trained jointly with the attention module for computing weights. It is shown that the training process of attention weights is reduced to solving a single quadratic or linear optimization problem. Three modifications of the general approach are proposed and compared. A specific multi-head self-attention for the random forest is also considered. Heads of the self-attention are obtained by changing its tuning parameters including the kernel parameters and the contamination parameter of models. Numerical experiments with various datasets illustrate the proposed models and show that the supplement of the self-attention improves the model performance for many datasets.
△ Less
Submitted 9 July, 2022;
originally announced July 2022.
-
Attention-based Random Forest and Contamination Model
Authors:
Lev V. Utkin,
Andrei V. Konstantinov
Abstract:
A new approach called ABRF (the attention-based random forest) and its modifications for applying the attention mechanism to the random forest (RF) for regression and classification are proposed. The main idea behind the proposed ABRF models is to assign attention weights with trainable parameters to decision trees in a specific way. The weights depend on the distance between an instance, which fa…
▽ More
A new approach called ABRF (the attention-based random forest) and its modifications for applying the attention mechanism to the random forest (RF) for regression and classification are proposed. The main idea behind the proposed ABRF models is to assign attention weights with trainable parameters to decision trees in a specific way. The weights depend on the distance between an instance, which falls into a corresponding leaf of a tree, and instances, which fall in the same leaf. This idea stems from representation of the Nadaraya-Watson kernel regression in the form of a RF. Three modifications of the general approach are proposed. The first one is based on applying the Huber's contamination model and on computing the attention weights by solving quadratic or linear optimization problems. The second and the third modifications use the gradient-based algorithms for computing trainable parameters. Numerical experiments with various regression and classification datasets illustrate the proposed method.
△ Less
Submitted 8 January, 2022;
originally announced January 2022.
-
Multi-Attention Multiple Instance Learning
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
A new multi-attention based method for solving the MIL problem (MAMIL), which takes into account the neighboring patches or instances of each analyzed patch in a bag, is proposed. In the method, one of the attention modules takes into account adjacent patches or instances, several attention modules are used to get a diverse feature representation of patches, and one attention module is used to uni…
▽ More
A new multi-attention based method for solving the MIL problem (MAMIL), which takes into account the neighboring patches or instances of each analyzed patch in a bag, is proposed. In the method, one of the attention modules takes into account adjacent patches or instances, several attention modules are used to get a diverse feature representation of patches, and one attention module is used to unite different feature representations to provide an accurate classification of each patch (instance) and the whole bag. Due to MAMIL, a combined representation of patches and their neighbors in the form of embeddings of a small dimensionality for simple classification is realized. Moreover, different types of patches are efficiently processed, and a diverse feature representation of patches in a bag by using several attention modules is implemented. A simple approach for explaining the classification predictions of patches is proposed. Numerical experiments with various datasets illustrate the proposed method.
△ Less
Submitted 11 December, 2021;
originally announced December 2021.
-
Attention-like feature explanation for tabular data
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
A new method for local and global explanation of the machine learning black-box model predictions by tabular data is proposed. It is implemented as a system called AFEX (Attention-like Feature EXplanation) and consisting of two main parts. The first part is a set of the one-feature neural subnetworks which aim to get a specific representation for every feature in the form of a basis of shape funct…
▽ More
A new method for local and global explanation of the machine learning black-box model predictions by tabular data is proposed. It is implemented as a system called AFEX (Attention-like Feature EXplanation) and consisting of two main parts. The first part is a set of the one-feature neural subnetworks which aim to get a specific representation for every feature in the form of a basis of shape functions. The subnetworks use shortcut connections with trainable parameters to improve the network performance. The second part of AFEX produces shape functions of features as the weighted sum of the basis shape functions where weights are computed by using an attention-like mechanism. AFEX identifies pairwise interactions between features based on pairwise multiplications of shape functions corresponding to different features. A modification of AFEX with incorporating an additional surrogate model which approximates the black-box model is proposed. AFEX is trained end-to-end on a whole dataset only once such that it does not require to train neural networks again in the explanation stage. Numerical experiments with synthetic and real data illustrate AFEX.
△ Less
Submitted 10 August, 2021;
originally announced August 2021.
-
An Imprecise SHAP as a Tool for Explaining the Class Probability Distributions under Limited Training Data
Authors:
Lev V. Utkin,
Andrei V. Konstantinov,
Kirill A. Vishniakov
Abstract:
One of the most popular methods of the machine learning prediction explanation is the SHapley Additive exPlanations method (SHAP). An imprecise SHAP as a modification of the original SHAP is proposed for cases when the class probability distributions are imprecise and represented by sets of distributions. The first idea behind the imprecise SHAP is a new approach for computing the marginal contrib…
▽ More
One of the most popular methods of the machine learning prediction explanation is the SHapley Additive exPlanations method (SHAP). An imprecise SHAP as a modification of the original SHAP is proposed for cases when the class probability distributions are imprecise and represented by sets of distributions. The first idea behind the imprecise SHAP is a new approach for computing the marginal contribution of a feature, which fulfils the important efficiency property of Shapley values. The second idea is an attempt to consider a general approach to calculating and reducing interval-valued Shapley values, which is similar to the idea of reachable probability intervals in the imprecise probability theory. A simple special implementation of the general approach in the form of linear optimization problems is proposed, which is based on using the Kolmogorov-Smirnov distance and imprecise contamination models. Numerical examples with synthetic and real data illustrate the imprecise SHAP.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
SurvNAM: The machine learning survival model explanation
Authors:
Lev V. Utkin,
Egor D. Satyukov,
Andrei V. Konstantinov
Abstract:
A new modification of the Neural Additive Model (NAM) called SurvNAM and its modifications are proposed to explain predictions of the black-box machine learning survival model. The method is based on applying the original NAM to solving the explanation problem in the framework of survival analysis. The basic idea behind SurvNAM is to train the network by means of a specific expected loss function…
▽ More
A new modification of the Neural Additive Model (NAM) called SurvNAM and its modifications are proposed to explain predictions of the black-box machine learning survival model. The method is based on applying the original NAM to solving the explanation problem in the framework of survival analysis. The basic idea behind SurvNAM is to train the network by means of a specific expected loss function which takes into account peculiarities of the survival model predictions and is based on approximating the black-box model by the extension of the Cox proportional hazards model which uses the well-known Generalized Additive Model (GAM) in place of the simple linear relationship of covariates. The proposed method SurvNAM allows performing the local and global explanation. A set of examples around the explained example is randomly generated for the local explanation. The global explanation uses the whole training dataset. The proposed modifications of SurvNAM are based on using the Lasso-based regularization for functions from GAM and for a special representation of the GAM functions using their weighted linear and non-linear parts, which is implemented as a shortcut connection. A lot of numerical experiments illustrate the SurvNAM efficiency.
△ Less
Submitted 18 April, 2021;
originally announced April 2021.
-
Ensembles of Random SHAPs
Authors:
Lev V. Utkin,
Andrei V. Konstantinov
Abstract:
Ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify SHAP which is computationally expensive when there is a large number of features. The main idea behind the proposed modifications is to approximate SHAP by an ensemble of SHAPs with a smaller number of features. Ac…
▽ More
Ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify SHAP which is computationally expensive when there is a large number of features. The main idea behind the proposed modifications is to approximate SHAP by an ensemble of SHAPs with a smaller number of features. According to the first modification, called ER-SHAP, several features are randomly selected many times from the feature set, and Shapley values for the features are computed by means of "small" SHAPs. The explanation results are averaged to get the final Shapley values. According to the second modification, called ERW-SHAP, several points are generated around the explained instance for diversity purposes, and results of their explanation are combined with weights depending on distances between points and the explained instance. The third modification, called ER-SHAP-RF, uses the random forest for preliminary explanation of instances and determining a feature probability distribution which is applied to selection of features in the ensemble-based procedure of ER-SHAP. Many numerical experiments illustrating the proposed modifications demonstrate their efficiency and properties for local explanation.
△ Less
Submitted 4 March, 2021;
originally announced March 2021.
-
Interpretable Machine Learning with an Ensemble of Gradient Boosting Machines
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
A method for the local and global interpretation of a black-box model on the basis of the well-known generalized additive models is proposed. It can be viewed as an extension or a modification of the algorithm using the neural additive model. The method is based on using an ensemble of gradient boosting machines (GBMs) such that each GBM is learned on a single feature and produces a shape function…
▽ More
A method for the local and global interpretation of a black-box model on the basis of the well-known generalized additive models is proposed. It can be viewed as an extension or a modification of the algorithm using the neural additive model. The method is based on using an ensemble of gradient boosting machines (GBMs) such that each GBM is learned on a single feature and produces a shape function of the feature. The ensemble is composed as a weighted sum of separate GBMs resulting a weighted sum of shape functions which form the generalized additive model. GBMs are built in parallel using randomized decision trees of depth 1, which provide a very simple architecture. Weights of GBMs as well as features are computed in each iteration of boosting by using the Lasso method and then updated by means of a specific smoothing procedure. In contrast to the neural additive model, the method provides weights of features in the explicit form, and it is simply trained. A lot of numerical experiments with an algorithm implementing the proposed method on synthetic and real datasets demonstrate its efficiency and properties for local and global interpretation.
△ Less
Submitted 14 October, 2020;
originally announced October 2020.
-
A Generalized Stacking for Implementing Ensembles of Gradient Boosting Machines
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
The gradient boosting machine is one of the powerful tools for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the approach is to use the stacking algorithm in order to learn a second-level meta-model which can be regarded as a model for implementing various ensembles of gradie…
▽ More
The gradient boosting machine is one of the powerful tools for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the approach is to use the stacking algorithm in order to learn a second-level meta-model which can be regarded as a model for implementing various ensembles of gradient boosting models. First, the linear regression of the gradient boosting models is considered as a simplest realization of the meta-model under condition that the linear model is differentiable with respect to its coefficients (weights). Then it is shown that the proposed approach can be simply extended on arbitrary differentiable combination models, for example, on neural networks which are differentiable and can implement arbitrary functions of gradient boosting models. Various numerical examples illustrate the proposed approach.
△ Less
Submitted 12 October, 2020;
originally announced October 2020.
-
Gradient boosting machine with partially randomized decision trees
Authors:
Andrei V. Konstantinov,
Lev V. Utkin
Abstract:
The gradient boosting machine is a powerful ensemble-based machine learning method for solving regression problems. However, one of the difficulties of its using is a possible discontinuity of the regression function, which arises when regions of training data are not densely covered by training points. In order to overcome this difficulty and to reduce the computational complexity of the gradient…
▽ More
The gradient boosting machine is a powerful ensemble-based machine learning method for solving regression problems. However, one of the difficulties of its using is a possible discontinuity of the regression function, which arises when regions of training data are not densely covered by training points. In order to overcome this difficulty and to reduce the computational complexity of the gradient boosting machine, we propose to apply the partially randomized trees which can be regarded as a special case of the extremely randomized trees applied to the gradient boosting. The gradient boosting machine with the partially randomized trees is illustrated by means of many numerical examples using synthetic and real data.
△ Less
Submitted 19 June, 2020;
originally announced June 2020.
-
An Adaptive Weighted Deep Forest Classifier
Authors:
Lev V. Utkin,
Andrei V. Konstantinov,
Viacheslav S. Chukanov,
Mikhail V. Kots,
Anna A. Meldo
Abstract:
A modification of the confidence screening mechanism based on adaptive weighing of every training instance at each cascade level of the Deep Forest is proposed. The idea underlying the modification is very simple and stems from the confidence screening mechanism idea proposed by Pang et al. to simplify the Deep Forest classifier by means of updating the training set at each level in accordance wit…
▽ More
A modification of the confidence screening mechanism based on adaptive weighing of every training instance at each cascade level of the Deep Forest is proposed. The idea underlying the modification is very simple and stems from the confidence screening mechanism idea proposed by Pang et al. to simplify the Deep Forest classifier by means of updating the training set at each level in accordance with the classification accuracy of every training instance. However, if the confidence screening mechanism just removes instances from training and testing processes, then the proposed modification is more flexible and assigns weights by taking into account the classification accuracy. The modification is similar to the AdaBoost to some extent. Numerical experiments illustrate good performance of the proposed modification in comparison with the original Deep Forest proposed by Zhou and Feng.
△ Less
Submitted 4 January, 2019;
originally announced January 2019.
-
A weighted random survival forest
Authors:
Lev V. Utkin,
Andrei V. Konstantinov,
Viacheslav S. Chukanov,
Mikhail V. Kots,
Mikhail A. Ryabinin,
Anna A. Meldo
Abstract:
A weighted random survival forest is presented in the paper. It can be regarded as a modification of the random forest improving its performance. The main idea underlying the proposed model is to replace the standard procedure of averaging used for estimation of the random survival forest hazard function by weighted avaraging where the weights are assigned to every tree and can be veiwed as traini…
▽ More
A weighted random survival forest is presented in the paper. It can be regarded as a modification of the random forest improving its performance. The main idea underlying the proposed model is to replace the standard procedure of averaging used for estimation of the random survival forest hazard function by weighted avaraging where the weights are assigned to every tree and can be veiwed as training paremeters which are computed in an optimal way by solving a standard quadratic optimization problem maximizing Harrell's C-index. Numerical examples with real data illustrate the outperformance of the proposed model in comparison with the original random survival forest.
△ Less
Submitted 1 January, 2019;
originally announced January 2019.
-
An FCA-based Boolean Matrix Factorisation for Collaborative Filtering
Authors:
Elena Nenova,
Dmitry I. Ignatov,
Andrey V. Konstantinov
Abstract:
We propose a new approach for Collaborative Filtering which is based on Boolean Matrix Factorisation (BMF) and Formal Concept Analysis. In a series of experiments on real data (Movielens dataset) we compare the approach with the SVD- and NMF-based algorithms in terms of Mean Average Error (MAE). One of the experimental consequences is that it is enough to have a binary-scaled rating data to obtain…
▽ More
We propose a new approach for Collaborative Filtering which is based on Boolean Matrix Factorisation (BMF) and Formal Concept Analysis. In a series of experiments on real data (Movielens dataset) we compare the approach with the SVD- and NMF-based algorithms in terms of Mean Average Error (MAE). One of the experimental consequences is that it is enough to have a binary-scaled rating data to obtain almost the same quality in terms of MAE by BMF than for the SVD-based algorithm in case of non-scaled data.
△ Less
Submitted 16 October, 2013;
originally announced October 2013.