Positive-Unlabeled Learning with Non-Negative Risk Estimator

Authors: Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, Masashi Sugiyama

Abstract: From only positive (P) and unlabeled (U) data, a binary classifier could be trained with PU learning, in which the state of the art is unbiased PU learning. However, if its model is very flexible, empirical risks on training data will go negative, and we will suffer from serious overfitting. In this paper, we propose a non-negative risk estimator for PU learning: when getting minimized, it is more robust against overfitting, and thus we are able to use very flexible models (such as deep neural networks) given limited P data. Moreover, we analyze the bias, consistency, and mean-squared-error reduction of the proposed risk estimator, and bound the estimation error of the resulting empirical risk minimizer. Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts.

Submitted 4 November, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

Comments: NIPS 2017 camera-ready version (this paper was selected for oral presentation)
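
The contrast drawn in the abstract above, between an unbiased PU risk that can go negative and a non-negative variant, can be illustrated with a small sketch. It is not the authors' code: the function names, the choice of the sigmoid loss, and the toy scores are assumptions made for illustration, and the class prior is assumed to be known.

```python
import numpy as np

def sigmoid_loss(z, y):
    """Sigmoid loss 1 / (1 + exp(y * z)) for a score z and a label y in {+1, -1}.
    (Assumed loss for this sketch.)"""
    return 1.0 / (1.0 + np.exp(y * z))

def pu_risks(scores_p, scores_u, prior, loss=sigmoid_loss):
    """Return (unbiased, non_negative) empirical PU risks.

    scores_p: classifier outputs on positive samples
    scores_u: classifier outputs on unlabeled samples
    prior:    class prior P(y = +1), assumed known here
    """
    risk_p_pos = loss(scores_p, +1).mean()  # positives treated as positive
    risk_p_neg = loss(scores_p, -1).mean()  # positives treated as negative
    risk_u_neg = loss(scores_u, -1).mean()  # unlabeled treated as negative

    # Estimate of the negative-class risk; its true value is non-negative,
    # but the empirical estimate can fall below zero for a flexible model.
    neg_part = risk_u_neg - prior * risk_p_neg

    unbiased = prior * risk_p_pos + neg_part
    non_negative = prior * risk_p_pos + max(0.0, neg_part)
    return unbiased, non_negative

# Toy scores mimicking an overfit model: positives scored very high,
# every unlabeled point scored very low.
scores_p = np.array([10.0, 10.0])
scores_u = np.array([-10.0, -10.0, -10.0])
unbiased, non_negative = pu_risks(scores_p, scores_u, prior=0.4)
print(unbiased < 0, non_negative >= 0)
```

In this toy case the unbiased estimate is negative while the clipped estimate stays at or above zero, which is the behavior the abstract describes.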

arXiv:1611.01586 [pdf, ps, other]

doi 10.1007/s10994-016-5604-6

Class-prior Estimation for Learning from Positive and Unlabeled Data

Authors: Marthinus C. du Plessis, Gang Niu, Masashi Sugiyama

Abstract: We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized $L_1$-distance gives a computationally efficient algorithm with an analytic solution. The consistency, stability, and estimation error are theoretically analyzed. Finally, we experimentally demonstrate the usefulness of the proposed method.

Submitted 4 November, 2016; originally announced November 2016.

Comments: To appear in Machine Learning

arXiv:1605.06955 [pdf, other]

Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data

Authors: Tomoya Sakai, Marthinus Christoffel du Plessis, Gang Niu, Masashi Sugiyama

Abstract: Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption. In contrast, recently developed methods of classification from positive and unlabeled data (PU classification) use unlabeled data for risk evaluation, i.e., label information is directly extracted from unlabeled data. In this paper, we extend PU classification to also incorporate negative data and propose a novel semi-supervised classification approach. We establish generalization error bounds for our novel methods and show that the bounds decrease with respect to the number of unlabeled data without the distributional assumptions that are required in existing semi-supervised classification methods. Through experiments, we demonstrate the usefulness of the proposed methods.

Submitted 16 June, 2017; v1 submitted 23 May, 2016; originally announced May 2016.

Comments: Accepted to the 34th International Conference on Machine Learning (ICML 2017)

arXiv:1603.03130 [pdf, other]

Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning

Authors: Gang Niu, Marthinus Christoffel du Plessis, Tomoya Sakai, Yao Ma, Masashi Sugiyama

Abstract: In PU learning, a binary classifier is trained from positive (P) and unlabeled (U) data without negative (N) data. Although N data is missing, it sometimes outperforms PN learning (i.e., ordinary supervised learning). Hitherto, neither theoretical nor experimental analysis has been given to explain this phenomenon. In this paper, we theoretically compare PU (and NU) learning against PN learning based on the upper bounds on estimation errors. We find simple conditions when PU and NU learning are likely to outperform PN learning, and we prove that, in terms of the upper bounds, either PU or NU learning (depending on the class-prior probability and the sizes of P and N data) given infinite U data will improve on PN learning. Our theoretical findings well agree with the experimental results on artificial and benchmark data even when the experimental setup does not match the theoretical assumptions exactly.

Submitted 28 October, 2016; v1 submitted 9 March, 2016; originally announced March 2016.

Comments: NIPS 2016 camera-ready version

arXiv:1402.0288 [pdf, other]

Transductive Learning with Multi-class Volume Approximation

Authors: Gang Niu, Bo Dai, Marthinus Christoffel du Plessis, Masashi Sugiyama

Abstract: Given a hypothesis space, the large volume principle by Vladimir Vapnik prioritizes equivalence classes according to their volume in the hypothesis space. The volume approximation has hitherto been successfully applied to binary learning problems. In this paper, we extend it naturally to a more general definition which can be applied to several transductive problem settings, such as multi-class, multi-label and serendipitous learning. Even though the resultant learning method involves a non-convex optimization problem, the globally optimal solution is almost surely unique and can be obtained in O(n^3) time. We theoretically provide stability and error analyses for the proposed method, and then experimentally show that it is promising.

Submitted 3 February, 2014; originally announced February 2014.

arXiv:1305.0103 [pdf, ps, other]

Clustering Unclustered Data: Unsupervised Binary Labeling of Two Datasets Having Different Class Balances

Authors: Marthinus Christoffel du Plessis, Masashi Sugiyama

Abstract: We consider the unsupervised learning problem of assigning labels to unlabeled data. A naive approach is to use clustering methods, but this works well only when data is properly clustered and each cluster corresponds to an underlying class. In this paper, we first show that this unsupervised labeling problem in balanced binary cases can be solved if two unlabeled datasets having different class balances are available. More specifically, estimation of the sign of the difference between probability densities of two unlabeled datasets gives the solution. We then introduce a new method to directly estimate the sign of the density difference without density estimation. Finally, we demonstrate the usefulness of the proposed method against several clustering methods on various toy problems and real-world datasets.

Submitted 1 May, 2013; originally announced May 2013.
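
The statement in the abstract above, that the sign of the density difference yields the labeling, can be checked with a short derivation. The notation is assumed here for illustration and is not taken from the paper: $p_{+}$ and $p_{-}$ are the class-conditional densities, and $\theta \neq \theta'$ are the positive-class proportions of the two unlabeled datasets.

```latex
\begin{align*}
p(x)  &= \theta\, p_{+}(x) + (1 - \theta)\, p_{-}(x), \\
p'(x) &= \theta'\, p_{+}(x) + (1 - \theta')\, p_{-}(x), \\
p(x) - p'(x) &= (\theta - \theta')\,\bigl(p_{+}(x) - p_{-}(x)\bigr).
\end{align*}
```

If $\theta > \theta'$, then $\operatorname{sign}\bigl(p(x) - p'(x)\bigr) = \operatorname{sign}\bigl(p_{+}(x) - p_{-}(x)\bigr)$, which matches the Bayes-optimal decision when the two classes are equally likely (the balanced case). If $\theta < \theta'$, the sign flips, so without knowing which dataset has the larger proportion the two groups are recovered only up to a swap of the class names.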

arXiv:1207.0099 [pdf, ps, other]

Density-Difference Estimation

Authors: Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus Christoffel du Plessis, Song Liu, Ichiro Takeuchi

Abstract: We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference. However, such a two-step procedure does not necessarily work well because the first step is performed without regard to the second step and thus a small error incurred in the first stage can cause a big error in the second stage. In this paper, we propose a single-shot procedure for directly estimating the density difference without separately estimating two densities. We derive a non-parametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. The usefulness of the proposed method is also demonstrated experimentally.

Submitted 30 June, 2012; originally announced July 2012.

Showing 1–7 of 7 results for author: Plessis, M C d